Talk: Playdoh: a lightweight Python library for distributed computing and optimisation

Presented by Cyrille Rossant in the Scientific track 2010 on 2010/07/11 from 14:00 to 14:30 in room Dussane
Abstract

Authors: Cyrille Rossant, Bertrand Fontaine, Dan Goodman

Scientists increasingly require high-performance computing. As the clock frequency of processors is limited by physical constraints, parallel computing is becoming an essential paradigm in scientific research. This is particularly the case for optimisation tasks, since many global optimisation algorithms are largely embarrassingly parallel. Supercomputers and large-scale computer clusters offer an efficient but expensive way of distributing computations over hundreds of machines. General-purpose computing on graphics processing units (GPUs) is a cheap and attractive alternative to supercomputers, although it can require scientists to invest considerable effort in adapting their computations to the particular architecture of these devices.

Most labs already have many interconnected computers installed, often with multiple CPU cores. Many of these computers also have GPUs, or could be inexpensively fitted with them. Playdoh addresses precisely this situation: it is a lightweight standalone Python library for turning lab computers into a small cluster at no cost. It allows computational tasks, in particular optimisations, to be distributed over a set of interconnected machines. It has built-in support for multiple CPUs and GPUs, and works on both UNIX and Windows systems. It is primarily intended for SciPy users who want a simple and transparent way to distribute their computations and optimisations.

The machines to be included in the cluster only need to have Python and Playdoh installed. Each machine then continuously listens for incoming task requests from the other computers in the cluster, while remaining available to its own user. Tasks can be submitted to the cluster from any of its machines.
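In practice, joining the cluster might look like the following sketch. The open_server entry point and its maxcpu argument are assumptions made for illustration, not the documented Playdoh API:

    # Run this on each lab machine so that it joins the cluster.
    # NOTE: open_server and its maxcpu argument are assumptions made for
    # illustration; check the Playdoh documentation for the actual entry point.
    from playdoh import open_server

    if __name__ == '__main__':
        # Listen for incoming task requests, dedicating two CPU cores to the
        # cluster and leaving the rest free for the machine's own user.
        open_server(maxcpu=2)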

Playdoh offers two interfaces for distributing computations over the cluster. The asynchronous interface implements the single instruction, multiple data (SIMD) paradigm: the submitter specifies a Python function and a list of arguments on which to evaluate it. These jobs are split among the computers, which process them using some or all of their CPUs/GPUs. The machines assign unique identifiers to the jobs and return them immediately to the submitter, who can then poll asynchronously for the jobs' status and retrieve the results. On any machine, the number of CPUs/GPUs dedicated to the cluster can be changed at any time by the machine's user.
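As a sketch of how such a map-style submission could look, assuming a map_async function that returns a task handle with a get_results method (all names and arguments here are illustrative, not the documented API):

    # Hypothetical sketch of the asynchronous interface; the names used here
    # (map_async, machines, get_results) are assumptions for illustration.
    import playdoh

    def square(x):
        return x ** 2

    # Submit one job per argument; the call returns immediately with job
    # identifiers wrapped in a task handle instead of blocking on the results.
    task = playdoh.map_async(square, [1, 2, 3, 4],
                             machines=['192.168.0.10', '192.168.0.11'])

    # ... do other work while the cluster processes the jobs ...

    # Poll for the jobs' status and retrieve the results once they are ready.
    results = task.get_results()  # expected: [1, 4, 9, 16]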

The synchronous interface is used when tasks cannot be distributed independently over the cluster's machines but need to synchronise regularly, as is most notably the case for optimisations. These tasks consist of iterative instruction flows processed in parallel on the machines and containing synchronisation points, at which inter-machine communication takes place automatically according to a user-specified directed graph. Playdoh currently has built-in implementations of particle swarm optimisation, genetic algorithms and other evolutionary algorithms, allowing the user to easily optimise any given fitness function in parallel.
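A distributed optimisation could then look like this sketch, assuming a maximize function whose keyword arguments (population size, iteration count, initialisation range) are illustrative rather than the documented signature:

    # Hypothetical sketch of the synchronous optimisation interface; the
    # maximize call and its keyword arguments are assumptions for illustration.
    import playdoh

    def fitness(x):
        # Fitness to maximise: a simple parabola with its optimum at x = 1.
        return -(x - 1) ** 2

    # Optimise the fitness in parallel (e.g. with particle swarm optimisation),
    # with the population distributed across the cluster and the machines
    # exchanging information at each synchronisation point (iteration).
    result = playdoh.maximize(fitness,
                              popsize=1000,           # candidate solutions
                              maxiter=20,             # synchronisation points
                              x_initrange=[-10, 10])  # initial search interval
    print(result)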

Playdoh is designed to be as transparent as possible to the user: it handles low-level implementation details such as network communication, multithreading, multiprocessing, GPU support via PyCUDA, automatic Python code transport at runtime, and remote exception handling. It is already used by a spiking neuron model fitting library written in Python on top of the “Brian” neural network simulator. We think Playdoh will interest anyone who wants a straightforward way to leverage the power of several interconnected computers for distributing computations and optimisations.