Talk GC3Pie: a Python framework for high-throughput computing

Abstract

GC3Pie is a suite of Python classes (and command-line tools built upon them) that help users submit batch jobs to clusters and grid resources and control them seamlessly. GC3Pie aims to provide the building blocks for quickly developing Python scripts that combine several existing applications into a dynamic workflow.

The analysis of very large datasets with various interdependent applications is becoming a necessity for more and more scientific communities. In this case, the single-job level of control exposed by many existing computing infrastructures (be they grids or clouds) is often not enough: users have to implement "glue scripts" to control hundreds or thousands of jobs at once.

GC3Pie abstracts the generic code out of this picture, in the form of re-usable Python classes that implement a single point of control for application collections. GC3Pie provides a simple model for expressing the way an application should behave (how it should be invoked, what input data it requires, etc.); support is already provided for some popular scientific applications, and more can be added by subclassing the generic Application object.
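The idea of describing an application once and specializing it by subclassing can be illustrated with a minimal, self-contained sketch. This is not the GC3Pie API: the `Application` class below, its `arguments`, `inputs` and `outputs` fields, the `command_line` method, and the `GrayscaleApp` subclass are all hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Application:
    """Hypothetical stand-in for a generic application description.

    It records how a program is invoked and which files it reads
    and writes; it is NOT the actual GC3Pie class.
    """
    arguments: List[str]
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)

    def command_line(self) -> str:
        # Join the argument list into the command that would be run.
        return " ".join(self.arguments)


class GrayscaleApp(Application):
    """Hypothetical subclass: convert one image to grayscale."""

    def __init__(self, image: str):
        output = "gray-" + image
        super().__init__(
            arguments=["convert", image, "-colorspace", "gray", output],
            inputs=[image],
            outputs=[output],
        )


app = GrayscaleApp("photo.png")
print(app.command_line())
```

The subclass only fixes the invocation details; any code that knows how to handle the generic description can handle the specialized one without changes.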

GC3Pie allows creating application collections (for "parameter sweep"-like execution models) as well as expressing the interdependency between different applications within the same execution logic (i.e., workflows). The same control logic is applied to a single Application instance, a large collection of concurrently executing jobs, or an entire workflow.
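The point that one control logic can drive a single job, a parameter sweep, or a whole workflow can be sketched with a toy composite design. The code below is an assumption-laden illustration, not GC3Pie code: the `Task`, `ParallelCollection` and `SequentialCollection` classes, the three-state life cycle, and the `run_to_completion` helper are all hypothetical.

```python
class Task:
    """Hypothetical unit of work with a toy life cycle: NEW -> RUNNING -> DONE."""

    def __init__(self, name):
        self.name = name
        self.state = "NEW"

    def progress(self):
        # Advance one step in the toy life cycle.
        if self.state == "NEW":
            self.state = "RUNNING"
        elif self.state == "RUNNING":
            self.state = "DONE"


class ParallelCollection(Task):
    """Hypothetical parameter-sweep container: all members advance together."""

    def __init__(self, name, tasks):
        super().__init__(name)
        self.tasks = list(tasks)

    def progress(self):
        for task in self.tasks:
            task.progress()
        done = all(t.state == "DONE" for t in self.tasks)
        self.state = "DONE" if done else "RUNNING"


class SequentialCollection(Task):
    """Hypothetical workflow container: a member starts only after the previous one is done."""

    def __init__(self, name, tasks):
        super().__init__(name)
        self.tasks = list(tasks)

    def progress(self):
        # Advance only the first member that has not finished yet.
        for task in self.tasks:
            if task.state != "DONE":
                task.progress()
                break
        done = all(t.state == "DONE" for t in self.tasks)
        self.state = "DONE" if done else "RUNNING"


def run_to_completion(task, max_steps=100):
    """Same driver loop for a single task, a sweep, or a workflow."""
    steps = 0
    while task.state != "DONE" and steps < max_steps:
        task.progress()
        steps += 1
    return steps


sweep = ParallelCollection("sweep", [Task("run-%d" % i) for i in range(3)])
workflow = SequentialCollection("workflow", [sweep, Task("merge")])
run_to_completion(workflow)
print(workflow.state, [t.state for t in sweep.tasks])
```

Because the containers expose the same `progress()` and `state` interface as a single task, the driver loop never needs to know what kind of object it is controlling.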

This talk will survey the features and services offered by GC3Pie, and show how these have been used in a number of real-life scientific use cases.

Authors: Sergio Maffioletti <sergio.maffioletti@gc3.uzh.ch>, Riccardo Murri <riccardo.murri@gmail.com>, Mike Packard <mpackard@oci.uzh.ch>