The Global Interpreter Lock (GIL) is a feature of CPython that significantly simplifies the implementation of the interpreter, because only one thread is allowed to run at a given time. The downside is that a single Python process cannot take advantage of multiple CPUs with threads alone. Of course, there are many message-passing libraries for Python, such as the standard library module multiprocessing, several MPI-based solutions, and many pure-Python solutions.
While message passing is a good solution for certain types of problems, it incurs the overhead of serializing and de-serializing data. Depending on the algorithm, this overhead can be considerable in relation to the achieved speed-up. OpenMP (www.openmp.org) is a widely used standard for shared-memory programming on multicore machines.
Although OpenMP is specified only for C/C++ and Fortran, we can utilize it from Python. Since there is a variety of tools for connecting Python with C, there are many ways to take advantage of OpenMP from Python. One of the most convenient for this task is Cython. It not only provides a programming experience very close to Python, with only a few C-like additions, but also makes it easy to adjust lower-level aspects that are often important for parallel performance. Furthermore, it integrates seamlessly with NumPy, and its small overhead compared with other tools for writing C extensions makes Cython an attractive choice.
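As a hedged sketch of what this looks like in practice (the function name and scaling factor are illustrative), a parallel loop over a NumPy array can be written with cython.parallel.prange, which compiles to an OpenMP parallel for loop; the module must be compiled with OpenMP enabled, e.g. -fopenmp for GCC:

```cython
# cython: boundscheck=False, wraparound=False
from cython.parallel import prange

def scaled_sum(double[:] a, double factor):
    """Sum of a NumPy array scaled by factor, in parallel."""
    cdef Py_ssize_t i
    cdef double total = 0.0
    # prange releases the GIL and distributes iterations across
    # OpenMP threads; Cython infers an OpenMP reduction for total.
    for i in prange(a.shape[0], nogil=True):
        total += a[i] * factor
    return total
```

The typed memoryview (double[:]) accepts NumPy arrays directly, so the parallel loop operates on the shared array buffer without any copying or serialization.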
This talk gives a brief overview of parallel programming options in Python. It introduces Cython's OpenMP abilities, focusing on parallel loops over NumPy arrays. Source code examples demonstrate how to use OpenMP from Python. Results for parallel algorithms with OpenMP show what speed-ups can be achieved for different data sizes compared with other parallelization strategies.