Talk NumPy + Cython

Presented by Mike Müller (Python Academy) in Advanced tutorial track 2012 on 2012/08/23 from 09:15 to 10:30


This tutorial was prepared by Mike Müller and Stefan Behnel.

Target Audience

This tutorial targets intermediate to experienced Python programmers who want to break through the limits of Python performance. A basic understanding of the C language is helpful but not required. A basic understanding of Cython and NumPy is necessary.


NumPy and SciPy come with a broad set of high-level functionality that allows complex computational algorithms to be expressed concisely and efficiently. However, in many cases, sequential operations on NumPy arrays introduce considerable overhead. This can happen when arrays are copied unnecessarily during an operation that does not work in-place, but also because of poor CPU cache locality when large arrays are traversed several times in a row. In both cases, Cython can provide a substantial speed-up by expressing algorithms more efficiently.
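The two sources of overhead described above can be illustrated without NumPy. The following is a minimal sketch, not tutorial material: it uses the standard-library `array` module as a stand-in for NumPy arrays, and the function names `stepwise` and `fused` are hypothetical. The stepwise version mirrors an expression such as `a * b + c` evaluated one operation at a time, which allocates a temporary and traverses the data twice. The fused version does the same work in a single pass, which is the loop shape one would typically write in Cython.

```python
from array import array


def stepwise(a, b, c):
    """Evaluate a * b + c one operation at a time (hypothetical helper)."""
    # First pass: allocate a temporary that holds a * b.
    tmp = array("d", (x * y for x, y in zip(a, b)))
    # Second pass: traverse the temporary again to add c.
    return array("d", (t + z for t, z in zip(tmp, c)))


def fused(a, b, c, out):
    """Evaluate a * b + c in one pass into a preallocated buffer."""
    # Single pass, no temporary: each input element is read once.
    for i in range(len(out)):
        out[i] = a[i] * b[i] + c[i]
    return out


a = array("d", [1.0, 2.0, 3.0])
b = array("d", [4.0, 5.0, 6.0])
c = array("d", [0.5, 0.5, 0.5])
out = array("d", [0.0]) * len(a)  # preallocated result buffer

result_stepwise = stepwise(a, b, c)
result_fused = fused(a, b, c, out)
```

In interpreted Python the fused loop is not expected to be faster. The benefit only appears once the loop is compiled to C, where avoiding the temporary and the second traversal saves memory traffic.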

The main features that make Cython so attractive for NumPy users are its ability to access and process the arrays directly at the C level, and the native support for parallel loops based on the OpenMP compiler infrastructure. To work efficiently with arrays and other memory buffers, Cython has native syntax support for the Python buffer protocol, which allows C extensions (like NumPy or image processing libraries) to grant foreign code direct access to their internal data buffers.


Use of Python's buffer interface from Cython code

  • directly accessing data buffers of other Python extensions
  • retrieving metadata about the buffer layout
  • setting up efficient memory views on external buffers
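The buffer protocol can also be explored from plain Python through the built-in `memoryview` type. The sketch below is an illustration under assumptions: it uses `array.array` as the exporting object in place of a NumPy array, and the variable names are arbitrary. It shows the three points listed above: access without copying, layout metadata, and a reshaped view on the same memory.

```python
from array import array

# array.array exports its storage through the buffer protocol,
# here standing in for a NumPy array or another C extension type.
data = array("i", range(6))

# memoryview requests the buffer from the exporter; no data is copied.
view = memoryview(data)

# Metadata about the buffer layout, as reported by the exporter.
layout = {
    "format": view.format,      # struct-style type code of one item
    "itemsize": view.itemsize,  # size of one item in bytes
    "ndim": view.ndim,          # number of dimensions
    "shape": view.shape,        # items per dimension
    "strides": view.strides,    # byte step per dimension
    "readonly": view.readonly,
}

# Writing through the view changes the exporter's data directly.
view[0] = 42

# Reinterpret the same memory as a 2 x 3 grid, still without copying.
# cast() requires one side of the conversion to be a byte format,
# hence the intermediate cast to "B".
grid = view.cast("B").cast("i", shape=[2, 3])
```

Cython's typed memory views request buffers through the same protocol, but with the item type and number of dimensions declared statically so that indexing compiles to direct C memory access.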

Implementing fast Cython loops over NumPy arrays

  • looping over NumPy exported buffers
  • implementing a simple image processing algorithm
  • using "fused types" (simple templating) to implement an algorithm once and run it efficiently on different C data types
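As a rough idea of what such a loop looks like, the following sketch applies a threshold to a small greyscale image stored in a 2D buffer. It is an assumption-laden illustration rather than the algorithm used in the tutorial: the function `threshold` is hypothetical, the image is a `bytearray` viewed as 2 x 3 unsigned bytes, and the code is plain Python (3.5 or later for tuple indexing on `memoryview`).

```python
def threshold(image, level):
    """Binarize a 2D unsigned 8-bit image in place (hypothetical helper)."""
    # Request the buffer; this works for any object exporting one.
    view = memoryview(image)
    if view.ndim != 2 or view.format != "B":
        raise ValueError("expected a 2D buffer of unsigned bytes")
    rows, cols = view.shape
    # Explicit element-wise loop over the exported buffer.
    for i in range(rows):
        for j in range(cols):
            view[i, j] = 255 if view[i, j] >= level else 0


# Six pixels, viewed as an image with 2 rows and 3 columns.
pixels = bytearray([10, 200, 30,
                    250, 40, 128])
image = memoryview(pixels).cast("B", shape=[2, 3])

threshold(image, 128)
```

A Cython version of this loop would declare C types for the indices and the view so that the nested loop compiles to plain C. With fused types, the hard-coded check for unsigned bytes could give way to one implementation specialized for several item types.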

Use of parallel loops to exploit multiple processing cores

  • building modules with OpenMP
  • processing data in parallel
  • speeding up an existing loop using OpenMP threads

Software requirements

Note: the part on parallel processing requires a C compiler that supports OpenMP, e.g. gcc 4.2 or later, preferably 4.4 or later. Such a compiler should be readily available in recent installations of both Linux and Mac OS X. However, recent versions of Xcode use the "clang" compiler, which does not support OpenMP. On these systems, please install gcc separately and make sure it can be used from your CPython installation. Users of Microsoft Windows must install the C compiler that was used to build their Python installation, e.g. VS2008 Express or MinGW for Python 2.7.
