Talk: A comparison of different approaches used to seek maintainability, performance and scalability in Python scientific code

Presented by Marko Loparic in Scientific track 2010 on 2010/07/10 from 15:15 to 15:30 in room Dussane

In this paper we compare three approaches that can be used for writing optimised Python code. We focus our comparison on three parameters: maintainability, performance and scalability.

The study derives from the experience of coding a Python program for an energy company (GDF Suez). The program post-processes Monte Carlo simulations used for risk analysis. After a first implementation, the program was reimplemented using a quite different approach. This raises the question of which of the two approaches is more adequate for future development. The answer does not seem obvious to us, and this is what motivates the study.

In order to perform a meaningful comparison, we select one of the numerous formulas representing the financial parameters in the program, choosing one that reflects reasonably well the characteristics of the whole code. We then compare the two implementations written to compute this formula; each is fewer than 100 lines of code. We extend the comparison by adding a third approach, and thus a third implementation of the formula.

We then compare the following three approaches:

  • The pure NumPy approach. Here we code in pure Python and make extensive use of the NumPy library to get performance.
  • The two-phase approach. In a first phase, pure Python code runs on a small amount of data so that the business rules (consisting mainly of dictionary lookups) are walked through completely. Instead of performing the mathematical operations required by the code, this phase records them by overloading the methods that correspond to the basic mathematical operations, together with the __getitem__ and __setitem__ methods. In a second phase, the recorded operations are applied to the whole data set by simple, optimised Cython code that, in particular, uses no Python library.
  • The standard Cython approach. The code is first written in pure Python in the clearest, most readable and maintainable style. Next we switch from Python to Cython and progressively add Cython type declarations, obtaining highly optimised code. This is the technique taught in the Cython tutorials.
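
To make the two-phase idea concrete, here is a minimal sketch in pure Python. It is not the code discussed in the talk: the names Tape, Symbol, margin, replay and PRICE_COLUMN, as well as the formula and the data, are hypothetical and chosen only for illustration. The second phase is written as a plain Python loop standing in for the optimised Cython code described above.

```python
import operator


class Tape:
    """Recorded operations plus a symbolic view of the input columns."""

    def __init__(self):
        self.ops = []      # (function, left operand, right operand, output slot)
        self.inputs = {}   # column name -> slot number
        self.n_slots = 0

    def new_slot(self):
        self.n_slots += 1
        return self.n_slots - 1

    def __getitem__(self, name):
        # Overloaded lookup: hand back a symbol instead of real data.
        if name not in self.inputs:
            self.inputs[name] = self.new_slot()
        return Symbol(self, self.inputs[name])


class Symbol:
    """Stands in for a number; arithmetic on it is recorded, not performed."""

    def __init__(self, tape, slot):
        self.tape = tape
        self.slot = slot

    def _record(self, func, other, reflected=False):
        left = ("slot", self.slot)
        if isinstance(other, Symbol):
            right = ("slot", other.slot)
        else:
            right = ("const", float(other))
        if reflected:
            left, right = right, left
        out = self.tape.new_slot()
        self.tape.ops.append((func, left, right, out))
        return Symbol(self.tape, out)

    def __add__(self, other):
        return self._record(operator.add, other)

    def __radd__(self, other):
        return self._record(operator.add, other, reflected=True)

    def __sub__(self, other):
        return self._record(operator.sub, other)

    def __rsub__(self, other):
        return self._record(operator.sub, other, reflected=True)

    def __mul__(self, other):
        return self._record(operator.mul, other)

    def __rmul__(self, other):
        return self._record(operator.mul, other, reflected=True)


# Hypothetical business rule: which price column applies to each commodity.
PRICE_COLUMN = {"gas": "gas_price", "power": "power_price"}


def margin(data, commodity, fee):
    """Business logic in plain Python; works on a real row or on a Tape."""
    return data[PRICE_COLUMN[commodity]] * data["volume"] - fee


def replay(tape, result, rows):
    """Phase 2: apply the recorded operations to every row of data."""
    def fetch(operand, slots):
        kind, value = operand
        return slots[value] if kind == "slot" else value

    results = []
    for row in rows:
        slots = [0.0] * tape.n_slots
        for name, slot in tape.inputs.items():
            slots[slot] = row[name]
        for func, left, right, out in tape.ops:
            slots[out] = func(fetch(left, slots), fetch(right, slots))
        results.append(slots[result.slot])
    return results


tape = Tape()
recorded = margin(tape, "gas", fee=2.0)  # phase 1: rules walked through once
rows = [
    {"gas_price": 3.0, "volume": 10.0},
    {"gas_price": 4.0, "volume": 5.0},
]
totals = replay(tape, recorded, rows)    # phase 2: no dictionary of rules needed
```

In this sketch the dictionary lookup in PRICE_COLUMN happens once, during recording, and the replay loop only executes a flat list of arithmetic operations. That flat list is the kind of structure that a small typed loop could process without calling back into Python libraries.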

After presenting the three code segments, we compare them with respect to the selected parameters. Our main conclusion is that we favour the two-phase approach on all three parameters, and that its main drawback is that it may not be obvious, or even possible, to apply it to every need.