Speed up Python with Pyrex

A complaint often heard about Python, mostly from C/C++ programmers, is that Python code executes slowly. Python is fast enough for most I/O-bound, database and GUI applications. But when it comes to pure number crunching, Python can be an order of magnitude or more slower than an equivalent C program. The usual solution is to write the performance-critical parts in C as an extension module.

However, writing C extensions for Python is a non-trivial task. Even before you write your first extension function, there is a certain amount of wrapper code to produce. Then there is data type conversion between Python and C: basic types such as ints and strings are easy to convert, but user-defined types are much trickier. And, as with any C program, you are saddled with memory management chores, so you can end up chasing nasty bugs when you should be concentrating on writing code that does the work.

Tools like the Simplified Wrapper and Interface Generator (SWIG) take away some of the burden of writing extension code. SWIG takes an interface definition file, consisting of a mixture of C code and specialised declarations, and produces an extension module. But SWIG is not very helpful when you want to create new Python types.

Other projects, like PyInline, take a different approach by letting you embed C code directly in the Python source. PyInline then extracts the C code and compiles it into an extension. But the problem with types remains.

Pyrex

Pyrex provides an elegant solution to these problems. Pyrex is a Python-like language designed specifically for writing Python extension modules. Its syntax is very close to Python's, and most Python code is also valid Pyrex. In short, Pyrex is Python with C data types.

The following Pyrex code computes the first kmax prime numbers (at most 1000, the size of the C array p).

    primes.pyx

    def primes(int kmax):
        cdef int n, k, i
        cdef int p[1000]
        result = []
        if kmax > 1000:
            kmax = 1000
        k = 0
        n = 2
        while k < kmax:
            i = 0
            # try dividing n by each prime found so far
            while i < k and n % p[i] <> 0:
                i = i + 1
            if i == k:
                # no divisor found, so n is prime
                p[k] = n
                k = k + 1
                result.append(n)
            n = n + 1
        return result

This reads as easily as any Python code; the only difference is the type declarations of the variables. Running pyrexc (the Pyrex compiler) on this code generates a C file, primes.c:

bash$ pyrexc primes.pyx

This file is easily compiled into a C extension. For example, with gcc:

bash$ gcc -c -fPIC -I/usr/include/python2.3/ primes.c

This produces primes.o, which must then be linked to produce an extension module:

bash$ gcc -shared primes.o -o primes.so
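The include path in the compile command above is tied to Python 2.3. As a minimal sketch, assuming a Linux-style toolchain where extensions end in .so, the same two gcc commands can be assembled from the running interpreter's own header directory via the standard sysconfig module. The helper name gcc_commands is hypothetical, not part of Pyrex.

```python
# build_cmds.py: hypothetical helper that assembles the two gcc commands
# from the article for whichever Python is running, instead of hard-coding
# /usr/include/python2.3/. Assumes Linux-style .so extension modules.
import sysconfig


def gcc_commands(module="primes"):
    """Return (compile_cmd, link_cmd) for a Pyrex-generated C file."""
    include_dir = sysconfig.get_paths()["include"]  # directory holding Python.h
    compile_cmd = ["gcc", "-c", "-fPIC", "-I" + include_dir, module + ".c"]
    link_cmd = ["gcc", "-shared", module + ".o", "-o", module + ".so"]
    return compile_cmd, link_cmd


if __name__ == "__main__":
    for cmd in gcc_commands():
        print(" ".join(cmd))
```

The printed lines can be pasted into the shell in place of the hard-coded commands.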

We can try out this newly created module in the python interpreter as follows:

>>> import primes
>>> primes.primes(12)
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]

Let's see how this C extension compares with a pure Python function in terms of execution speed. Removing the cdef declarations and the int in the argument list of the example above, and turning the C array p into an ordinary Python list, gives a working pure Python module (pyprimes.py).
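A sketch of what pyprimes.py could look like after those changes: the cdef lines are gone, and p becomes a plain list pre-filled with 1000 zeros so that the indexing stays the same. It uses != instead of <>, since Python 3 no longer accepts <>.

```python
# pyprimes.py: pure Python version of primes.pyx (sketch).
# The C array "cdef int p[1000]" becomes an ordinary Python list.
def primes(kmax):
    p = [0] * 1000          # stands in for the C array p
    result = []
    if kmax > 1000:
        kmax = 1000
    k = 0
    n = 2
    while k < kmax:
        i = 0
        # try dividing n by each prime found so far
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:          # no divisor found, so n is prime
            p[k] = n
            k = k + 1
            result.append(n)
        n = n + 1
    return result
```

Every integer here is a Python object, which is exactly the overhead the Pyrex version avoids.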


    # compare.py
    # Calculate prime numbers with both implementations and time them
    import time
    import pyprimes   # pure Python implementation
    import primes     # C extension - created using Pyrex

    if __name__ == "__main__":
        n = 1000
        print("Time taken for %d prime numbers : " % n)
        start = time.time()
        x = pyprimes.primes(n)
        end = time.time()
        print("PURE PYTHON : %f" % (end - start))

        start = time.time()
        x = primes.primes(n)
        end = time.time()
        print("PYREX:  %f" % (end - start))

Output

bash$ python compare.py
Time taken for 1000 prime numbers :
PURE PYTHON : 0.317926
PYREX:  0.027067

Whoa! That's nearly a 12-fold increase in speed (0.317926 / 0.027067 ≈ 11.7). And in contrast to writing the same module in C, the Pyrex code is about as long as the pure Python code, and just as readable.
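A single time.time() measurement, as in compare.py, can be noisy. One possible refinement, sketched below, uses the standard timeit module to run each call several times and keep the fastest run. The helper best_time and its default repeat counts are illustrative choices, not from the original; the main block assumes pyprimes.py and the compiled primes module are importable.

```python
# bench.py: sketch of a steadier benchmark than compare.py.
import timeit


def best_time(func, arg, number=5, repeat=3):
    """Best average seconds per call of func(arg) over `repeat` trials."""
    timer = timeit.Timer(lambda: func(arg))
    # each trial times `number` calls; keep the fastest trial
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    try:
        import pyprimes   # pure Python version
        import primes     # extension built from primes.pyx
    except ImportError:
        print("build primes.so and create pyprimes.py first")
    else:
        n = 1000
        slow = best_time(pyprimes.primes, n)
        fast = best_time(primes.primes, n)
        print("PURE PYTHON : %f" % slow)
        print("PYREX       : %f" % fast)
        print("speed-up    : %.1fx" % (slow / fast))
```

Taking the minimum rather than the mean filters out runs slowed down by other activity on the machine.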

How does Pyrex achieve this? Recall that everything in Python is an object; even a basic type like int is an object. So whenever you use an int, the interpreter has to box and unbox it to get at the actual value, and every arithmetic operation is dispatched through the object's type. This adds overhead to every computation, even with the simplest types. It is the price we pay for automatic memory management and flexible interaction with other types. A C (or Pyrex cdef) int, on the other hand, is just a machine integer stored directly in memory or a register, so operations on it involve no indirection.
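A small illustration of the point above, assuming CPython (object sizes are an implementation detail and vary by version and platform): a Python int carries a type and a reference count and is much larger than a raw C int, and even + goes through a method of the int type.

```python
# Illustration (CPython-specific sizes): a Python int is a full object,
# not a bare machine word like a C int.
import struct
import sys

x = 1000
print(type(x))                   # <class 'int'>
print(sys.getsizeof(x))          # bytes used by the int object (varies by build)
print(struct.calcsize("i"))      # bytes in a raw C int, typically 4
print(x + 1 == x.__add__(1))     # True: '+' dispatches to int's __add__
```

None of this bookkeeping happens for a variable declared with cdef int in Pyrex.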

Creating New Types

In Pyrex, code that manipulates Python values and C values can be freely intermixed, with conversions handled automatically wherever possible. Reference counting and error checking of Python operations are automatic, and the full power of Python's exception handling is available even when dealing with C data. Pyrex also lets you write code that converts between user-defined Python data structures and C data structures in an almost transparent manner. The power of this is evident when compared with the traditional way of writing C extensions, which requires a good knowledge of the Python/C API. Let's look at an example.

 1 cdef class Account:
 2         cdef float balance
 3         cdef object name    # keeps a reference to the Python string
 4         def __new__(self, name):
 5                 self.name = name
 6                 self.balance = 0
 7         def incrAmt(self, amt):
 8                 self.balance = self.balance + amt
 9         def getBalance(self):
10                 return self.balance

With this we have created an almost Python-looking type that is accessible from both C and Python. Line 1 defines the new type, Account. Its attributes, balance and name, are declared with cdef immediately after the class statement; unlike in an ordinary Python class, attributes cannot be created on the fly inside the constructor, __new__, or any other method. The rest of the code is just Python.
Note: In Pyrex, __new__ is called just after memory for the object has been allocated (with its C attributes initialised to zero), before __init__ runs.

>>> from Account import Account
>>> myac = Account('Pradeep')
>>> myac.incrAmt(1000)
>>> myac.getBalance()
1000.0

The Python usage of this new Type is transparent, as you can see from the code above. The Python programmer using this Type is completely unaware of its C underpinnings!
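For comparison, here is a hypothetical pure Python class offering the same interface. __slots__ is used to fix the attribute set up front, loosely mirroring the cdef declarations. It is only an analogy: the Pyrex type stores balance as a C float inside the object, and its cdef attributes are not visible from Python unless declared public or readonly.

```python
# account_py.py: hypothetical pure Python counterpart of the Pyrex Account.
class Account(object):
    __slots__ = ("name", "balance")   # fixed attribute set, like cdef declarations

    def __init__(self, name):
        self.name = name
        self.balance = 0.0

    def incrAmt(self, amt):
        self.balance = self.balance + amt

    def getBalance(self):
        return self.balance


myac = Account("Pradeep")
myac.incrAmt(1000)
print(myac.getBalance())       # 1000.0
try:
    myac.overdraft = 50        # not listed in __slots__
except AttributeError:
    print("cannot add new attributes")
```

From the caller's side the two versions are used identically, which is what makes the Pyrex type a drop-in replacement.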

Differences

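In Python, an integer for loop usually uses the range() function. But range is an ordinary Python function that produces Python int objects, so such a loop is slow compared with a C loop. Pyrex provides another form of for loop, which is translated into a plain C loop when the loop variable and bounds are C integers: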

    cdef int i
    for i from 0 <= i < n:
        doSome()
        # ... rest of the loop body
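Spelled out in Python, the loop above is just a counter that starts at the lower bound and is incremented until it reaches the upper bound, with no sequence object involved; the generated C is roughly for (i = 0; i < n; i++). The function name from_loop is hypothetical, used only to show which values i takes.

```python
# What "for i from 0 <= i < n:" does, written out as a Python while loop.
def from_loop(n):
    visited = []
    i = 0
    while i < n:
        visited.append(i)   # the loop body (doSome() above) would run here
        i = i + 1
    return visited


# same values as the range()-based loop
assert from_loop(5) == list(range(5))
```

The values visited are the same as with range(n); the difference in Pyrex is that i stays a C int throughout.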

Pyrex does not support all of Python's functionality. Some gotchas to remember:

from module import * is not allowed; other forms of import are.
Generators cannot be defined in Pyrex.
In-place arithmetic operators (+=, etc.) are not yet supported.
List comprehensions are not yet supported.
Functions cannot be defined inside other function definitions.
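Most of these restrictions are easy to work around by writing slightly more explicit Python. A sketch, with hypothetical function names: the first version uses a list comprehension and +=, the second expresses the same logic with an explicit loop and plain assignment, constructs the list above does not rule out.

```python
# Idiomatic Python using two features listed above as unsupported in Pyrex.
def sum_squares_idiomatic(values):
    total = 0
    for v in [x * x for x in values]:   # list comprehension
        total += v                      # in-place operator
    return total


# The same logic restricted to an explicit loop and plain assignment.
def sum_squares_pyrex_style(values):
    total = 0
    for v in values:
        total = total + v * v           # explicit assignment instead of +=
    return total
```

Both return the same result; only the second avoids the constructs listed above.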

Pyrex and the Programmer

Whenever we talk about speed, we usually mean execution speed. However, as any seasoned Pythonista will vouch, it is the gain in programmer productivity that makes Python such an attractive language: Python can cut coding time dramatically. Pyrex lets Python programmers keep their lazy, Pythonic way of coding and still get efficient execution. Just imagine all the time you save by not having to hunt down memory leaks in C.

Where do we use Pyrex?

If you have blocks of code that do numerical computations in tight loops, consider moving that code into a Pyrex module. Done correctly, this can make those performance-critical parts anywhere from 10 to 50 times faster than pure Python. Code that is mainly I/O-bound or spends most of its time in library calls will not benefit much from Pyrex.
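To find those tight loops in an existing program, one option is the standard-library profiler. A minimal sketch, where slow_sum and workload are hypothetical stand-ins for your own code: profile the workload and list the functions with the largest cumulative time, which are the first candidates for a Pyrex module.

```python
# Sketch: locate hot spots with cProfile before reaching for Pyrex.
import cProfile
import io
import pstats


def slow_sum(n):
    # a tight numeric loop: a typical Pyrex candidate
    total = 0
    i = 0
    while i < n:
        total = total + i * i
        i = i + 1
    return total


def workload():
    return [slow_sum(20000) for _ in range(20)]


def hot_spots(func, limit=5):
    """Run func under cProfile; return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


if __name__ == "__main__":
    print(hot_spots(workload))
```

A function like slow_sum that dominates the report while doing only integer arithmetic is exactly the kind of code worth moving into Pyrex.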

Resources

Pyrex Site: http://nz.cosc.canterbury.ac.nz/~greg/python/Pyrex/
Pyrex Guide: http://ldots.org/pyrex-guide
David Mertz's article: http://www-128.ibm.com/developerworks/library/l-cppyrex.html
Simplified Wrapper and Interface Generator (SWIG): http://swig.org