I ran a benchmark with 6 cases:
- PY_NEW macro (still has Python overhead for each call to the creator function)
- regular Python __init__
- Python __init__ using __slots__
- Cython __init__ (cdef'ed class)
- batch PY_NEW: calling PY_NEW from inside Cython to avoid Python call overhead
- batch __init__ on Cython class
The timings look like this:
PY_NEW on Cython class: 1.160
__init__ on Python class: 30.414
__init__ on Python class with slots: 10.242
__init__ on Cython class 1.185
batch PY_NEW total: 0.855 , interval only: 0.383
batch __init__ on Cython class total 0.998 , interval_only: 0.540
So PY_NEW takes .383 compared to .540 for __init__ on a Cython class (the interval-only times from the batch runs), but both are much faster than pure Python. I was surprised that using __slots__ gives a 3x speed improvement over a regular Python class (10.242 vs. 30.414). That Cython is faster is no surprise.
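To make the slots comparison concrete, here is a minimal pure-Python sketch of that part of the benchmark. It is not the code from the gist: the class names `Interval` and `SlotInterval`, the helper `time_creation`, and the iteration counts are all made up for illustration, and the ratio you see on a current interpreter will probably differ from the 3x above.

```python
import timeit


class Interval:
    """Regular Python class: every instance carries its own __dict__."""

    def __init__(self, start, end):
        self.start = start
        self.end = end


class SlotInterval:
    """Same fields, but __slots__ replaces the per-instance __dict__."""

    __slots__ = ("start", "end")

    def __init__(self, start, end):
        self.start = start
        self.end = end


def time_creation(cls, n=100_000, repeat=3):
    """Return the best-of-`repeat` time to create `n` instances of `cls`."""
    timer = timeit.Timer(lambda: cls(1, 2))
    return min(timer.repeat(repeat=repeat, number=n))


if __name__ == "__main__":
    # Hypothetical driver: print one timing line per class.
    for cls in (Interval, SlotInterval):
        print(f"__init__ on {cls.__name__}: {time_creation(cls):.3f}")
```

Taking the minimum over a few repeats is just a way to cut down on noise from whatever else the machine is doing; it only covers the two pure-Python cases, not the Cython ones.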
Stefan Behnel explains better than I could.
All the code is smashed uncomfortably into this gist.
For kicks, I tried with unladen-swallow. The Python timings come out almost 2x faster, both with and without slots. I didn't use the optimization stuff. Cython even works with unladen-swallow (you just have to rebuild the .so), and the timings are the same as Cython with CPython.