Let's speak from the heart about Cython
Sorry, this article will be done later
And since we are developers, we will use code right here.
I don't want to deal with boring setup later, so I load NumPy, the Cython compiler extension for IPython, and a small package for timing cells right away.
%load_ext autotime
import numpy as np
time: 71.1 ms
%load_ext cython
%load_ext autotime
The autotime extension is already loaded. To reload it, use:
%reload_ext autotime
time: 324 ms
sections = 1000000
time: 372 µs
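The `time:` lines in this post come from the autotime extension, which reports the wall time of each cell. Outside a notebook, a rough equivalent can be sketched with the standard library. The helper name `timed` is hypothetical and not part of autotime:

```python
import time

def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds). Hypothetical helper."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Example: time a small pure-Python workload.
total, seconds = timed(sum, range(1_000_000))
print(f"sum = {total}, time: {seconds * 1e3:.1f} ms")
```

A single measurement like this is noisy, so the timings below are best read as orders of magnitude.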
I would like to keep this topic as short and informative as possible. The format I chose is to show the 5 most obvious reasons for using Cython in your projects right now.
5 reasons to use Cython instead of Python
1: To use Cython, you can leave your code as it is
We define two Python functions: some f(x) and an integrator. Each reason will modify this code. Each subsequent reason changes your code more, but also has a bigger effect.
def f(x):
    return x**2-x
def integrate_f(a, b, N):
    s = 0
    dx = (b-a)/N
    for i in range(N):
        s += f(a + dx * i)
    return s * dx
time: 1.2 ms
Important note: defining a Python function (at least in a Jupyter notebook) is faster than compiling a Cython function. But that is only the definition step!
integrate_f(0, 10, sections)
283.3328833334909
time: 346 ms
That's the time the Python function takes to calculate the integral of f(x) on [0, 10] with 1 million sections.
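As a sanity check on the value 283.3328833334909, here is a small sketch that compares the same left Riemann sum with the exact integral. The names `antiderivative` and `left_riemann` are introduced only for this illustration, and a smaller N is used to keep it quick:

```python
def f(x):
    return x**2 - x

def antiderivative(x):
    # F(x) = x^3/3 - x^2/2, so F'(x) = x^2 - x
    return x**3 / 3 - x**2 / 2

def left_riemann(a, b, n):
    # Same scheme as integrate_f: sample f at the left edge of each section.
    dx = (b - a) / n
    return sum(f(a + dx * i) for i in range(n)) * dx

exact = antiderivative(10) - antiderivative(0)  # 1000/3 - 50
approx = left_riemann(0, 10, 100_000)
print(exact, approx, exact - approx)
```

The leading error term of a left Riemann sum is (dx/2) * (f(b) - f(a)). With 1 million sections that is 0.00045, which matches the gap between 283.33288... and the exact 283.33333....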
%%cython
def f_cython(x):
    return x**2-x
def integrate_f_cython(a, b, N):
    s = 0
    dx = (b-a)/N
    for i in range(N):
        s += f_cython(a + dx * i)
    return s * dx
time: 4.15 ms
integrate_f_cython(0, 10, sections)
283.3328833334909
time: 248 ms
Total:
So in my notebook I get a speedup of about 1.4x (346 ms versus 248 ms) without any changes to my code. This is probably my favorite thing about Cython. Nobody requires any action from you: just replace Python with Cython. Use %%cython in Jupyter, or add a setup file to compile your code into an extension module.
2: You can use fast type conversion
What if I want to add a requirement to my function: the interval bounds have to be integers.
integrate_f(0, 10.1, sections)
292.42820252133953
time: 335 ms
That's bad behaviour given my requirement. I have to change my integrate_f function:
def integrate_f(a, b, N):
    a, b = int(a), int(b)
    s = 0
    dx = (b-a)/N
    for i in range(N):
        s += f(a + dx * i)
    return s * dx
time: 1.04 ms
integrate_f(0, 10.1, sections)
283.3328833334909
time: 357 ms
Works nicely, but how do we do this in Cython?
%%cython
def f_cython(x):
    return x**2-x
def integrate_f_cython(int a, int b, N):
    s = 0
    dx = (b-a)/N
    for i in range(N):
        s += f_cython(a + dx * i)
    return s * dx
time: 2.15 ms
integrate_f_cython(0, 10.1, sections)
283.3328833334909
time: 240 ms
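In the runs above, both versions turned the bound 10.1 into 10. It is worth remembering that Python's int() truncates toward zero, so it neither rounds nor floors. A small sketch, where `truncate_bounds` is a hypothetical helper that mirrors the `a, b = int(a), int(b)` line:

```python
import math

def truncate_bounds(a, b):
    """Hypothetical helper mirroring `a, b = int(a), int(b)` from integrate_f."""
    return int(a), int(b)

print(truncate_bounds(0, 10.1))   # (0, 10)
print(truncate_bounds(0, 10.9))   # (0, 10): truncation, not rounding
print(truncate_bounds(-0.5, 10))  # (0, 10): toward zero, not floor
print(math.floor(-0.5))           # -1, for comparison
```

If silently dropping the fractional part is not what you want, an explicit check that raises an error may be the safer choice.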
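Total: declaring the arguments as int a, int b does the conversion for us, with no extra line in the function body, and the call takes 240 ms versus 357 ms for the Python version.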
3: You can make your code much faster
Let's add some C stuff in Cython:
In Cython we have three types of function definition:
def
cdef
cpdef
def is the same as a Python function definition. The performance gain comes only from your code being compiled into object files.
cdef means that the function is translated into a plain C function with C types. The Python syntax is no more than sugar on top of it. Such a function cannot be called directly from Python code.
Last but not least, cpdef takes arguments and returns values as Python objects, like def, but internally it calls the cdef version.
For better understanding, I'll show you a very nice feature of the Cython compiler: annotations. They let you see how the compiler transforms your code.
%%cython -a
def f_annotations(x):
    for i in range(10):
        pass
    return x**2-x
Generated by Cython 0.29.2
Yellow lines hint at Python interaction.
1:
+2: def f_annotations(x):
/* Python wrapper */ static PyObject *__pyx_pw_46_cython_magic_34d4deaa618b533ef1b528f01d06ce41_1f_annotations(PyObject *__pyx_self, PyObject *__pyx_v_x); /*proto*/ static PyMethodDef __pyx_mdef_46_cython_magic_34d4deaa618b533ef1b528f01d06ce41_1f_annotations = {"f_annotations", (PyCFunction)__pyx_pw_46_cython_magic_34d4deaa618b533ef1b528f01d06ce41_1f_annotations, METH_O, 0}; static PyObject *__pyx_pw_46_cython_magic_34d4deaa618b533ef1b528f01d06ce41_1f_annotations(PyObject *__pyx_self, PyObject *__pyx_v_x) { PyObject *__pyx_r = 0; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("f_annotations (wrapper)", 0); __pyx_r = __pyx_pf_46_cython_magic_34d4deaa618b533ef1b528f01d06ce41_f_annotations(__pyx_self, ((PyObject *)__pyx_v_x)); /* function exit code */ __Pyx_RefNannyFinishContext(); return __pyx_r; } static PyObject *__pyx_pf_46_cython_magic_34d4deaa618b533ef1b528f01d06ce41_f_annotations(CYTHON_UNUSED PyObject *__pyx_self, PyObject *__pyx_v_x) { CYTHON_UNUSED long __pyx_v_i; PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("f_annotations", 0); /* … */ /* function exit code */ __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_2); __Pyx_XDECREF(__pyx_t_3); __Pyx_AddTraceback("_cython_magic_34d4deaa618b533ef1b528f01d06ce41.f_annotations", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; } /* … */ __pyx_tuple_ = PyTuple_Pack(2, __pyx_n_s_x, __pyx_n_s_i); if (unlikely(!__pyx_tuple_)) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_GOTREF(__pyx_tuple_); __Pyx_GIVEREF(__pyx_tuple_); /* … */ __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_34d4deaa618b533ef1b528f01d06ce41_1f_annotations, NULL, __pyx_n_s_cython_magic_34d4deaa618b533ef1); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_1); if (PyDict_SetItem(__pyx_d, __pyx_n_s_f_annotations, __pyx_t_1) < 0) __PYX_ERR(0, 2, __pyx_L1_error) __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
+3: for i in range(10):
for (__pyx_t_1 = 0; __pyx_t_1 < 10; __pyx_t_1+=1) { __pyx_v_i = __pyx_t_1; }
4: pass
+5: return x**2-x
__Pyx_XDECREF(__pyx_r); __pyx_t_2 = PyNumber_Power(__pyx_v_x, __pyx_int_2, Py_None); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 5, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_2); __pyx_t_3 = PyNumber_Subtract(__pyx_t_2, __pyx_v_x); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 5, __pyx_L1_error) __Pyx_GOTREF(__pyx_t_3); __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; __pyx_r = __pyx_t_3; __pyx_t_3 = 0; goto __pyx_L0;
time: 13.5 ms
%%cython -a
cdef float f_annotations(float x):
    cdef int i = 0
    for i in range(10):
        pass
    return x**2-x
1:
+2: cdef float f_annotations(float x):
static float __pyx_f_46_cython_magic_8e3a790a2411635bceeaa5ef932d12a8_f_annotations(float __pyx_v_x) { CYTHON_UNUSED int __pyx_v_i; float __pyx_r; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("f_annotations", 0); /* … */ /* function exit code */ __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; }
+3: cdef int i = 0
__pyx_v_i = 0;
+4: for i in range(10):
for (__pyx_t_1 = 0; __pyx_t_1 < 10; __pyx_t_1+=1) { __pyx_v_i = __pyx_t_1; }
5: pass
+6: return x**2-x
__pyx_r = (powf(__pyx_v_x, 2.0) - __pyx_v_x); goto __pyx_L0;
time: 4.86 ms
%%cython -a
cpdef float f_annotations(float x):
    cdef int i = 0
    for i in range(10):
        pass
    return x**2-x
1:
+2: cpdef float f_annotations(float x):
static PyObject *__pyx_pw_46_cython_magic_896614c59ea843d156203311226a1c7f_1f_annotations(PyObject *__pyx_self, PyObject *__pyx_arg_x); /*proto*/ static float __pyx_f_46_cython_magic_896614c59ea843d156203311226a1c7f_f_annotations(float __pyx_v_x, CYTHON_UNUSED int __pyx_skip_dispatch) { CYTHON_UNUSED int __pyx_v_i; float __pyx_r; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("f_annotations", 0); /* … */ /* function exit code */ __pyx_L0:; __Pyx_RefNannyFinishContext(); return __pyx_r; } /* Python wrapper */ static PyObject *__pyx_pw_46_cython_magic_896614c59ea843d156203311226a1c7f_1f_annotations(PyObject *__pyx_self, PyObject *__pyx_arg_x); /*proto*/ static PyObject *__pyx_pw_46_cython_magic_896614c59ea843d156203311226a1c7f_1f_annotations(PyObject *__pyx_self, PyObject *__pyx_arg_x) { float __pyx_v_x; PyObject *__pyx_r = 0; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("f_annotations (wrapper)", 0); assert(__pyx_arg_x); { __pyx_v_x = __pyx_PyFloat_AsFloat(__pyx_arg_x); if (unlikely((__pyx_v_x == (float)-1) && PyErr_Occurred())) __PYX_ERR(0, 2, __pyx_L3_error) } goto __pyx_L4_argument_unpacking_done; __pyx_L3_error:; __Pyx_AddTraceback("_cython_magic_896614c59ea843d156203311226a1c7f.f_annotations", __pyx_clineno, __pyx_lineno, __pyx_filename); __Pyx_RefNannyFinishContext(); return NULL; __pyx_L4_argument_unpacking_done:; __pyx_r = __pyx_pf_46_cython_magic_896614c59ea843d156203311226a1c7f_f_annotations(__pyx_self, ((float)__pyx_v_x)); /* function exit code */ __Pyx_RefNannyFinishContext(); return __pyx_r; } static PyObject *__pyx_pf_46_cython_magic_896614c59ea843d156203311226a1c7f_f_annotations(CYTHON_UNUSED PyObject *__pyx_self, float __pyx_v_x) { PyObject *__pyx_r = NULL; __Pyx_RefNannyDeclarations __Pyx_RefNannySetupContext("f_annotations", 0); __Pyx_XDECREF(__pyx_r); __pyx_t_1 = PyFloat_FromDouble(__pyx_f_46_cython_magic_896614c59ea843d156203311226a1c7f_f_annotations(__pyx_v_x, 0)); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 2, __pyx_L1_error) 
__Pyx_GOTREF(__pyx_t_1); __pyx_r = __pyx_t_1; __pyx_t_1 = 0; goto __pyx_L0; /* function exit code */ __pyx_L1_error:; __Pyx_XDECREF(__pyx_t_1); __Pyx_AddTraceback("_cython_magic_896614c59ea843d156203311226a1c7f.f_annotations", __pyx_clineno, __pyx_lineno, __pyx_filename); __pyx_r = NULL; __pyx_L0:; __Pyx_XGIVEREF(__pyx_r); __Pyx_RefNannyFinishContext(); return __pyx_r; }
+3: cdef int i = 0
__pyx_v_i = 0;
+4: for i in range(10):
for (__pyx_t_1 = 0; __pyx_t_1 < 10; __pyx_t_1+=1) { __pyx_v_i = __pyx_t_1; }
5: pass
+6: return x**2-x
__pyx_r = (powf(__pyx_v_x, 2.0) - __pyx_v_x); goto __pyx_L0;
time: 9.81 ms
As you can see, cdef generates shorter code. But in practice you will more often use cpdef, so that your functions can be called from other Python code.
%%cython
cdef float f_cython(float x):
    return x**2-x
cpdef float integrate_f_cython(int a, int b, int N):
    cdef float s = 0
    cdef float dx = (b-a)/N
    cdef int i
    for i in range(N):
        s += f_cython(a + dx * i)
    return s * dx
time: 6.8 ms
integrate_f(0, 10.1, sections)
283.3328833334909
time: 342 ms
integrate_f_cython(0, 10.1, sections)
283.33111572265625
time: 7.31 ms
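The typed version returns 283.33111572265625 rather than 283.3328833334909. A likely reason is that a C float is single precision on common platforms, while a Python float is a double, so rounding errors pile up in the running sum. The sketch below approximately emulates that in plain Python by rounding every intermediate value to 32 bits. The names `to_float32`, `integrate_single` and `integrate_double` are made up for this illustration, and it is a model of the effect, not of the exact C code Cython generates:

```python
import struct

def to_float32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def integrate_single(a, b, n):
    """Left Riemann sum of x**2 - x with every intermediate rounded to float32."""
    s = to_float32(0.0)
    dx = to_float32((b - a) / n)
    for i in range(n):
        x = to_float32(a + to_float32(dx * i))
        fx = to_float32(to_float32(x * x) - x)
        s = to_float32(s + fx)
    return to_float32(s * dx)

def integrate_double(a, b, n):
    """The same sum in ordinary double precision."""
    dx = (b - a) / n
    return sum((a + dx * i) ** 2 - (a + dx * i) for i in range(n)) * dx

single = integrate_single(0, 10, 100_000)
double = integrate_double(0, 10, 100_000)
print(single, double, single - double)
```

Declaring the variables and the return type as double instead of float should bring the Cython result back in line with the Python one.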
Total: 342 ms versus 7.31 ms, a speedup of roughly 47x.
4: You can use functions from C and C++ that have no Python bindings
For me it was CUDA functions in OpenCV. But I really don't want to write about it, and I can't share that code. So I'll just paste a link about that here.
5: An easy way to parallelize your code
%%cython
cpdef print_parallel(int N):
    cdef int i = 0
    for i in range(N):
        print(i, end='')
    print()
time: 5.28 ms
print_parallel(10)
0123456789
time: 1.98 ms
%%cython --compile-args=-fopenmp --link-args=-fopenmp --force
# import cython.parallel as cp
from cython.parallel import prange
cpdef print_parallel_cython(int N):
    cdef int i = 0
    for i in prange(N, nogil=True):
        with gil:
            print(i, end='')
    print()
time: 388 ms
print_parallel_cython(10)
8670912345
time: 21.1 ms
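The digits come out of order because prange hands the loop iterations to several OpenMP threads, and the order in which those threads acquire the GIL to print is not fixed. Printing is only a demo. A more typical use is a reduction such as the integral from the earlier sections. The sketch below is a pure-Python model of that decomposition, assuming 4 workers and contiguous chunks. The names `partial_sum` and `integrate_chunked` are hypothetical, and how prange really schedules iterations is up to OpenMP:

```python
from concurrent.futures import ThreadPoolExecutor

def f(x):
    return x**2 - x

def partial_sum(a, dx, start, stop):
    """Sum f over the sample indices in [start, stop)."""
    return sum(f(a + dx * i) for i in range(start, stop))

def integrate_chunked(a, b, n, workers=4):
    dx = (b - a) / n
    # Split range(n) into `workers` contiguous chunks that cover it exactly.
    bounds = [n * k // workers for k in range(workers + 1)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda k: partial_sum(a, dx, bounds[k], bounds[k + 1]),
            range(workers),
        )
        # Combine the per-chunk sums, then scale by the section width.
        return sum(parts) * dx

print(integrate_chunked(0, 10, 100_000))
```

In CPython these threads still share the GIL, so this version shows the structure of the computation and is not expected to be faster. The speedup comes when the loop body runs without the GIL, as in the prange cell.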