[Cython] About IndexNode and unicode[index]
Stefan Behnel
stefan_ml at behnel.de
Thu Feb 28 20:27:08 CET 2013
ZS, 28.02.2013 19:31:
> 2013/2/28 ZS:
>> Looking into the IndexNode class in ExprNodes.py, I saw a possibility
>> of adding a faster code path for unicode[index], as is done for lists
>> in the method `generate_setitem_code`.
>>
>> These are the files for evaluating the performance difference:
>>
>> #### unicode_index.h
>>
>> /* This is a stripped-down version of __Pyx_GetItemInt_Unicode_Fast */
>> #include "unicodeobject.h"
>>
>> static inline Py_UCS4 unicode_char(PyObject* ustring, Py_ssize_t i);
>>
>> static inline Py_UCS4 unicode_char(PyObject* ustring, Py_ssize_t i) {
>> #if CYTHON_PEP393_ENABLED
>>     if (PyUnicode_READY(ustring) < 0) return (Py_UCS4)-1;
>> #endif
>>     return __Pyx_PyUnicode_READ_CHAR(ustring, i);
>> }
Sure, looks ok.
>> ##### unicode_index.pyx
>>
>> # coding: utf-8
>>
>> cdef extern from 'unicode_index.h':
>>     inline Py_UCS4 unicode_char(unicode ustring, int i)
>>
>> cdef unicode text = u"abcdefghigklmnopqrstuvwxyzabcdefghigklmnopqrstuvwxyz"
>>
>> def f_1(unicode text):
>>     cdef int i, j
>>     cdef int n = len(text)
>>     cdef Py_UCS4 ch
>>
>>     for j from 0<=j<=1000000:
Personally, I find a range() loop much easier to read than this beast.
>>         for i from 0<=i<=n-1:
>>             ch = text[i]
>>
>> def f_2(unicode text):
>>     cdef int i, j
>>     cdef int n = len(text)
>>     cdef Py_UCS4 ch
>>
>>     for j from 0<=j<=1000000:
>>         for i from 0<=i<=n-1:
>>             ch = unicode_char(text, i)
>>
>> def test_1():
>>     f_1(text)
>>
>> def test_2():
>>     f_2(text)
>>
>> Timing results:
>>
>> (py33) zbook:mytests $ python3.3 -m timeit -n 100 -r 10 -s "from
>> mytests.unicode_index import test_1" "test_1()"
>> 100 loops, best of 10: 89 msec per loop
>> (py33) zbook:mytests $ python3.3 -m timeit -n 100 -r 10 -s "from
>> mytests.unicode_index import test_2" "test_2()"
>> 100 loops, best of 10: 46.1 msec per loop
I seriously doubt that this translates to similar results in real-world
code. In the second example above, the C compiler should be able to remove
a lot of code, certainly including the useless character read. Maybe even
the loops, if it can determine that PyUnicode_READY() will always return
the same result. So you're almost certainly not benchmarking what you think
you are.
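
To illustrate the concern, here is a minimal C sketch. It is not the code
Cython generates: all names are hypothetical, and a plain uint32_t array
stands in for the unicode buffer. The first function throws the character
away, so an optimising compiler may delete the read and possibly the loops.
The second one feeds every read into a returned checksum, so the reads
cannot simply be discarded as dead code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the character read (assumption, not Cython's code). */
static inline uint32_t read_char(const uint32_t *text, size_t i) {
    return text[i];
}

/* The value of ch is never used, so the compiler is free to drop the read
   and, with it, possibly both loops. Timing this may measure almost nothing. */
void bench_discard(const uint32_t *text, size_t n, size_t reps) {
    assert(text != NULL);
    for (size_t j = 0; j < reps; j++) {
        for (size_t i = 0; i < n; i++) {
            uint32_t ch = read_char(text, i);
            (void)ch;
        }
    }
}

/* Each read contributes to the return value, so it cannot be removed as
   dead code. The compiler may still simplify the loops in other ways. */
uint64_t bench_checksum(const uint32_t *text, size_t n, size_t reps) {
    uint64_t sum = 0;
    assert(text != NULL);
    for (size_t j = 0; j < reps; j++) {
        for (size_t i = 0; i < n; i++) {
            sum += read_char(text, i);
        }
    }
    return sum;
}
```

A benchmark in the style of bench_checksum(), with the result returned to
and used by the caller, is more likely to time the reads themselves.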
>> in setup.py globally:
>>
>> "boundscheck": False
>> "wraparound": False
>> "nonecheck": False
>>
> For the sake of clarity, I would like to add the following: this
> optimization is for the case when both `boundscheck(False)` and
> `wraparound(False)` are applied. Otherwise, the default evaluation path
> (__Pyx_GetItemInt_Unicode) is used.
>
> This makes it possible to write unicode text parsing code that runs at
> almost C speed, mostly in Python (+ .pxd definitions).
I suggest simply adding a constant flag argument to the existing function
that states whether checking should be done or not. Inlining will let the C
compiler drop the corresponding code, which may or may not make it a little
faster.
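
A rough sketch of that idea in plain C, under loud assumptions: the names
get_char(), get_char_checked(), get_char_unchecked() and INVALID_CHAR are
hypothetical, the two flags merely mirror the wraparound and boundscheck
directives, and a uint32_t array stands in for the unicode object. This is
not the existing Cython utility function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error marker, analogous to returning (Py_UCS4)-1. */
#define INVALID_CHAR ((uint32_t)-1)

/* One function for all cases. The flags are compile-time constants at every
   call site, so after inlining the compiler can drop the disabled branches. */
static inline uint32_t get_char(const uint32_t *text, ptrdiff_t length,
                                ptrdiff_t i,
                                const int wraparound, const int boundscheck) {
    assert(text != NULL);
    if (wraparound && i < 0) {
        i += length;                /* negative index counts from the end */
    }
    if (boundscheck && (i < 0 || i >= length)) {
        return INVALID_CHAR;        /* index out of range */
    }
    return text[i];
}

/* Call site with both checks enabled (the default behaviour). */
uint32_t get_char_checked(const uint32_t *text, ptrdiff_t length, ptrdiff_t i) {
    return get_char(text, length, i, 1, 1);
}

/* Call site with both checks disabled: should reduce to a bare array read. */
uint32_t get_char_unchecked(const uint32_t *text, ptrdiff_t length, ptrdiff_t i) {
    return get_char(text, length, i, 0, 0);
}
```

The code generator would then only need to pass different constants
depending on the directives in effect, instead of choosing between separate
helper functions.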
Stefan