[Cython] [cython] Initial startswith / endswith optimization (#35)

Stefan Behnel stefan_ml at behnel.de
Thu May 26 23:43:35 CEST 2011


John Ehresman, 26.05.2011 22:02:
> On 5/26/11 3:27 AM, Stefan Behnel wrote:
>>>> I think this means that the current unicode optimizations aren't used when
>>>> variables are declared as str and a python 3 runtime is used. Should all
>>>> unicode optimizations support str eventually?
>>>
>>> Yes.
>>
>> Well, minus those that are not portable. For example, the return type of
>> indexing and iteration is the C type "Py_UCS4" for unicode, but the
>> Python type "str" (i.e. bytes/unicode) for "str". I also didn't take a
>> thorough look through the C-API functions for the str type in Py2 and
>> Py3. Things certainly become more ugly when trying to optimise Python
>> code into C for both platforms, than when leaving things at the Python
>> type level.
>
> Would it work for these methods to return Py_UCS4 in all 3 cases (unicode,
> bytes, str)?

There are two sides to this: what the C compiler eventually sees and what 
Cython makes of the types internally. Letting Cython assume that the result 
is Py_UCS4 is incorrect in the Py2 case. Amongst other problems, it would 
make the value turn into a unicode string when coercing to a Python object.
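
To make the coercion problem concrete, here is a rough sketch in plain Python 3. The two helper functions are hypothetical and only model the behaviour described above; they are not Cython internals. `chr()` stands in for coercing a Py_UCS4 value to a Python object, and `bytes([...])` stands in for the one-byte string that indexing a Py2 "str" would give.

```python
def coerce_as_py_ucs4(value):
    # Hypothetical model: a Py_UCS4 value coerced to a Python object
    # becomes a one-character unicode string.
    return chr(value)


def coerce_as_py2_str_item(value):
    # Hypothetical model: indexing a Py2 "str" yields a one-byte
    # byte string, represented here by Python 3 bytes.
    return bytes([value])


item = 0x61  # the character 'a'
as_unicode = coerce_as_py_ucs4(item)
as_bytes = coerce_as_py2_str_item(item)

# Same character value, but two different Python types.
same_type = type(as_unicode) is type(as_bytes)
```

The two results carry the same character value but are objects of different types, which is why treating the item as Py_UCS4 would change behaviour for Py2 "str".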

> In the bytes case, the multibyte int would simply be cast to char if
> that was what it was assigned to but the value wouldn't be above 255 in
> any case.

Sure it could: "str" is unicode in Py3, so you get a Unicode string that can 
contain any code point, e.g. when using unicode escapes.
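
A small illustration of that point, as a sketch in plain Python 3 (the sample string and variable names are made up for the example). It shows code points above 255 in a "str", and what keeping only the low 8 bits of such a value would do, assuming an unsigned 8-bit char.

```python
# In Python 3, "str" is a Unicode string, so unicode escapes can
# produce code points well above 255.
s = "caf\u00e9 \u20ac \U0001F600"

code_points = [ord(ch) for ch in s]
above_255 = [cp for cp in code_points if cp > 255]

# Indexing a str yields another str of length 1, not an integer.
first = s[0]

# Keeping only the low 8 bits (as a cast to an unsigned 8-bit char
# would) silently changes the value of a large code point.
truncated = 0x20AC & 0xFF
```

Here `above_255` holds the two code points that do not fit into a single byte, and `truncated` no longer identifies the original character.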


> The case I worry about is losing optimizations w/ a Python3 runtime if str
> is used rather than unicode.

You should expect that. If you want optimised code, use a suitable type.

Stefan

