[Cython] DEF converts byte strings to unicode

Stefan Behnel stefan_ml at behnel.de
Sat Sep 12 19:23:26 CEST 2015


Stefan Behnel wrote on 12.09.2015 at 18:14:
> Jakub Wilk wrote on 12.09.2015 at 14:59:
>> I think something is still not quite right in Cython 0.23.2.
>>
>> Consider this code:
>>
>> DEF FOO = 'foo'
>> print type('foo')
>> print type(FOO)
>>
>> In Python 3, I get:
>>
>> <class 'str'>
>> <class 'bytes'>
> 
> Remember that DEF uses compile time evaluation in *Python*. Python does not
> have the three string types that Cython has, it has only two: either
> str/unicode (Py2) or bytes/str (Py3). If you pass an unprefixed string
> through compile time evaluation, it loses the information that it was
> unprefixed and turns into a specific Python string object type (i.e. bytes
> or unicode), which in this case is bytes, lacking any kind of encoding
> information.
> 
> Cython follows Py2 semantics by default, so having it turn into a bytes
> (i.e. Py2 str) object is actually not wrong. Certainly not more wrong than
> a unicode string would be. If you compile in Py3 mode, you should get a
> Unicode string.
> 
> My general recommendation is to a) avoid DEF, b) avoid DEF for string
> values, and c) avoid DEF for unprefixed string values, in that order. But
> b) and c) are only for advanced users.

That being said, always returning a bytes object is actually unhelpful in
Python 3. Let's see if anyone complains if we change that.

https://github.com/cython/cython/commit/ba350910b67db90c14e0aab79eafe7ac1be1d837
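
To illustrate the point about the two Python string types, here is a minimal
plain Python 3 sketch. It is not Cython's DEF implementation, and all names
in it (type_name, raw, and so on) are made up for illustration. It only shows
that a bytes object carries no encoding information, which is why an
unprefixed literal that has been turned into bytes cannot be mapped back to
text without guessing.

```python
# Python 3 sketch (illustrative names only, not Cython internals).

def type_name(value):
    """Return the name of the Python type of a string-like value."""
    return type(value).__name__

# Prefixed literals are unambiguous: b'' is always bytes, u'' is always text.
bytes_value = b'foo'
text_value = u'foo'
print(type_name(bytes_value))
print(type_name(text_value))

# A bytes object does not remember how it was encoded.
# Decoding requires choosing an encoding, and a wrong choice
# silently produces different text.
raw = u'n\u00e9'.encode('utf-8')
decoded_utf8 = raw.decode('utf-8')
decoded_latin1 = raw.decode('latin-1')
print(decoded_utf8 == u'n\u00e9')
print(decoded_latin1 == u'n\u00e9')
```

Writing the b'' or u'' prefix explicitly sidesteps the ambiguity, which is
the reasoning behind recommendation c) in the quoted text.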

Stefan


