[Cython] DEF converts byte strings to unicode

Stefan Behnel stefan_ml at behnel.de
Sat Sep 12 18:14:41 CEST 2015


Jakub Wilk wrote on 12.09.2015 at 14:59:
> I think something is still not quite right in Cython 0.23.2.
> 
> Consider this code:
> 
> DEF FOO = 'foo'
> print type('foo')
> print type(FOO)
> 
> In Python 3, I get:
> 
> <class 'str'>
> <class 'bytes'>

Remember that DEF uses compile time evaluation in *Python*. Python does not
have the three string types that Cython has; it has only two: either
str/unicode (Py2) or bytes/str (Py3). If you pass an unprefixed string
through compile time evaluation, it loses the information that it was
unprefixed and turns into a specific Python string object type (i.e. bytes
or unicode), which in this case is bytes, lacking any kind of encoding
information.
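
To illustrate the loss of information, here is a rough sketch in plain
Python 3. It uses ast.literal_eval as a stand-in for "evaluation in Python";
this is an assumption made for illustration only, not the evaluator that
Cython uses for DEF, and it does not reproduce the bytes result from the
report above. It only shows that once a literal has been evaluated, all that
remains is a concrete object type, not the way the literal was spelled.

```python
import ast

# Three spellings of the same literal, evaluated by plain Python 3.
# ast.literal_eval is only an illustrative stand-in here; it is not
# the mechanism Cython uses for DEF.
sources = ["'foo'", "u'foo'", "b'foo'"]
results = {src: type(ast.literal_eval(src)).__name__ for src in sources}

for src, type_name in results.items():
    print(src, "->", type_name)
```

Under Python 3, the unprefixed and the u-prefixed spelling both end up as
the same type, so the resulting object cannot tell you which one was
written in the source.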

Cython follows Py2 semantics by default, so having it turn into a bytes
(i.e. Py2 str) object is actually not wrong. Certainly not more wrong than
a unicode string would be. If you compile in Py3 mode, you should get a
Unicode string.

My general recommendation is to a) avoid DEF, b) avoid DEF for string
values, and c) avoid DEF for unprefixed string values, in that order. But
b) and c) are only for advanced users.
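
One possible way to follow a) is sketched below, written in plain Python
syntax that Cython also accepts: ordinary module-level names with explicit
prefixes instead of DEF. The names FOO_TEXT and FOO_BYTES are made up for
this example.

```python
# Hypothetical names. The explicit prefixes pin down the type of each
# value, so nothing depends on how an unprefixed literal is interpreted.
FOO_TEXT = u'foo'    # text (unicode) string
FOO_BYTES = b'foo'   # byte string

print(type(FOO_TEXT).__name__)
print(type(FOO_BYTES).__name__)
```

Unlike DEF values, these are normal runtime objects, which is a trade-off
to keep in mind if the value was meant to be used at compile time.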

Stefan


