"More About Unicode in Python 2 and 3"

Ethan Furman ethan at stoneleaf.us
Sun Jan 5 21:23:57 EST 2014


On 01/05/2014 05:48 PM, Chris Angelico wrote:
> On Mon, Jan 6, 2014 at 12:16 PM, Ned Batchelder <ned at nedbatchelder.com> wrote:
>> So now we have two revered developers vocally having trouble with Python 3.
>> You can dismiss their concerns as niche because it's only network
>> programming, but that would be a mistake.
>
> IMO, network programming (at least on the internet) is even more Py3's
> domain (pun not intended).

The issue is not how to handle text; the issue is how to handle ASCII when it's embedded in a bytes object.

Using my own project [1] as a reference: good ol' dbf files -- character fields, numeric fields, logical fields, time
fields, and of course the metadata that describes these fields and the dbf as a whole. The character fields I turn into
unicode, no sweat. The metadata fields are simple ASCII, and in Py2 something like `if header[FIELD_TYPE] == 'C'` did
the job just fine. In Py3, indexing a bytes object returns an int, so that line compares 67 to the text string 'C' and
returns False. For me this is simply a major annoyance, because I only have a handful of places where I have to deal
with it. Dealing with protocols where bytes is the norm and embedded ASCII is prevalent -- well, I can easily imagine
the nightmare.
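To make the Py3 behaviour concrete, here is a minimal sketch of the pitfall and a few ways around it. The offset `FIELD_TYPE = 11`, the sample descriptor bytes, and the helper `field_type()` are hypothetical, chosen for illustration; they are not taken from the dbf project referenced above. The length-1 slice and the `bytearray` variants should behave the same on Python 2.6+ and Python 3, which matters if you must support both.

```python
# Hypothetical offset of the one-byte type code within a field descriptor.
FIELD_TYPE = 11

# Hypothetical descriptor: 11 bytes of name (padded with NULs), then type 'C'.
header = b"NAME\x00\x00\x00\x00\x00\x00\x00C"

# Py3: indexing bytes yields an int, so the Py2-style comparison fails.
print(header[FIELD_TYPE])           # 67 on Py3 ('C' on Py2)
print(header[FIELD_TYPE] == 'C')    # False on Py3 (True on Py2)

# Workaround 1: slice instead of index; a length-1 slice is still bytes.
print(header[FIELD_TYPE:FIELD_TYPE + 1] == b'C')   # True on Py2 and Py3

# Workaround 2: bytearray indexing yields an int on both Py2 and Py3.
print(bytearray(header)[FIELD_TYPE] == ord('C'))   # True on Py2 and Py3


def field_type(descriptor):
    """Return the type code as text (hypothetical helper)."""
    # Slice to keep a bytes object, then decode the single ASCII byte.
    return descriptor[FIELD_TYPE:FIELD_TYPE + 1].decode("ascii")


print(field_type(header) == 'C')    # True
```

Slicing is the least invasive change to existing Py2 code, since the comparison target only gains a `b` prefix; decoding to text is the more explicit choice when the value is used as text afterwards.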

The most unfortunate aspect is that even if we did "fix" it in 3.5, it wouldn't help anybody who has to support
multiple versions... unless, of course, a backport could also be made.

--
~Ethan~


