"More About Unicode in Python 2 and 3"

Sun Jan 5 21:55:34 EST 2014

On Mon, Jan 6, 2014 at 1:23 PM, Ethan Furman <ethan at stoneleaf.us> wrote:
> The metadata fields are simple ascii, and in Py2 something like `if
> header[FIELD_TYPE] == 'C'` did the job just fine.  In Py3 that compares an
> int (67) to the unicode letter 'C' and returns False.  For me this is simply
> a major annoyance, but I only have a handful of places where I have to deal
> with this.  Dealing with protocols where bytes is the norm and embedded
> ascii is prevalent -- well, I can easily imagine the nightmare.

It can't be both things. It's either bytes or it's text. If it's text,
then decoding it as ASCII will give you a Unicode string; if it's
small unsigned integers that just happen to correspond to ASCII
values, then I would say the right thing to do is to use integer
constants - or, in Python 3.4, an integer enumeration:

>>> import socket
>>> socket.AF_INET
<AddressFamily.AF_INET: 2>
>>> socket.AF_INET == 2
True
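
For the field-type case, the same idea could be sketched roughly like
this on 3.4+. The FieldType class, its member names, the sample header
bytes and the FIELD_TYPE offset are all made up for illustration; only
the 'C' code comes from the original post.

```python
from enum import IntEnum


class FieldType(IntEnum):
    # Hypothetical members; each value is the ASCII code of the type letter.
    CHAR = ord('C')
    NUMERIC = ord('N')


header = b'NAMEC'   # invented sample header; the type code is the last byte
FIELD_TYPE = 4      # hypothetical offset of the type byte

code = header[FIELD_TYPE]        # indexing bytes yields an int in Python 3
field_type = FieldType(code)     # look the member up by its integer value
is_char = field_type == FieldType.CHAR
still_int = FieldType.CHAR == 67  # IntEnum members still compare equal to ints
```

As with socket.AF_INET, the members have readable names and reprs but
still behave as plain integers in comparisons.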

I'm not sure what FIELD_TYPE of 'C' means, but my guess is that it's a
CHAR field. I'd just have that as the name, something like:

CHAR = b'C'[0]

if header[FIELD_TYPE] == CHAR:
    pass  # handle char field here

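If nothing else, this would reduce the number of places where you
actually have to handle this. Plus, the code above will work on many
versions of Python: indexing a bytes literal gives an int in Python 3
and a one-character str in Python 2, so as long as header is itself a
byte string, both sides of the comparison are the same type either
way. (The b'' prefix has been accepted since 2.6.)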
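
Here is a minimal sketch of both options, which should run unchanged
on 2.6+ and 3.x. The header contents and the FIELD_TYPE offset are
invented for the example, not taken from the real file format.

```python
header = b'NAMEC'   # invented sample record header, as bytes
FIELD_TYPE = 4      # hypothetical offset of the type byte

# Option 1: treat the field as a small integer constant.
# b'C'[0] is 67 on Python 3 and the str 'C' on Python 2, which matches
# whatever header[FIELD_TYPE] returns on the same version.
CHAR = b'C'[0]
matches_as_constant = header[FIELD_TYPE] == CHAR

# Option 2: treat the header as text by decoding it first, then compare
# against an ordinary string.
text = header.decode('ascii')
matches_as_text = text[FIELD_TYPE] == 'C'
```

Either way, the decision about bytes versus text is made once, rather
than at every comparison.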

ChrisA