"More About Unicode in Python 2 and 3"

Ethan Furman ethan at stoneleaf.us
Mon Jan 6 10:10:56 EST 2014


On 01/05/2014 06:37 PM, Dan Stromberg wrote:
>
> The argument seems to be "3.x doesn't work the way I'm accustomed to,
> so I'm not going to use it, and I'm going to shout about it until
> others agree with me."

The argument is that a very important, if small, subset of data manipulation became very painful in Py3.  Not impossible,
and not difficult, but painful because the mental model and the contortions needed to get things to work don't sync up
anymore.  Painful because Python is, at heart, a simple and elegant language, but with the use-case of embedded ASCII in
binary data that elegance went right out the window.

On 01/05/2014 06:55 PM, Chris Angelico wrote:
>
> It can't be both things. It's either bytes or it's text.

Of course it can be:

0000000: 0372 0106 0000 0000 6100 1d00 0000 0000  .r......a.......
0000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000020: 4e41 4d45 0000 0000 0000 0043 0100 0000  NAME.......C....
0000030: 1900 0000 0000 0000 0000 0000 0000 0000  ................
0000040: 4147 4500 0000 0000 0000 004e 1a00 0000  AGE........N....
0000050: 0300 0000 0000 0000 0000 0000 0000 0000  ................
0000060: 0d1a 0a                                  ...

And there we are, mixed bytes and ASCII data.  As I said earlier, my example is minimal, but still very frustrating in
that normal operations no longer work.  Incidentally, if you were thinking that NAME and AGE were part of the ASCII
text, you'd be wrong -- the field names are also encoded, as are the Character and Memo fields.
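
For illustration only, here is a minimal Python 3 sketch of pulling the field definitions out of the dump above.  It assumes the usual dBase III layout (32-byte header, 32-byte field descriptors, 0x0D terminator), and it assumes plain ASCII for the field names because the dump does not say which code page applies; a real file should use its declared code page.  The variable names are made up for this example.

```python
import struct

# Bytes transcribed from the hex dump above.
raw = bytes.fromhex(
    "0372 0106 0000 0000 6100 1d00 0000 0000"
    "0000 0000 0000 0000 0000 0000 0000 0000"
    "4e41 4d45 0000 0000 0000 0043 0100 0000"
    "1900 0000 0000 0000 0000 0000 0000 0000"
    "4147 4500 0000 0000 0000 004e 1a00 0000"
    "0300 0000 0000 0000 0000 0000 0000 0000"
    "0d1a 0a"
)

# Header: version, date (3 bytes), record count, header length, record length.
version, yy, mm, dd, record_count, header_len, record_len = struct.unpack_from(
    "<4BIHH", raw, 0
)

FIELD_TERMINATOR = 0x0D  # indexing bytes in Python 3 gives an int, not a 1-char string
ASSUMED_ENCODING = "ascii"  # assumption: the real code page is not shown in the dump

fields = []
offset = 32
while raw[offset] != FIELD_TERMINATOR:
    # The name is NUL-padded to 11 bytes and must be decoded explicitly.
    name = raw[offset:offset + 11].split(b"\x00", 1)[0].decode(ASSUMED_ENCODING)
    # Slice (not index) to keep the type as bytes: b"C", b"N", ...
    field_type = raw[offset + 11:offset + 12]
    length = raw[offset + 16]
    decimals = raw[offset + 17]
    fields.append((name, field_type, length, decimals))
    offset += 32

for field in fields:
    print(field)
```

On this dump the loop finds two fields: NAME, a Character field 25 bytes long, and AGE, a Numeric field 3 bytes long.  Note how the same 32-byte descriptor needs a decode for the name, a slice for the type, and a plain index for the lengths.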

--
~Ethan~


