Everything you did not want to know about Unicode in Python 3

Skip Montanaro skip at pobox.com
Tue May 13 10:02:29 EDT 2014


On Tue, May 13, 2014 at 3:38 AM, Chris Angelico <rosuav at gmail.com> wrote:
>> Python 2's ambiguity allows me not to answer the tough philosophical
>> questions. I'm not saying it's necessarily a good thing, but it has its
>> benefits.
>
> It's not a good thing. It means that you have the convenience of
> pretending there's no problem, which means you don't notice trouble
> until something happens... and then, in all probability, your app is
> in production and you have no idea why stuff went wrong.

BITD, when I still maintained and developed Musi-Cal (an early online
concert calendar, long since gone), I faced a challenge when I first
started encountering non-ASCII band names and cities. I resisted UTF-8.
After all, if I printed a string containing an "é", it came out looking like

[image: e.png, scrubbed from the archive; see the attachment link below]

What kind of mess was that???

I tried to ignore it, or assume Latin-1 would cover all the bases (my first
non-ASCII inputs tended to come from Western Europe). If nothing else, at
least "é" was legible.
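
To make the difference concrete, here is a small Python 3 sketch. It is an
illustration only, not a reconstruction of the scrubbed image: "é" is a
single byte in Latin-1 but two bytes in UTF-8, and anything that assumes
Latin-1 will show those two bytes as two unrelated characters.

```python
# Illustration only: the same character in two encodings.
text = "\u00e9"  # "é"

latin1_bytes = text.encode("latin-1")  # one byte
utf8_bytes = text.encode("utf-8")      # two bytes

# Reading the UTF-8 bytes as if they were Latin-1 gives two characters.
misread = utf8_bytes.decode("latin-1")

print(latin1_bytes)  # b'\xe9'
print(utf8_bytes)    # b'\xc3\xa9'
print(len(misread))  # 2
```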

Needless to say, those approaches didn't work well. After perhaps six
months or a year, I broke down and started converting everything coming in
or going out to UTF-8 at the boundaries of my system (making educated
guesses at input encodings if necessary). My life got a whole lot easier
after that. The distinction between bytes and text didn't really matter
much, certainly not compared to the mess I had before where strings of
unknown data leaked into my system and its database.
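
A rough sketch of that kind of boundary conversion, in Python 3 terms. This
is not the actual Musi-Cal code: the function name normalize_to_utf8 and the
guess order (UTF-8 first, then Latin-1) are assumptions made for
illustration.

```python
def normalize_to_utf8(raw, guesses=("utf-8", "latin-1")):
    """Return raw re-encoded as UTF-8, guessing its input encoding.

    Hypothetical helper: tries each guessed encoding in order and keeps
    the first one that decodes without error.
    """
    for encoding in guesses:
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue
        return text.encode("utf-8")
    # Reached only if every guess failed (Latin-1 itself never fails,
    # so this needs a guess list without it). Undecodable bytes are
    # replaced with U+FFFD rather than dropped silently.
    return raw.decode("utf-8", errors="replace").encode("utf-8")


# Latin-1 input is converted; input that is already UTF-8 passes through.
print(normalize_to_utf8(b"caf\xe9"))
print(normalize_to_utf8(b"caf\xc3\xa9"))
```

Trying UTF-8 first matters here: Latin-1 accepts any byte sequence, so
putting it first would mean the UTF-8 guess is never reached.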

Skip

P.S. My apologies for the mess this message probably is. Amazing as it may
seem, Gmail in Chrome does a crappy job editing anything other than plain
text. Also, I'm surprised in this day and age that common tools like Gnome
Terminal have little or no encoding support. I wound up having to pop up
urxvt to get an encodings-flexible terminal emulator...
-------------- next part --------------
A non-text attachment was scrubbed...
Name: e.png
Type: image/png
Size: 270 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-list/attachments/20140513/7343b4dd/attachment.png>

