Everything you did not want to know about Unicode in Python 3

Johannes Bauer dfnsonfsduifb at gmx.de
Tue May 13 05:46:17 EDT 2014


On 13.05.2014 10:25, Marko Rauhamaa wrote:

> Based on my background (network and system programming), I'm a bit
> suspicious of strings, that is, text. For example, is the stuff that
> goes to syslog bytes or text? Does an XML file contain bytes or
> (encoded) text? The answers are not obvious to me. Modern computing is
> full of ASCII-esque binary communication standards and formats.

Traditional Unix programs (syslog for example) are notorious for being
clear, ambiguous and/or ignorant of character encodings altogether. And
this works, unfortunately, for the most time because many encodings
share a common subset. If they wouldn't, the problems would be VERY
apparent and people would be forced to handle the issues not so sloppily.

Which is the route that Py3 chose. Don't be sloppy, make a great
distinction between "text" (which handles naturally as strings) and its
respective encoding.

The only people who are angered by this now is people who always treated
encodings sloppily and it "just worked". Well, there's a good chance it
has worked by pure chance so far. It's a good thing that Python does
this now more strictly as it gives developers *guarantees* about what
they can and cannot do with text datatypes without having to deal with
encoding issues in many places. Just one place: The interface where text
is read or written, just as it should be.

Regards,
Johannes

-- 
>> Wo hattest Du das Beben nochmal GENAU vorhergesagt?
> Zumindest nicht öffentlich!
Ah, der neueste und bis heute genialste Streich unsere großen
Kosmologen: Die Geheim-Vorhersage.
 - Karl Kaos über Rüdiger Thomas in dsa <hidbv3$om2$1 at speranza.aioe.org>



More information about the Python-list mailing list