Unicode perplex

Fredrik Lundh fredrik at pythonware.com
Thu Jun 24 04:10:17 EDT 2004


John Roth wrote:

> The problem is that I've got a normal string where
> the byte stream is actually UTF-8. How do I turn
> it into a Unicode string? Remember that the trick
> is that it's still going to have the *same* stream of
> bytes (at least if the Unicode string is implemented
> in UTF-8.) I don't need to convert it with a codec,
> I need to change the class under the data.

you're making more assumptions about things you don't know anything
about than is really good for you.  had you read any article on Python's
Unicode system, you'd have learned that UTF-8 is an encoding, while
Python's Unicode string type contains sequences of Unicode characters.

or in other words, if you have something that isn't a Python Unicode
string, and you want a Python Unicode string, you need to convert it.
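
a minimal sketch of that conversion, assuming a modern Python 3
interpreter, where the byte string type is called bytes and the Unicode
string type is called str (in the Python 2 of this thread, the same
decode call returns a unicode object).  the sample bytes and variable
names are made up for illustration:

```python
# hypothetical sample data: the UTF-8 encoding of "café".
# the last character takes two bytes (0xC3 0xA9) in UTF-8.
raw = b"caf\xc3\xa9"

# decoding runs the bytes through the UTF-8 codec and produces a
# sequence of Unicode characters; nothing is reinterpreted in place.
text = raw.decode("utf-8")

# the two objects differ in length: 5 bytes versus 4 characters.
print(len(raw), len(text))

# encoding goes the other way and gives back the original bytes.
back = text.encode("utf-8")
print(back == raw)
```

the length difference is the reason you cannot just "change the class
under the data": the Unicode string counts characters, not UTF-8 bytes.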

more reading:

    http://www.effbot.org/zone/unicode-objects.htm
    http://www.reportlab.com/i18n/python_unicode_tutorial.html
        (slightly outdated; ignore installation/setup parts)
    http://www.egenix.com/files/python/Unicode-EPC2002-Talk.pdf

</F>






