Unicode and string conversions
Martin von Loewis
loewis at informatik.hu-berlin.de
Sat Nov 17 06:15:18 EST 2001
zayats at blue.seas.upenn.edu (Salim Zayat) writes:
> For example, let's say I have a string
>
> >>>s = '\u0162'
>
> to begin with.
Where did you get this string from? Why does it have to use \u escapes
to denote non-ASCII characters? Couldn't the string use encodings that
other people use as well (like Latin-1, UTF-8, KOI-8R, etc)?
> >>>us = unicode(s, 'utf-8')
> or even
> >>>us = unicode('\u0162', 'utf-8')
>
> I get back :
>
> >>>u'\\u0162'
>
> Which is unfortunately not the same thing.
It is exactly the same - in UTF-8. Every ASCII character (code below
128) stands for itself in UTF-8, so the backslash stands for a
backslash, the u stands for a u, etc. - just as it does in the Unicode
string.
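To make this concrete, here is a minimal sketch. It assumes an interpreter that accepts the b'' and u'' literal prefixes (Python 2.6 or later, or Python 3.3 or later), and it uses the .decode() method in place of the unicode() call from the question. The names raw and decoded are only for illustration.

```python
# A byte string does no \u processing, so this is six plain ASCII bytes:
# backslash, 'u', '0', '1', '6', '2'.
raw = b'\\u0162'

# Decoding as UTF-8 maps each ASCII byte to the character with the same code.
decoded = raw.decode('utf-8')

print(len(raw))      # number of bytes
print(len(decoded))  # number of characters after decoding

# The result is still a backslash followed by "u0162", not one character.
print(decoded == u'\\u0162')
```

The byte count and the character count come out the same, which is why the UTF-8 decode leaves the backslash sequence as it was.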
> I am just a whole lot of confused.
It looks that way. If you absolutely *have* to have \u treated as
an escape in a byte string, you can use the 'unicode-escape' encoding:
>>> unicode('\u0162','unicode-escape')
u'\u0162'
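The same conversion can be sketched with the .decode() method, assuming an interpreter that accepts the b'' literal prefix (Python 2.6 or later, or any Python 3). The names escaped and text are only for illustration.

```python
# Six ASCII bytes spelling out a \u escape sequence.
escaped = b'\\u0162'

# The 'unicode-escape' codec interprets the escape instead of copying it.
text = escaped.decode('unicode-escape')

print(len(text))       # number of characters in the result
print(hex(ord(text)))  # code point of that character
```

Here the six bytes collapse into the single character U+0162, in contrast to what the 'utf-8' codec does with the same input.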
HTH,
Martin