Unicode question : turn "José" into u"José"
Ben Finney
bignose+hates-spam at benfinney.id.au
Wed Apr 5 18:38:12 EDT 2006
"Ian Sparks" <Ian.Sparks at etrials.com> writes:
> This is probably stupid and/or misguided but supposing I'm passed a
> byte-string value that I want to be unicode, this is what I do. I'm
> sure I'm missing something very important.
Perhaps you need to read one of the good Python Unicode tutorials,
such as:
<URL:http://effbot.org/zone/unicode-objects.htm>
> Short version :
>
> >>> s = "José" #Start with non-unicode string
In what encoding? Once you step outside the ASCII character set, you
*must* be explicit about the encoding used for the text. Because there
is no sure way to infer it, Python refuses to guess.
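
To see why guessing is unsafe, here is a minimal sketch. The variable names are only illustrative, and the byte values assume the text "José" was stored as either ISO-8859-1 or UTF-8; your data may use something else entirely.

```python
# The same text, "José", stored as bytes under two different encodings.
latin1_bytes = b"Jos\xe9"      # ISO-8859-1: e-acute is the single byte 0xE9
utf8_bytes = b"Jos\xc3\xa9"    # UTF-8: e-acute is the two bytes 0xC3 0xA9

# Decoding with the matching encoding gives the same four-character text.
right_latin1 = latin1_bytes.decode("iso-8859-1")
right_utf8 = utf8_bytes.decode("utf-8")

# Decoding with the wrong encoding can silently yield different text ...
wrong_text = utf8_bytes.decode("iso-8859-1")

# ... or fail outright, because a lone 0xE9 byte is not valid UTF-8.
try:
    latin1_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
```

Nothing in the bytes themselves says which reading is the intended one, which is why the encoding has to come from you.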
If you're going to include literal non-ASCII characters in the code
(which is the simplest and most readable way), you must also tell
Python what encoding to use when it reads the source file.
<URL:http://docs.python.org/ref/encodings.html>
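
As a rough illustration of such a declaration, assuming your editor saves the file as UTF-8 (if it saves in another encoding, name that one instead), the source file might start like this. The variable name is just an example.

```python
# -*- coding: utf-8 -*-
# The comment above must be on the first or second line of the file.
# It tells Python how to interpret the non-ASCII bytes in this source.

name = u"José"        # a Unicode literal containing a non-ASCII character
name_length = len(name)
```

With the declaration matching the file's real encoding, the literal is read as four characters rather than as a run of undeclared bytes.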
> >>> unicoded = eval("u'%s'" % "José")
Once you know the encoding, you can simply say::

    >>> str_encoding = "iso-8859-1"
    >>> byte_str = "José"
    >>> unicode_str = byte_str.decode(str_encoding)
(Note that I didn't type this using the iso-8859-1 encoding, so it's
likely to be wrong in that respect; you'll need to change it to match
your situation.)
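
If you are handed values that may or may not already be Unicode, the decode step can be wrapped in a small helper instead of going through eval. This is only a sketch: `to_text` is a hypothetical name, not something from the standard library, and the caller still has to supply the correct encoding.

```python
def to_text(value, encoding):
    """Return value as a Unicode string, decoding it if it is a byte string.

    `encoding` must be the encoding the bytes were actually written in;
    this helper does not and cannot detect it.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    # Already a Unicode string: hand it back unchanged.
    return value


from_latin1 = to_text(b"Jos\xe9", "iso-8859-1")
from_utf8 = to_text(b"Jos\xc3\xa9", "utf-8")
already_text = to_text(u"Jos\xe9", "iso-8859-1")
```

All three results compare equal, since each names the same four characters once the right encoding is applied.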
--
\ "To me, boxing is like a ballet, except there's no music, no |
`\ choreography, and the dancers hit each other." -- Jack Handey |
_o__) |
Ben Finney