Unicode question : turn "José" into u"José"

Ben Finney bignose+hates-spam at benfinney.id.au
Wed Apr 5 18:38:12 EDT 2006


"Ian Sparks" <Ian.Sparks at etrials.com> writes:

> This is probably stupid and/or misguided but supposing I'm passed a
> byte-string value that I want to be unicode, this is what I do. I'm
> sure I'm missing something very important.

Perhaps you need to read one of the good Python Unicode tutorials,
such as:

    <URL:http://effbot.org/zone/unicode-objects.htm>

> Short version :
> 
> >>> s = "José" #Start with non-unicode string

In what encoding? Once you step outside the ASCII character set, you
*must* be explicit about the encoding used for the text. Because there
is no sure way to infer it, Python refuses to guess.
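
To make the ambiguity concrete, here is a minimal sketch (the byte values and variable names are my own illustration, not taken from your session). The same five bytes are valid text under two different encodings, and nothing in the bytes says which one was meant:

```python
# Five bytes that could have come from a file, a socket, a database...
raw = b"Jos\xc3\xa9"

# Both decodes succeed, but they produce different text.
as_utf8 = raw.decode("utf-8")         # u"Jos\xe9", four characters
as_latin1 = raw.decode("iso-8859-1")  # u"Jos\xc3\xa9", five characters

# ASCII, the only assumption Python is willing to make unasked,
# cannot represent these bytes at all, so decoding fails loudly.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
```

Only one of those results is the name you wanted, and only the sender of the bytes knows which.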

If you're going to include literal non-ASCII characters in the code
(which is the simplest and most readable way), you must also tell
Python what encoding to use when it reads the source file.

    <URL:http://docs.python.org/ref/encodings.html>
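
As a rough sketch of such a declaration, assuming your editor saves the file as UTF-8 (substitute whatever encoding it really uses; the variable name is only for illustration):

```python
# -*- coding: utf-8 -*-
# The declaration above has to appear on the first or second line of the file.

name = u"José"  # unicode literal containing a non-ASCII character

# However many bytes the file uses to store the accented letter,
# the resulting unicode object has exactly four characters.
assert len(name) == 4
assert name == u"Jos\xe9"
```

If the declared encoding doesn't match what the editor actually wrote to disk, the literal will be decoded wrongly or the file will fail to compile, so the two have to agree.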

> >>> unicoded = eval("u'%s'" % "José")

Once you know the encoding, you can simply say::

    >>> str_encoding = "iso-8859-1"
    >>> byte_str = "José"    # avoid naming a variable 'str'; that shadows the built-in type
    >>> unicode_str = byte_str.decode(str_encoding)

(Note that I didn't type this using the iso-8859-1 encoding, so it's
likely to be wrong in that respect; you'll need to change it to match
your situation.)
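
If values reach you from several places, the same idea can be wrapped in a small helper. This is only a sketch under my own assumptions: ``to_text`` is a hypothetical name, not something from the standard library, and the byte string is written with an escape so that it doesn't depend on how this message was encoded:

```python
def to_text(value, encoding):
    """Return value as text, decoding it first if it is a byte string.

    Hypothetical helper: the caller must supply the encoding, because
    it cannot be inferred from the bytes themselves.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value  # already text, pass it through untouched


raw = b"Jos\xe9"                    # "José" encoded as ISO-8859-1
name = to_text(raw, "iso-8859-1")   # decoded to a unicode object
same = to_text(name, "iso-8859-1")  # text is returned unchanged
```

Unlike the ``eval`` approach, ``decode`` never executes the contents of the string, so a value containing quotes or backslashes can't break it or run as code.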

-- 
 \        "To me, boxing is like a ballet, except there's no music, no |
  `\    choreography, and the dancers hit each other."  -- Jack Handey |
_o__)                                                                  |
Ben Finney



