Unicode and string conversions

Werner Schiendl ws-news at gmx.at
Fri Nov 16 13:16:40 EST 2001


Hi,

you can convert (encode) a Unicode string to an 8-bit encoding (a plain string)
with the encode() method of the Unicode string object.
The reverse (decoding) is possible with the built-in function unicode().

e.g.:

>>> us=u'\u0621\u0622'
>>> s=us.encode('utf-8')
>>> s
'\xd8\xa1\xd8\xa2'
>>>
>>> nus=unicode(s, 'utf-8')
>>> nus
u'\u0621\u0622'
>>> print unicode.__doc__
unicode(string [, encoding[, errors]]) -> object

Create a new Unicode object from the given encoded string.
encoding defaults to the current default string encoding and
errors, defining the error handling, to 'strict'.
>>> print us.encode.__doc__
S.encode([encoding[,errors]]) -> string

Return an encoded string version of S. Default encoding is the current
default string encoding. errors may be given to set a different error
handling scheme. Default is 'strict' meaning that encoding errors raise
a ValueError. Other possible values are 'ignore' and 'replace'.
>>>
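The session above is Python 2. Python 3 has no unicode type: Unicode text is `str`, the 8-bit string is `bytes`, and `bytes.decode()` replaces `unicode(s, encoding)`. The following is a minimal sketch of the same round trip under that assumption. It also shows the 'strict', 'ignore' and 'replace' error handlers named in the docstring. Note that the Python 3 exception is `UnicodeEncodeError`, a subclass of `ValueError`.

```python
# Python 3 sketch (not part of the original Python 2 session):
# str plays the role of unicode, bytes the role of the 8-bit string.
us = '\u0621\u0622'              # two Arabic letters
s = us.encode('utf-8')           # encode: text -> bytes
print(s)                         # b'\xd8\xa1\xd8\xa2'

nus = s.decode('utf-8')          # decode: bytes -> text (was unicode(s, 'utf-8'))
print(nus == us)                 # True

# Neither character exists in ASCII, so the default 'strict' handler raises.
try:
    us.encode('ascii')
except UnicodeEncodeError as exc:
    print('strict:', exc.reason)  # strict: ordinal not in range(128)

print(us.encode('ascii', 'ignore'))   # b'' (unencodable characters dropped)
print(us.encode('ascii', 'replace'))  # b'??' (each replaced by '?')
```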

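You should always specify which encoding to use; otherwise the current
default string encoding (normally ASCII) is used, and that fails for
non-ASCII characters. The codecs for the available encodings reside in the
package 'encodings' of the Python distribution; the 'codecs' module provides
the machinery for looking them up.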
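To check whether an encoding name is known before using it, the codec registry can be queried directly. The following is a sketch assuming Python 3. `codecs.lookup()` resolves a name or alias to a `CodecInfo` object and raises `LookupError` for unknown names. The alias table in `encodings.aliases` shows how alternative spellings map to codec modules inside the `encodings` package.

```python
import codecs
from encodings.aliases import aliases

# codecs.lookup() accepts aliases and different spellings of the same name.
for name in ('utf-8', 'UTF8', 'latin-1', 'cp1256'):
    info = codecs.lookup(name)
    print(name, '->', info.name)

# The alias table maps normalized alias names to codec module names.
print(aliases['utf8'])  # utf_8

# An unknown encoding name is reported up front with LookupError.
try:
    codecs.lookup('no-such-encoding')
except LookupError as exc:
    print('lookup failed:', exc)
```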

hth
Werner




