Unicode and string conversions
Werner Schiendl
ws-news at gmx.at
Fri Nov 16 13:16:40 EST 2001
Hi,
you can convert (encode) a Unicode string to an 8-bit encoding (a plain
string) with the encode() method of the Unicode string object.
The reverse is possible with the built-in function unicode(),
e.g.:
>>> us=u'\u0621\u0622'
>>> s=us.encode('utf-8')
>>> s
'\xd8\xa1\xd8\xa2'
>>>
>>> nus=unicode(s, 'utf-8')
>>> nus
u'\u0621\u0622'
>>> print unicode.__doc__
unicode(string [, encoding[, errors]]) -> object
Create a new Unicode object from the given encoded string.
encoding defaults to the current default string encoding and
errors, defining the error handling, to 'strict'.
>>> print us.encode.__doc__
S.encode([encoding[,errors]]) -> string
Return an encoded string version of S. Default encoding is the current
default string encoding. errors may be given to set a different error
handling scheme. Default is 'strict' meaning that encoding errors raise
a ValueError. Other possible values are 'ignore' and 'replace'.
>>>
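The errors argument mentioned in the docstring can be tried out with a small sketch like the one below. The choice of 'ascii' as the target encoding is my own assumption, picked only because it cannot represent the two Arabic letters; the variable names are illustrative as well.

```python
us = u'\u0621\u0622'  # the same two Arabic letters as in the session above

# 'strict' (the default): a character that cannot be encoded raises an error.
# The error is a UnicodeError, which is a subclass of ValueError.
try:
    us.encode('ascii')
    strict_failed = False
except ValueError:
    strict_failed = True

# 'ignore': characters that cannot be encoded are silently dropped.
ignored = us.encode('ascii', 'ignore')

# 'replace': each character that cannot be encoded becomes '?'.
replaced = us.encode('ascii', 'replace')

print(strict_failed)
print(repr(ignored))
print(repr(replaced))
```

With 'ignore' nothing is left of the string and with 'replace' you get one question mark per character, so both are lossy. 'strict' is usually the safer default unless you really want to discard data.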
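You should specify which encoding to use; if you leave it out, the
current default string encoding (normally ASCII) is used, which fails
for non-ASCII characters.
The available encodings reside in the package 'encodings' of the Python
distribution; the module 'codecs' provides the functions to look them up.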
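As a rough sketch of how to check whether an encoding name is available: the codecs module offers lookup(), and the standard codecs live as modules in the 'encodings' package, with an alias table in encodings.aliases. The name 'no-such-encoding' below is made up for illustration, and I am assuming a reasonably recent Python where the alias table contains 'utf8'.

```python
import codecs
import encodings.aliases

# codecs.lookup() finds the codec registered under a name or alias and
# raises LookupError if no such codec exists.
codecs.lookup('utf-8')  # succeeds, UTF-8 is a standard codec

try:
    codecs.lookup('no-such-encoding')  # made-up name, for illustration only
    known = True
except LookupError:
    known = False

# encodings.aliases.aliases maps alias names to the names of the codec
# modules inside the 'encodings' package.
utf8_module = encodings.aliases.aliases['utf8']

print(known)
print(utf8_module)
```

Browsing the keys and values of encodings.aliases.aliases gives a quick overview of the names you can pass to encode() and unicode().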
hth
Werner