utf8 silly question

Jeff Epler jepler at unpythonic.net
Tue Jun 21 14:35:02 EDT 2005


If you want to work with Unicode, then write
    us = u"\N{COPYRIGHT SIGN} some text"
You can also write this as
    us = unichr(169) + u" some text"


When you have a Unicode string, you can convert it to a particular
encoding stored in a byte string with
    bs = us.encode("utf-8")
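
As a minimal sketch of that round trip (the names "bs_expected" and "roundtrip" are only illustrative, and the b"..." literal assumes Python 2.6 or later; the snippet should also run unchanged on Python 3.3+):

```python
# Start from a Unicode string, as in the examples above.
us = u"\N{COPYRIGHT SIGN} some text"

# Encode to a UTF-8 byte string. U+00A9 becomes the two bytes 0xC2 0xA9.
bs = us.encode("utf-8")
bs_expected = b"\xc2\xa9 some text"

# Decoding with the same codec gives back the original Unicode string.
roundtrip = bs.decode("utf-8")

print(bs == bs_expected)
print(roundtrip == us)
```

Note that the byte string is one byte longer than the Unicode string is in characters, since the copyright sign takes two bytes in UTF-8.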


It's generally a mistake to use the .encode() method on a byte string,
but that's what code like
    bs = "\xa9 some text"
    bs = bs.encode("utf-8")
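does.  Python first has to decode the byte string to Unicode implicitly,
using the default ASCII codec, and only then encodes the result.  This
can lull you into believing it works if the test data has only US-ASCII
contents, then break with a UnicodeDecodeError when you go into
production and have non-ASCII strings.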
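
Here is a rough sketch of that failure mode and one way around it. It spells out the implicit ASCII decode explicitly so that it behaves the same on Python 2.6+ and Python 3. The variable names are hypothetical, and treating the byte 0xA9 as Latin-1 is an assumption; in real code you need to know what encoding your byte strings are actually in.

```python
# A byte string containing a non-ASCII byte.
# Assumption: 0xA9 is meant as the copyright sign in Latin-1 (ISO 8859-1).
raw = b"\xa9 some text"

# This mimics the implicit step taken when .encode() is called on a
# Python 2 byte string: decode as ASCII first.
try:
    raw.decode("ascii")
    implicit_decode_ok = True
except UnicodeDecodeError:
    implicit_decode_ok = False

# With ASCII-only test data the same steps succeed, which hides the bug.
ascii_only = b"some text"
ascii_result = ascii_only.decode("ascii").encode("utf-8")

# Safer: decode explicitly with the real source encoding, then encode.
fixed = raw.decode("latin-1").encode("utf-8")

print(implicit_decode_ok)
print(ascii_result == ascii_only)
print(fixed == b"\xc2\xa9 some text")
```

The point of the explicit decode is that the choice of source encoding is made visibly by you, not silently by the default codec.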

Jeff