XML and UnicodeError

Diez B. Roggisch deets.nospaaam at web.de
Tue Oct 5 13:07:17 EDT 2004


Just for the record: don't confuse Unicode with UTF-8. The former is a
specification covering more or less all characters used on this planet; the
latter is an actual encoding of those characters into bytes. UTF-8 maps the
common ASCII characters to their well-known single-byte values and uses
multi-byte sequences for all others, such as umlauts.
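To make the distinction concrete, here is a small sketch written for Python 3, where `str` plays the role of the `unicode` object discussed here and `bytes` the role of the encoded "binary string". The sample word and variable names are illustrative only. It shows that a character's Unicode code point is independent of any encoding, that UTF-8 keeps ASCII characters as single bytes, and that an umlaut becomes a two-byte sequence.

```python
# Python 3 sketch: str ~ Python 2's unicode object, bytes ~ encoded byte string.
text = 'Gr\u00fc\u00dfe'  # the word "Grüße", written with escapes

# Code points are abstract numbers assigned by Unicode, not bytes.
code_points = [hex(ord(ch)) for ch in text]

# UTF-8 maps an ASCII character to the same single byte it has in ASCII...
ascii_bytes = 'G'.encode('utf-8')

# ...and encodes non-ASCII characters such as umlauts as multi-byte sequences.
umlaut_bytes = '\u00fc'.encode('utf-8')

# The same text encoded two different ways gives two different byte strings.
utf8 = text.encode('utf-8')
latin1 = text.encode('iso-8859-1')

print(code_points)                        # ['0x47', '0x72', '0xfc', '0xdf', '0x65']
print(ascii_bytes)                        # b'G'
print(umlaut_bytes)                       # b'\xc3\xbc'
print(len(text), len(utf8), len(latin1))  # 5 7 5
```

Five characters become seven bytes in UTF-8 but five in Latin-1, which is exactly why "Unicode" and "UTF-8" should not be used interchangeably.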

So

u'some text'

is not UTF-8; it's a unicode object. If you do this:

u'some text'.encode('utf-8')

you get a byte string encoded using UTF-8. Specifying the
source encoding of the Python file using the

#  -*- encoding: iso-8859-1 -*-

syntax means that <some-text> found in

u'<some-text>'

is decoded using the latin-1 codec, so u'<some-text>' is shorthand
for

'<some-text>'.decode('iso-8859-1')
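The equivalence can be checked directly with a sketch like the following, again in Python 3 terms: there the `u` prefix is no longer needed because a plain '...' literal is already text, and `compile()` honours a coding declaration when it is given the source as bytes. The names `source` and `namespace` are just illustrative. The source contains the raw byte 0xE4, which is "ä" in Latin-1.

```python
# Python 3 sketch: compile() applies a PEP 263 coding declaration
# when the source code is passed in as bytes.
source = b"# -*- coding: iso-8859-1 -*-\ns = '\xe4'\n"  # raw byte 0xE4 inside the literal

namespace = {}
exec(compile(source, '<latin1-source>', 'exec'), namespace)

# The literal was decoded with the declared latin-1 codec...
from_declaration = namespace['s']
# ...which gives the same result as decoding the raw byte explicitly.
explicit = b'\xe4'.decode('iso-8859-1')

print(from_declaration == explicit)     # True
print(hex(ord(from_declaration)))       # 0xe4
```

The declaration only tells the compiler how to turn the bytes of the file into characters; once decoded, the literal is ordinary Unicode text with no memory of which codec produced it.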

Regards,

diez



More information about the Python-list mailing list