XML and UnicodeError
Diez B. Roggisch
deets.nospaaam at web.de
Tue Oct 5 13:07:17 EDT 2004
Just for the record: don't confuse Unicode with UTF-8 - the former being a
specification of more or less all characters used on this planet, the
latter an actual encoding of these that maps common ASCII characters to
their well-known byte values and uses multi-byte sequences to encode all
others, like umlauts.
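To make the distinction concrete, here is a minimal sketch. It assumes a current Python 3 interpreter, where the encoded result is a bytes object; the variable names are only illustrative and not taken from anyone's code.

```python
# One Unicode character: LATIN SMALL LETTER U WITH DIAERESIS (U+00FC).
umlaut = u'\xfc'

# The same character, written out in two different encodings.
as_utf8 = umlaut.encode('utf-8')         # two bytes in UTF-8
as_latin1 = umlaut.encode('iso-8859-1')  # one byte in latin1

# A plain ASCII character keeps its well-known value in UTF-8.
ascii_as_utf8 = u'a'.encode('utf-8')

print(len(umlaut), len(as_utf8), len(as_latin1))
```

The character count stays at one, while the number of bytes depends on which encoding was chosen.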
So
u'some text'
is not UTF-8 - it's a unicode object. If you do this:
u'some text'.encode('utf-8')
it becomes a byte string which is encoded using UTF-8. Specifying the
coding of the python file using the
# -*- encoding: iso-8859-1 -*-
syntax means that <some-text> found in
u'<some-text>'
is interpreted using the latin1 codec - so u'<some-text>' is a shorthand
for
'<some-text>'.decode('iso-8859-1')
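A small sketch of why the declared coding matters, again assuming a Python 3 interpreter (so the raw data is written as a b'...' literal); the byte value and names are just an example:

```python
# The single byte an iso-8859-1 encoded file would contain for u-umlaut.
raw = b'\xfc'

# Decoding with the codec the data was written in gives the right character.
text = raw.decode('iso-8859-1')

# Decoding the same byte with the wrong codec fails, because 0xfc on its
# own is not a valid UTF-8 sequence.
try:
    raw.decode('utf-8')
    failed = False
except UnicodeDecodeError:
    failed = True

print(repr(text), failed)
```

UnicodeDecodeError is a subclass of UnicodeError, so a mismatch between the actual bytes and the assumed codec is one common way to end up with the error from the subject line.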
Regards,
diez