Treating a unicode string as latin-1
Fredrik Lundh
fredrik at pythonware.com
Thu Jan 3 10:21:50 EST 2008
Simon Willison wrote:
> But ElementTree gives me back a unicode string, so I get the following
> error:
>
> >>> print u'Bob\x92s Breakfast'.decode('cp1252').encode('utf8')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/encodings/cp1252.py", line 15, in decode
>     return codecs.charmap_decode(input,errors,decoding_table)
> UnicodeEncodeError: 'ascii' codec can't encode character u'\x92' in position 3: ordinal not in range(128)
>
> How can I tell Python "I know this says it's a unicode string, but I
> need you to treat it like a bytestring"?
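ET has already decoded the CP1252 data for you. Calling decode() on a
unicode string makes Python first encode it with the default ASCII
codec, which is where the UnicodeEncodeError comes from. If you want
UTF-8, all you need to do is encode it: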
>>> u'Bob\x92s Breakfast'.encode('utf8')
'Bob\xc2\x92s Breakfast'
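
Note that the result above still carries U+0092, a C1 control character,
as the two bytes \xc2\x92. If the string actually holds CP1252 byte values
that were decoded as Latin-1 (an assumption about the input, not something
the traceback proves), a common idiom is to round-trip through Latin-1 to
get the original bytes back and then decode those as CP1252. A minimal
sketch follows; the helper name reinterpret_as_cp1252 is hypothetical:

```python
def reinterpret_as_cp1252(text):
    """Treat each code point in `text` as a raw byte and decode as CP1252.

    Assumes every character is in the range U+0000..U+00FF; anything
    higher makes the Latin-1 encode step raise UnicodeEncodeError.
    Bytes that Python's cp1252 codec leaves undefined (such as 0x81)
    make the decode step raise UnicodeDecodeError.
    """
    raw = text.encode('latin-1')   # code points 0..255 map 1:1 to bytes
    return raw.decode('cp1252')    # 0x92 becomes U+2019


mislabeled = u'Bob\x92s Breakfast'
fixed = reinterpret_as_cp1252(mislabeled)

# Encoding directly keeps the control character U+0092.
direct_utf8 = mislabeled.encode('utf-8')

# Encoding after the round trip yields the right single quotation mark.
fixed_utf8 = fixed.encode('utf-8')
```

Here fixed is u'Bob\u2019s Breakfast', so encoding it gives the UTF-8 bytes
for a right single quotation mark rather than for a control character.
Whether that is the right fix depends on how the source document declared
its encoding.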
</F>