Treating a unicode string as latin-1

Thu Jan 3 10:21:50 EST 2008

Simon Willison wrote:

> But ElementTree gives me back a unicode string, so I get the following
> error:
> 
> >>> print u'Bob\x92s Breakfast'.decode('cp1252').encode('utf8')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/encodings/cp1252.py", line 15, in decode
>     return codecs.charmap_decode(input,errors,decoding_table)
> UnicodeEncodeError: 'ascii' codec can't encode character u'\x92' in
> position 3: ordinal not in range(128)
> 
> How can I tell Python "I know this says it's a unicode string, but I
> need you to treat it like a bytestring"?

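ET has already decoded the CP1252 data for you.  Calling decode() on a
unicode string makes Python first encode it with the default ASCII
codec, which is what fails in your example.  If you want UTF-8, all
you need to do is encode it: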

 >>> u'Bob\x92s Breakfast'.encode('utf8')
'Bob\xc2\x92s Breakfast'
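
For what it's worth, here is a minimal sketch of both paths.  The second
one assumes the \x92 in the string is really a CP1252 right single quote
that ended up as code point U+0092 (for instance because the bytes were
read as Latin-1); that is an assumption about the data, not something the
traceback confirms.  The variable names are only for illustration.
Latin-1 maps code points 0-255 straight back to the same byte values, so
encoding with it recovers the original bytes, which can then be decoded
as CP1252:

```python
# Unicode string as returned by the parser (same value as in the example above).
text = u'Bob\x92s Breakfast'

# Path 1: the string is already correct Unicode, so just encode it.
utf8_direct = text.encode('utf-8')

# Path 2 (assumption: U+0092 is a mis-decoded CP1252 byte).
# Latin-1 turns each code point below 256 back into the same byte value.
raw = text.encode('latin-1')

# Reinterpret those bytes as CP1252; byte 0x92 becomes U+2019.
repaired = raw.decode('cp1252')

# Encode the repaired text as UTF-8.
utf8_repaired = repaired.encode('utf-8')
```

If the text was decoded correctly in the first place, the plain
encode('utf8') is all that is needed and the round trip should be skipped.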

</F>