Treating a unicode string as latin-1

Duncan Booth duncan.booth at invalid.invalid
Thu Jan 3 08:45:59 EST 2008


Simon Willison <simon at simonwillison.net> wrote:

> How can I tell Python "I know this says it's a unicode string, but I
> need you to treat it like a bytestring"?

Can you not just fix your xml file so that it uses the same encoding as it 
claims to use? If the xml says it contains utf8 encoded data then it should 
not contain cp1252 encoded data, period.

If you really must, then try encoding with latin1 and then decoding with 
cp1252:

>>> print u'Bob\x92s Breakfast'.encode('latin1').decode('cp1252')
Bob’s Breakfast

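The latin1 codec maps each unicode character in the range 0-255 to the 
single byte with the same value, so the encode step recovers the original 
bytes unchanged and the cp1252 decode can then interpret them correctly.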
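
If you need this in more than one place, the round trip could be wrapped up 
roughly like this. It is only a sketch: fix_cp1252 is a name made up for 
illustration, and it assumes every character in the string is in the range 
0-255 (that is, the data really was mis-decoded as latin1). The u'' literals 
are used so the same code should also run on Python 3.3 or later.

```python
def fix_cp1252(text):
    """Reinterpret a mis-decoded unicode string as cp1252 (hypothetical helper)."""
    # latin1 maps code points 0-255 to the same byte values, so this
    # recovers the original bytes; it raises UnicodeEncodeError if the
    # string contains any character above U+00FF.
    raw = text.encode('latin1')
    # Decode the recovered bytes with the encoding the data really uses.
    return raw.decode('cp1252')


fixed = fix_cp1252(u'Bob\x92s Breakfast')
# Byte 0x92 is RIGHT SINGLE QUOTATION MARK (U+2019) in cp1252.
assert fixed == u'Bob\u2019s Breakfast'
```

Note that a few byte values (0x81 for example) have no character assigned in 
cp1252, so the decode step can raise UnicodeDecodeError on data that was 
never cp1252 to begin with.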


