Treating a unicode string as latin-1
Duncan Booth
duncan.booth at invalid.invalid
Thu Jan 3 08:45:59 EST 2008
Simon Willison <simon at simonwillison.net> wrote:
> How can I tell Python "I know this says it's a unicode string, but I
> need you to treat it like a bytestring"?
Can you not just fix your xml file so that it uses the same encoding as it
claims to use? If the xml says it contains utf8 encoded data then it should
not contain cp1252 encoded data, period.
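Fixing the file itself could look roughly like the Python 3 sketch below. It is not from the original post. It assumes the whole file is really cp1252 even though it declares utf-8, and the helper name `recode_cp1252_to_utf8` is hypothetical. It re-encodes the raw bytes and then checks that the standard-library `xml.etree.ElementTree` parser accepts the result.

```python
import xml.etree.ElementTree as ET


def recode_cp1252_to_utf8(raw: bytes) -> bytes:
    """Re-encode bytes that are really cp1252 as UTF-8 (hypothetical helper).

    Assumes every byte in `raw` is cp1252, i.e. the file contains no genuine
    UTF-8 sequences. Python's cp1252 codec raises UnicodeDecodeError on the
    five bytes it leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D).
    """
    return raw.decode("cp1252").encode("utf-8")


# A file that claims utf-8 but contains the cp1252 byte 0x92 (right quote).
raw = b'<?xml version="1.0" encoding="utf-8"?><title>Bob\x92s Breakfast</title>'
fixed = recode_cp1252_to_utf8(raw)

root = ET.fromstring(fixed)  # parses now that the bytes match the declaration
print(fixed)
print(ascii(root.text))  # 'Bob\u2019s Breakfast'
```

Because the XML declaration is plain ASCII, it survives the re-encoding unchanged, so the file's claim and its contents agree afterwards.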
If you really must, then try encoding with latin1 and then decoding with
cp1252:
>>> print u'Bob\x92s Breakfast'.encode('latin1').decode('cp1252')
Bob’s Breakfast
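The latin1 codec maps the unicode characters 0-255 (U+0000 to U+00FF)
one-to-one onto the byte values 0-255, so encoding with it gives back one
byte per character with the same value, and cp1252 can then decode those
bytes as intended.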
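The example above is Python 2 (`print` statement, `u''` literal). In Python 3 every `str` is unicode and the same round trip works unchanged. The sketch below wraps it in a hypothetical helper, `reinterpret_as_cp1252`, and also checks the one-to-one latin1 property the trick relies on.

```python
def reinterpret_as_cp1252(text: str) -> str:
    """Read a string's characters as bytes and decode them as cp1252.

    Hypothetical helper. Assumes every character in `text` is below U+0100;
    the latin-1 codec raises UnicodeEncodeError for anything higher.
    """
    # latin-1 maps U+0000..U+00FF onto bytes 0x00..0xFF with the same value,
    # so this produces exactly one byte per character.
    raw = text.encode("latin-1")
    return raw.decode("cp1252")


# The property the trick depends on: code point i encodes to the byte i.
assert all(chr(i).encode("latin-1") == bytes([i]) for i in range(256))

print(reinterpret_as_cp1252("Bob\x92s Breakfast"))  # Bob’s Breakfast
```

If the string already contains characters above U+00FF (for example a correctly decoded U+2019), latin-1 cannot encode it, which is a useful signal that the data was not mis-decoded in the first place.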