Treating a unicode string as latin-1
Duncan Booth
duncan.booth at invalid.invalid
Thu Jan 3 10:55:29 EST 2008
Fredrik Lundh <fredrik at pythonware.com> wrote:
> ET has already decoded the CP1252 data for you. If you want UTF-8, all
> you need to do is to encode it:
>
> >>> u'Bob\x92s Breakfast'.encode('utf8')
> 'Bob\xc2\x92s Breakfast'
>
I think he is claiming that the encoding information in the file is
incorrect and therefore it has been decoded incorrectly.
I would think it more likely that he wants to end up with u'Bob\u2019s
Breakfast' rather than u'Bob\x92s Breakfast', although u'Dog\u2019s dinner'
seems a probable consequence.
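
If that is the case, one way to repair the string is sketched below. It
assumes the bytes in the file were really CP1252 but were decoded as Latin-1;
the variable names are only for illustration. Latin-1 maps code points 0-255
one-to-one onto bytes, so encoding with it gives back the original bytes,
which can then be decoded as CP1252:

```python
# Assumed input: text decoded as Latin-1 although the bytes were CP1252.
misdecoded = u'Bob\x92s Breakfast'

# Latin-1 round-trips code points 0-255 to the same byte values,
# so this recovers the original bytes unchanged.
raw = misdecoded.encode('latin-1')

# In CP1252, byte 0x92 is RIGHT SINGLE QUOTATION MARK (U+2019).
repaired = raw.decode('cp1252')

# UTF-8 output, if that is what is wanted in the end.
utf8_bytes = repaired.encode('utf-8')
```

This only works if every character in the string is below U+0100, and it
will raise UnicodeDecodeError for the few bytes CP1252 leaves undefined
(0x81, for example).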