compare unicode to non-unicode strings

John Machin sjmachin at lexicon.net
Sun Aug 31 09:27:34 EDT 2008


On Aug 31, 11:04 pm, Asterix <aste... at lagaule.org> wrote:
> how could I test that those 2 strings are the same:
>
> 'séd' (repr is 's\\xc3\\xa9d')
>
> u'séd' (repr is u's\\xe9d')

[note: your reprs are wrong; change the \\ to \]

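You need to decode the non-unicode string (a byte string) into unicode
and then compare the result with the unicode string using ==. To decode
it, you need to know the encoding that was used to produce the bytes.
In the example that you gave, it's about 99.99% likely that it's UTF-8,
because '\xc3\xa9' is the two-byte UTF-8 encoding of u'\xe9'.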

>>> 's\xc3\xa9d'.decode('utf8')
u's\xe9d'
>>> u's\xe9d'.encode('utf8')
's\xc3\xa9d'
>>>
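
To make the comparison itself explicit, here is a minimal sketch. The
helper name same_text and its default encoding are assumptions made for
illustration only; they are not part of any library. The b'' and u''
prefixes are used so that the snippet also runs on newer Pythons.

```python
def same_text(raw, text, encoding="utf-8"):
    """Hypothetical helper: True if the byte string `raw`, decoded with
    `encoding`, equals the unicode string `text`."""
    try:
        return raw.decode(encoding) == text
    except UnicodeDecodeError:
        # The bytes are not valid in the assumed encoding, so treat
        # the two strings as different.
        return False


byte_str = b's\xc3\xa9d'   # the non-unicode string from the question
uni_str = u's\xe9d'        # the unicode string from the question

print(same_text(byte_str, uni_str))             # True
print(same_text(byte_str, uni_str, "latin-1"))  # False
```

Guessing the wrong encoding does not always raise an error: decoding
the same bytes as Latin-1 succeeds but yields a different, four-character
string, so the comparison quietly returns False.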

HTH,
John


