Unicode/ascii encoding nightmare

Andrea Griffini agriff at tin.it
Mon Nov 6 16:09:49 EST 2006


John Machin wrote:

> The fact that C3 and C2 are both present, plus the fact that one
> non-ASCII byte has morphoploded into 4 bytes indicate a double whammy.

Indeed...

 >>> x = u"fødselsdag"
 >>> x.encode('utf-8').decode('iso-8859-1').encode('utf-8')
'f\xc3\x83\xc2\xb8dselsdag'
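
For anyone who wants to see where the four bytes come from, here is a rough step-by-step sketch of the same round trip, followed by one possible way to undo it. It is written for Python 3 (the session above is Python 2), and the helper name undo_double_utf8 is made up for illustration. It assumes the wrong intermediate codec really was ISO-8859-1; if it was cp1252 or something else, the repair step would need that codec instead.

```python
# Python 3 sketch of the double encoding shown above, plus one way to undo it.
# Assumption: the wrong intermediate codec was ISO-8859-1, as in the session.

original = "f\u00f8dselsdag"  # the word from the session; \u00f8 is o-with-stroke

# Step 1: correct UTF-8. U+00F8 becomes the two bytes C3 B8.
utf8_bytes = original.encode("utf-8")

# Step 2: the mistake. Reading those bytes as ISO-8859-1 turns C3 and B8
# into two separate characters, U+00C3 and U+00B8.
misread = utf8_bytes.decode("iso-8859-1")

# Step 3: encoding again. Each of those two characters needs two bytes in
# UTF-8 (C3 83 and C2 B8), so one original character ends up as four bytes.
double_encoded = misread.encode("utf-8")
print(double_encoded)  # b'f\xc3\x83\xc2\xb8dselsdag'


def undo_double_utf8(data):
    """Reverse the three steps above (hypothetical helper, not a library call)."""
    return data.decode("utf-8").encode("iso-8859-1").decode("utf-8")


print(undo_double_utf8(double_encoded) == original)  # True
```

The repair only makes sense on data known to be double encoded; run on correctly encoded text containing non-ASCII characters, it will usually raise UnicodeDecodeError or UnicodeEncodeError rather than return something useful.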

Andrea


