What encoding is this? How can I tell? How can I translate?

Martin von Loewis loewis at informatik.hu-berlin.de
Tue Sep 25 09:07:27 EDT 2001


Skip Montanaro <skip at pobox.com> writes:

>     UnicodeError: Latin-1 encoding error: ordinal not in range(256)
> 
> which seemed odd, because the ordinal 213 character is the only character
> above ordinal 127.

If this is indeed mac-latin2, then you have

>>> unicode(chr(213),'mac-latin2')
u'\u2019'

So this character is clearly out-of-range for Latin-1. Instead, it is

>>> u'\N{RIGHT SINGLE QUOTATION MARK}'
u'\u2019'
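On current Python 3, where unicode() no longer exists, the session above might look roughly like this. It is a sketch that assumes the suspicious byte is 0xD5 (ordinal 213) and that the sender really used mac-latin2. The bytes alone cannot prove that, so the loop decodes the same byte under a few candidate encodings to show how much the answer depends on that guess.

```python
import unicodedata

# Assumption: the suspicious byte is 0xD5 (ordinal 213) and the message is
# mac-latin2; nothing in the bytes themselves proves that.
raw = b"\xd5"

# The same byte means different characters under different encodings,
# so the sender's encoding has to come from context (headers, client, etc.).
for candidate in ("latin-1", "cp1252", "mac_roman", "mac_latin2"):
    ch = raw.decode(candidate)
    print(f"{candidate:10} U+{ord(ch):04X} {unicodedata.name(ch)}")

text = raw.decode("mac_latin2")
try:
    text.encode("latin-1")
except UnicodeEncodeError as exc:
    # U+2019 has no Latin-1 byte, hence the error in the original report.
    print("latin-1 cannot encode it:", exc.reason)
```

Under latin-1 and cp1252 the byte reads as LATIN CAPITAL LETTER O WITH TILDE, while both Mac encodings give RIGHT SINGLE QUOTATION MARK. That is why a quote character "turns into" an accented letter when the encoding is guessed wrong.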

Please have a look at

http://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html

which elaborates on the different quotation characters in Unicode. If
you have to convert the left and right single quotation marks to
Latin-1 (which doesn't directly support them), Markus Kuhn suggests
using APOSTROPHE. So you should either translate all U+2019 to U+0027,
or write a codec that does that for you (the charmap_encode function
will come in handy).
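Both routes can be sketched in Python 3 roughly as follows. The first uses str.translate to swap the curly single quotes (U+2018 and U+2019) for APOSTROPHE before using the stock Latin-1 codec. The second registers a small codec built on codecs.charmap_encode, the helper that the standard library's generated charmap codecs (cp1252 and friends) call internally. It accepts a plain dict mapping Unicode ordinals to byte values. The codec name latin1_quotes and the helper names are hypothetical, chosen only for illustration.

```python
import codecs

# Curly single quotes -> APOSTROPHE (U+0027), as Markus Kuhn suggests.
QUOTE_FALLBACKS = {0x2018: 0x27, 0x2019: 0x27}


def to_latin1(text: str) -> bytes:
    """Route 1: translate the quotes first, then use the stock Latin-1 codec."""
    return text.translate(QUOTE_FALLBACKS).encode("latin-1")


# Route 2: a codec (hypothetical name) that performs the same substitution.
CODEC_NAME = "latin1_quotes"

# Identity map for U+0000..U+00FF, plus the quote fallbacks.
_ENCODING_MAP = {i: i for i in range(256)}
_ENCODING_MAP.update(QUOTE_FALLBACKS)


def _encode(text, errors="strict"):
    # Returns (bytes, number_of_characters_consumed), as codecs must.
    return codecs.charmap_encode(text, errors, _ENCODING_MAP)


def _decode(data, errors="strict"):
    # Decoding is plain Latin-1; the quote substitution is one-way.
    return codecs.latin_1_decode(data, errors)


def _search(name):
    # Codec search function: only answers for our hypothetical codec name.
    if name == CODEC_NAME:
        return codecs.CodecInfo(_encode, _decode, name=CODEC_NAME)
    return None


codecs.register(_search)

print(to_latin1("It\u2019s"))                    # b"It's"
print("\u2018caf\xe9\u2019".encode(CODEC_NAME))  # b"'caf\xe9'"
```

The translate route is simplest for a one-off conversion. The codec route pays off when the substitution has to happen wherever an encoding name is accepted (file objects, email libraries and so on). Characters outside Latin-1 other than the two quotes still raise UnicodeEncodeError under "strict".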

However, if at all feasible, I'd recommend converting the message to
UTF-8 instead, thus preserving full information while at the same
time increasing the chance that a recipient can make sense of the
data.
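Converting to UTF-8 is a single decode/encode step. Here is a minimal Python 3 sketch, again assuming the input really is mac-latin2. The function name and sample bytes are hypothetical.

```python
def mac_latin2_to_utf8(data: bytes) -> bytes:
    """Re-encode mac-latin2 bytes as UTF-8 without losing any character."""
    return data.decode("mac_latin2").encode("utf-8")


sample = b"It\xd5s"  # hypothetical input; 0xD5 decodes to U+2019 in mac-latin2
utf8 = mac_latin2_to_utf8(sample)
print(utf8)                                  # b'It\xe2\x80\x99s'
print(utf8.decode("utf-8") == "It\u2019s")   # True
```

Unlike the Latin-1 routes, nothing is substituted here: the right single quotation mark survives as the three-byte UTF-8 sequence E2 80 99.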

Hope this helps,
Martin


