what encoding is this? How can I tell? How can I translate?
Martin von Loewis
loewis at informatik.hu-berlin.de
Tue Sep 25 09:07:27 EDT 2001
Skip Montanaro <skip at pobox.com> writes:
> UnicodeError: Latin-1 encoding error: ordinal not in range(256)
>
> which seemed odd, because the ordinal 213 character is the only character
> above ordinal 127.
If this is indeed mac-latin2, then you have
>>> unicode(chr(213),'mac-latin2')
u'\u2019'
So this character is clearly out-of-range for Latin-1. Instead, it is
>>> u'\N{RIGHT SINGLE QUOTATION MARK}'
u'\u2019'
Please have a look at
http://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html
which elaborates on the different quotation characters in Unicode. If
you have to convert the left and right single quotation marks to
Latin-1 (which doesn't directly support them), Markus Kuhn suggests
using APOSTROPHE. So you should either translate all U+2019 to U+0027,
or write a codec that does that for you (the charmap_encode function
will come in handy).
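A minimal sketch of the first option (translating the quotation marks
before encoding), written in Python 3 syntax rather than the Python 2
of the session above. The sample bytes and the names raw, text and
QUOTE_MAP are made up for illustration, and the sketch assumes the
input really is mac-latin2:

```python
# Hypothetical sample: "don't" with byte 213 (0xD5) as the apostrophe.
raw = b"don\xd5t"

# Decode using the assumed source encoding; 0xD5 becomes U+2019.
text = raw.decode("mac_latin2")

# Map the left and right single quotation marks to APOSTROPHE (U+0027).
QUOTE_MAP = {0x2018: 0x27, 0x2019: 0x27}

# After the translation every character fits into Latin-1.
latin1_bytes = text.translate(QUOTE_MAP).encode("latin-1")
print(latin1_bytes)
```

Any other character outside Latin-1 would still raise an error on
encoding, so the mapping may need more entries for real data.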
However, if feasible at all, I'd recommend converting the message to
UTF-8 instead, thus preserving full information, while at the same
time increasing the chance that a recipient can make sense of the
data.
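The UTF-8 route could look roughly like this, again in Python 3 syntax
and again assuming mac-latin2 input; the sample bytes and variable
names are illustrative only:

```python
# Hypothetical sample: "don't" with byte 213 (0xD5) as the apostrophe.
raw = b"don\xd5t"

# Decode using the assumed source encoding.
text = raw.decode("mac_latin2")

# UTF-8 can represent U+2019 directly, so nothing has to be replaced.
utf8_bytes = text.encode("utf-8")

# Decoding the result gives back exactly the same text.
assert utf8_bytes.decode("utf-8") == text
print(utf8_bytes)
```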
Hope this helps,
Martin