How do I display unicode value stored in a string variable using ord()

Terry Reedy tjreedy at udel.edu
Sun Aug 19 17:59:43 EDT 2012


On 8/19/2012 2:11 PM, wxjmfauth at gmail.com wrote:

> Well, it seems some software producers know what they
> are doing.
>
>>>> '€'.encode('cp1252')
> b'\x80'
>>>> '€'.encode('mac-roman')
> b'\xdb'
>>>> '€'.encode('iso-8859-1')
> Traceback (most recent call last):
>    File "<eta last command>", line 1, in <module>
> UnicodeEncodeError: 'latin-1' codec can't encode character '\u20ac'
> in position 0: ordinal not in range(256)

Yes, Python lets you choose your byte encoding from those and a hundred 
others. I believe all the codecs are now tested in both directions. It 
was not an easy task.
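
To illustrate "both directions", here is a minimal sketch (assuming Python 3) that encodes the Euro sign with a few codecs and decodes the bytes again. The codec list and the names codecs_to_try and round_trips are picked only for this illustration.

```python
# Encode the Euro sign with several codecs, then decode the bytes back.
euro = '\u20ac'  # EURO SIGN

codecs_to_try = ['cp1252', 'mac-roman', 'utf-8']  # arbitrary sample
round_trips = {}
for name in codecs_to_try:
    encoded = euro.encode(name)      # str -> bytes
    decoded = encoded.decode(name)   # bytes -> str
    round_trips[name] = (encoded, decoded)
    print(name, encoded, decoded == euro)
```

Each codec gives different bytes for the same character, and each returns the original character when the same codec is used to decode.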

As to the examples: Latin-1 dates to 1985 or earlier, and the 1988 
version was published as a standard in 1992.
https://en.wikipedia.org/wiki/Latin-1
"The name euro was officially adopted on 16 December 1995."
https://en.wikipedia.org/wiki/Euro
No wonder Latin-1 does not contain the Euro sign. Standards from 
international standards organizations are relatively fixed. (The Unicode 
Consortium will not even correct misspelled character names.) Instead, 
new standards with a new number are adopted.
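
One example of such a new number, added here for illustration and not mentioned in the thread, is ISO-8859-15 (Latin-9). It puts the Euro sign at the byte that Latin-1 uses for the generic currency sign. A small sketch, assuming Python 3, with arbitrary variable names:

```python
euro = '\u20ac'      # EURO SIGN
currency = '\u00a4'  # CURRENCY SIGN

# The later standard has the Euro sign; Latin-1 itself is unchanged.
euro_latin9 = euro.encode('iso-8859-15')
currency_latin1 = currency.encode('iso-8859-1')
print(euro_latin9, currency_latin1)

# Latin-1 still has no Euro sign, as in the traceback quoted above.
try:
    euro.encode('iso-8859-1')
    latin1_has_euro = True
except UnicodeEncodeError:
    latin1_has_euro = False
print(latin1_has_euro)
```

The same byte therefore means one character under one standard number and a different character under the other.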

For better or worse, private mappings are more flexible. In its Mac 
mapping Apple "replaced the generic currency sign ¤ with the euro sign 
€". (See Latin-1 reference.) Great if you use Euros, not so great if you 
were using the previous sign for something else.

Microsoft assigned a previously unused code (0x80) to the Euro sign in 
Windows cp1252.
https://en.wikipedia.org/wiki/Windows-1252
"It is very common to mislabel Windows-1252 text with the charset label 
ISO-8859-1. A common result was that all the quotes and apostrophes 
(produced by "smart quotes" in Microsoft software) were replaced with 
question marks or boxes on non-Windows operating systems, making text 
difficult to read. Most modern web browsers and e-mail clients treat the 
MIME charset ISO-8859-1 as Windows-1252 in order to accommodate such 
mislabeling. This is now standard behavior in the draft HTML 5 
specification, which requires that documents advertised as ISO-8859-1 
actually be parsed with the Windows-1252 encoding.[1]"
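
The mislabeling is easy to reproduce. The following is only a sketch, assuming Python 3. The sample string is made up for the illustration.

```python
# Text with a curly apostrophe (U+2019) and curly quotes (U+201C, U+201D),
# as produced by "smart quotes".
text = 'It\u2019s \u201cfun\u201d'
raw = text.encode('cp1252')
print(raw)

# Decoding under the wrong label does not fail. It silently yields
# C1 control characters, which many programs show as boxes or '?'.
as_latin1 = raw.decode('iso-8859-1')
print(ascii(as_latin1))

# Decoding with the encoding that was actually used restores the text.
as_cp1252 = raw.decode('cp1252')
print(as_cp1252 == text)
```

Because Latin-1 assigns a character to every byte value, the wrong label never raises an error, which is part of why the mistake spread so widely.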

Lots of fun. Too bad Microsoft won't push UTF-8 so we can all 
communicate text with much less chance of ambiguity.
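
One reason there is less ambiguity with UTF-8, sketched here under the assumption of Python 3: a stray byte from a legacy 8-bit encoding is usually not valid UTF-8, so decoding fails loudly instead of quietly producing the wrong character. The variable names are arbitrary.

```python
euro_utf8 = '\u20ac'.encode('utf-8')
print(euro_utf8)  # three bytes for the Euro sign

# b'\x80' is the Euro sign in cp1252, but on its own it is a bare
# continuation byte, which is not valid UTF-8.
try:
    b'\x80'.decode('utf-8')
    cp1252_byte_accepted = True
except UnicodeDecodeError:
    cp1252_byte_accepted = False
print(cp1252_byte_accepted)
```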

-- 
Terry Jan Reedy




