How to get an encoding a value?

Piet van Oostrum piet at cs.uu.nl
Sat Oct 23 08:41:15 EDT 2004


>>>>> "Diez B. Roggisch" <deets.nospaaam at web.de> (DBR) wrote:

DBR> You are confusing unicode with strings with a certain encoding.

DBR> Unicode is an abstract specification of a huge number of characters,
DBR> hopefully covering even the close-to-unknown glyphs of some ancient
DBR> himalayan mountain tribe to the commonly used latin alphabet. There are no
DBR> actual numeric values associated with that glyphs.

You mix up characters and glyphs, which makes it confusing.
There are no numeric values associated with glyphs in Unicode, but there
are numeric values associated with abstract characters.

Quoting http://www.unicode.org/standard/WhatIsUnicode.html:
"Unicode provides a unique number for every character, no matter what the
platform, no matter what the program, no matter what the language."

These numbers are called `code points'. (It says `unique' above, but they
relax that later.)
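
As a small illustration of the character/number association, here is a
minimal sketch in Python. It assumes Python 3 syntax (in the Python 2 of
this thread you would use u'' literals and unichr() instead); the sample
character is my own choice, not something from the original question.

```python
import unicodedata

# One abstract character: LATIN SMALL LETTER E WITH ACUTE.
ch = "\u00e9"

# ord() returns the character's code point as an integer.
code_point = ord(ch)
print(hex(code_point))        # 0xe9

# The character also has a name in the Unicode character database.
print(unicodedata.name(ch))   # LATIN SMALL LETTER E WITH ACUTE

# chr() maps the code point back to the same character.
print(chr(code_point) == ch)  # True
```

Note that nothing here says how the character is drawn (the glyph) or how
it is stored as bytes (the encoding); the code point is independent of both.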

But you are right regarding the encodings. The Unicode code points can be
encoded in different ways, e.g. with the UTF-8 encoding.
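
To make the distinction concrete, here is a minimal sketch showing one code
point turned into different byte sequences by different encodings. It
assumes Python 3 (str.encode and bytes.hex); the euro sign is just an
example character I picked.

```python
# One character, one code point: EURO SIGN, U+20AC.
text = "\u20ac"
print(hex(ord(text)))   # 0x20ac

# The same code point encoded three different ways.
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-be")   # big-endian, no byte order mark
utf32 = text.encode("utf-32-be")   # big-endian, no byte order mark

print(utf8.hex())    # e282ac
print(utf16.hex())   # 20ac
print(utf32.hex())   # 000020ac

# Decoding with the matching encoding always gives back the same character.
decoded = [utf8.decode("utf-8"),
           utf16.decode("utf-16-be"),
           utf32.decode("utf-32-be")]
print(all(d == text for d in decoded))   # True
```

The byte sequences differ in both length and content, but the code point
they represent is the same in all three cases.
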
-- 
Piet van Oostrum <piet at cs.uu.nl>
URL: http://www.cs.uu.nl/~piet [PGP]
Private email: P.van.Oostrum at hccnet.nl


