Unicode in Python

Tue Apr 22 11:28:32 EDT 2014

On Tuesday, 22 April 2014 14:21:40 UTC+2, Steven D'Aprano wrote:
> On Tue, 22 Apr 2014 02:07:58 -0700, wxjmfauth wrote:
>
> > On Tuesday, 22 April 2014 08:30:45 UTC+2, Rustom Mody wrote:
> >
> > @ rusy
> >
> >> "I've reworded it to make it clear that I am referring to the
> >> character-sets and not encodings."
> >
> > Very good, excellent comment. A healthy coding scheme can only work
> > properly with a unique character set, and the coding is achieved with
> > the help of a unique operator. There is no other way to do it, and that's
> > the reason why we have to live today with all these coding schemes
> > (Unicode or not). Note: a coding scheme can be much more complex than
> > the coding of "raw" characters (e.g. CID fonts).
> >
> >> "So instead of using λ (0x3bb) we should use 𝝀 (0x1d740) or
> >> something thereabouts like 𝜆"
>
> For those who cannot see them, they are:
>
> py> import unicodedata
> py> unicodedata.name('\U0001d740')
> 'MATHEMATICAL BOLD ITALIC SMALL LAMDA'
> py> unicodedata.name('\U0001d706')
> 'MATHEMATICAL ITALIC SMALL LAMDA'
>
> ("LAMDA" is the official Unicode name for Lambda.)
>
> > This is a very good understanding of Unicode. The letter lambda is not
> > the mathematical symbol lambda. Another example: the micro sign is not
> > the Greek letter mu, which is not the mathematical mu.
>
> Depends what you mean by "is not". The micro sign is a legacy
> compatibility character, we shouldn't use it except for compatibility
> with legacy (non-Unicode) character sets. Instead, we should use the NFKC
> or NFKD normalization forms to convert it to the recommended character.
>
> py> import unicodedata
> py> a = '\N{GREEK SMALL LETTER MU}'  # Preferred
> py> b = '\N{MICRO SIGN}'  # Legacy
> py> a == b
> False
> py> unicodedata.normalize('NFKD', b) == a
> True
> py> unicodedata.normalize('NFKC', b) == a
> True
>
> As for the mathematical mu, there is no separate Unicode "maths symbol
> mu" so far as I am aware. One would simply use '\N{MICRO SIGN}' or
> '\N{GREEK SMALL LETTER MU}' to get a μ.
>
> Likewise, the λ used in mathematics is the Greek letter λ, not a separate
> symbol, just like the Latin letter x and the x used in mathematics are
> the same.

Normalization works fine, but it proves nothing; it has
to follow some convention.
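
To make the "convention" point concrete, here is a minimal sketch using
only the standard unicodedata module. It shows that the canonical forms
(NFC/NFD) leave the micro sign untouched, while the compatibility forms
(NFKC/NFKD) fold it into the Greek letter. The variable names are mine,
chosen only for illustration.

```python
import unicodedata

micro = '\N{MICRO SIGN}'            # U+00B5, the legacy compatibility character
mu = '\N{GREEK SMALL LETTER MU}'    # U+03BC, the Greek letter

# Normalize the micro sign under each of the four normalization forms.
results = {form: unicodedata.normalize(form, micro)
           for form in ('NFC', 'NFD', 'NFKC', 'NFKD')}

for form, text in results.items():
    # Show the form, the resulting code point, and whether it equals Greek mu.
    print(form, 'U+%04X' % ord(text), text == mu)
```

Which answer is "right" depends on which form one decides to apply,
which is why the comparison alone settles nothing.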

There are several code point ranges (Latin + Greek) which can
be used for mathematical purposes (different mu's).

If you are interested, search for "unimath-symbols.pdf"
on CTAN (I have all this stuff on my hd).
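
As a rough illustration of the "different mu's", the sketch below looks
up several mu characters by their Unicode names and prints their code
points. The list is my own selection and is not meant to be complete;
it assumes the Python build ships a Unicode database that knows these
names (lookup() raises KeyError otherwise).

```python
import unicodedata

# Character names as defined by the Unicode standard (illustrative selection).
MU_NAMES = [
    'MICRO SIGN',
    'GREEK SMALL LETTER MU',
    'MATHEMATICAL BOLD SMALL MU',
    'MATHEMATICAL ITALIC SMALL MU',
    'MATHEMATICAL BOLD ITALIC SMALL MU',
    'MATHEMATICAL SANS-SERIF BOLD SMALL MU',
    'MATHEMATICAL SANS-SERIF BOLD ITALIC SMALL MU',
]

# Resolve each name to its character.
mus = [unicodedata.lookup(name) for name in MU_NAMES]

for name, ch in zip(MU_NAMES, mus):
    # NFKC applies compatibility mappings, so styled variants may be folded.
    folded = unicodedata.normalize('NFKC', ch)
    print('U+%05X  %-45s NFKC -> U+%04X' % (ord(ch), name, ord(folded)))
```

All of these are distinct code points, even though compatibility
normalization maps them onto the plain Greek letter.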

...
"Likewise, the λ used in mathematics is the Greek letter λ, not a separate
symbol, just like the Latin letter x and the x used in mathematics are
the same."
...

Oh! Definitely not. A tool with a Unicode engine able to
produce "math text" will certainly not use the same code point
for a "textual x" and for a "mathematical x", even if one
enters/types/hits the same "x".

To be exaggeratedly strict, the real question is whether
a given "lambda" or "x" belongs to a "math Unicode range"
or not. This is quite a different approach. (Please do not
confuse this with a "text literal variable x".)
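
A minimal sketch of such a range test, assuming that "math unicode
range" means the Mathematical Alphanumeric Symbols block
(U+1D400..U+1D7FF). That is a simplification: a few mathematical
letters live in other blocks (the italic small h, for example, is
PLANCK CONSTANT in Letterlike Symbols). The function name
in_math_alnum_block is hypothetical.

```python
import unicodedata

# Assumed range: the Mathematical Alphanumeric Symbols block.
MATH_ALNUM_FIRST = 0x1D400
MATH_ALNUM_LAST = 0x1D7FF

def in_math_alnum_block(ch):
    """Return True if the single character ch lies in the assumed block."""
    return MATH_ALNUM_FIRST <= ord(ch) <= MATH_ALNUM_LAST

text_x = 'x'
math_x = unicodedata.lookup('MATHEMATICAL ITALIC SMALL X')
text_lambda = '\N{GREEK SMALL LETTER LAMDA}'
math_lambda = unicodedata.lookup('MATHEMATICAL ITALIC SMALL LAMDA')

for ch in (text_x, math_x, text_lambda, math_lambda):
    # Print the code point, the official name, and the block test result.
    print('U+%05X' % ord(ch), unicodedata.name(ch), in_math_alnum_block(ch))
```

The textual and the mathematical characters compare unequal as strings;
only the range test (or a compatibility normalization) relates them.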

A text processing tool will notice the difference: it will
use different fonts.

jmf