[Python-ideas] π = math.pi

Thomas Jollans tjol at tjol.eu
Sat Jun 3 16:32:28 EDT 2017


On 03/06/17 21:02, Thomas Jollans wrote:
> On 03/06/17 20:41, Chris Angelico wrote:
>> [snip]
>> For reference, as well as the 948 Sm, there are 1690 Mn and 5777 So,
>> but only these characters are valid from them:
>>
>> \u1885 Mn MONGOLIAN LETTER ALI GALI BALUDA
>> \u1886 Mn MONGOLIAN LETTER ALI GALI THREE BALUDA
>> ℘ Sm SCRIPT CAPITAL P
>> ℮ So ESTIMATED SYMBOL
>>
>> 2118 SCRIPT CAPITAL P and 212E ESTIMATED SYMBOL are listed in
>> PropList.txt as Other_ID_Start, so they make sense. But that doesn't
>> explain the two characters from category Mn. It also doesn't explain
>> why U+309B and U+309C are *not* valid, despite being declared
>> Other_ID_Start. Maybe it's a bug? Maybe 309B and 309C somehow got
>> switched into 1885 and 1886??
> \u1885 and \u1886 are categorised as letters (category Lo) by my Python
> 3.5. (Which makes sense, right?) If your system puts them in category
> Mn, that's bound to be a bug somewhere.
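A rough way to reproduce a survey like Chris's is to scan every code point, group by general category and keep the characters that `str.isidentifier()` accepts on their own. This sketch assumes that "valid" means "can start an identifier". The totals it prints depend on the Unicode database bundled with your Python, so they may not match the 948/1690/5777 quoted above. The helper name `valid_identifier_chars` is made up for illustration.

```python
import sys
import unicodedata


def valid_identifier_chars(categories):
    """Hypothetical helper: for each general category, count its characters
    and collect the ones that are a valid identifier on their own."""
    totals = {cat: 0 for cat in categories}
    found = {cat: [] for cat in categories}
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        cat = unicodedata.category(ch)
        if cat in totals:
            totals[cat] += 1
            # A one-character string is an identifier only if that
            # character is allowed to start one.
            if ch.isidentifier():
                found[cat].append(ch)
    return totals, found


if __name__ == "__main__":
    totals, found = valid_identifier_chars(("Sm", "Mn", "So"))
    print("Unicode", unicodedata.unidata_version)
    for cat, chars in found.items():
        print(cat, totals[cat], "total")
        for ch in chars:
            print("  U+%04X %s %s" % (ord(ch), cat, unicodedata.name(ch, "?")))
```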

Actually, it turns out that these characters were changed to category Mn
in Unicode 9.0 but remain in (X)ID_Start for compatibility. My Python 3.5
ships the Unicode 8.0 database, which is why it still reports Lo. All is
right with the world. (All of this just goes to show how much subtlety
goes into making Unicode.)

See: http://www.unicode.org/reports/tr44/tr44-18.html#Unicode_9.0.0
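Which category you see therefore depends on the Unicode version your interpreter's `unicodedata` module was built with. A minimal sketch for checking a given Python follows. The `describe` helper is hypothetical; the 9.0 cut-off is the one from the release notes linked above.

```python
import unicodedata


def describe(ch):
    """Hypothetical helper: code point, category, identifier validity, name."""
    return "U+%04X\t%s\t%s\t%s" % (
        ord(ch),
        unicodedata.category(ch),
        ch.isidentifier(),
        unicodedata.name(ch),
    )


print("Unicode", unicodedata.unidata_version)
for ch in "\u1885\u1886":
    # Lo before Unicode 9.0, Mn from 9.0 on; either way the character is
    # still accepted as an identifier start because it remains in (X)ID_Start.
    print(describe(ch))
```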


>
> As for \u309B and \u309C - it turns out this is a question of
> normalisation. PEP 3131 requires NFKC normalisation:
>
>>>> import unicodedata
>>>> for c in unicodedata.normalize('NFKC', '\u309B'):
> ...     print('%s\tU+%04X\t%s' % (c, ord(c), unicodedata.name(c)))
> ...
>      U+0020    SPACE
>     U+3099    COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK
>>>> for c in unicodedata.normalize('NFKC', '\u309C'):
> ...     print('%s\tU+%04X\t%s' % (c, ord(c), unicodedata.name(c)))
> ...
>      U+0020    SPACE
>     U+309A    COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
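> So under NFKC each of them becomes a SPACE followed by a combining mark,
> and a space can never be part of an identifier. This is... interesting.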
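A short sketch of why the normalisation matters: PEP 3131 has the parser convert identifiers to NFKC. A compatibility character such as U+FB01 LATIN SMALL LIGATURE FI (picked here purely as an illustration) therefore ends up as the plain name `fi`. U+309B, by contrast, decomposes into a SPACE plus a combining mark, so it is rejected outright.

```python
import unicodedata

# U+FB01 normalises to the two letters "fi", so the parser binds the
# ordinary name fi rather than a distinct ligature name.
namespace = {}
exec("\ufb01 = 1", namespace)
print("fi" in namespace)  # True

# U+309B decomposes into SPACE + U+3099 COMBINING KATAKANA-HIRAGANA
# VOICED SOUND MARK; a space cannot appear in a name.
decomposed = unicodedata.normalize("NFKC", "\u309b")
print([hex(ord(c)) for c in decomposed])  # ['0x20', '0x3099']
print("\u309b".isidentifier())  # False
```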


-- 
Thomas Jollans

m ☎ +31 6 42630259
e ✉ tjol at tjol.eu


