Unicode script

Terry Reedy tjreedy at udel.edu
Thu Dec 15 16:57:24 EST 2016


On 12/15/2016 1:06 PM, MRAB wrote:
> On 2016-12-15 16:53, Steve D'Aprano wrote:
>> Suppose I have a Unicode character, and I want to determine the script or
>> scripts it belongs to.
>>
>> For example:
>>
>> U+0033 DIGIT THREE "3" belongs to the script "COMMON";
>> U+0061 LATIN SMALL LETTER A "a" belongs to the script "LATIN";
>> U+03BE GREEK SMALL LETTER XI "ξ" belongs to the script "GREEK".
>>
>>
>> Is this information available from Python?
>>
>>
>> More about Unicode scripts:
>>
>> http://www.unicode.org/reports/tr24/
>> http://www.unicode.org/Public/UCD/latest/ucd/Scripts.txt
>> http://www.unicode.org/Public/UCD/latest/ucd/ScriptExtensions.txt
>>
>>
> Interestingly, there's issue 6331 "Add unicode script info to the
> unicode database". Looks like it didn't make it into Python 3.6.

https://bugs.python.org/issue6331
Opened in 2009 with a patch and 2 revisions for 2.x.  At the very least, 
the Python code needs to be updated.

Approved in principle by Martin, then the unicodedata curator, who is no 
longer active.  Neither, very much, are the other 2 people listed in the 
Experts Index.

From what I could see, both the Python API (there is no doc patch yet) 
and the internal implementation need more work.  If I were to get involved, 
I would look at the APIs of PyICU (see Eryk Sun's post) and the 
unicodescript module on PyPI (mentioned by Pander Musubi, on the issue).
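
In the meantime, a lookup can be approximated by parsing the Scripts.txt 
file linked above.  The following is only a minimal sketch, not the API of 
the issue 6331 patch, PyICU, or unicodescript: the names SAMPLE, 
load_scripts and script_of are made up for illustration, and SAMPLE holds a 
few simplified lines in the Scripts.txt format rather than the real file. 
Code points not listed are reported as "Unknown", which is the default 
given in UAX #24.

```python
import bisect

# Simplified lines in the Scripts.txt format (hypothetical excerpt, not a
# verbatim copy of the real file, which is much longer).
SAMPLE = """\
# code point or range ; script name # comment
0030..0039    ; Common # Nd  [10] DIGIT ZERO..DIGIT NINE
0061..007A    ; Latin # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
00AA          ; Latin # Lo       FEMININE ORDINAL INDICATOR
03B1..03C9    ; Greek # L&  [25] GREEK SMALL LETTER ALPHA..GREEK SMALL LETTER OMEGA
"""


def load_scripts(text):
    """Parse Scripts.txt-style text into a sorted list of (start, end, script)."""
    ranges = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        codes, script = (field.strip() for field in line.split(";"))
        first, _, last = codes.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start  # single code point or range
        ranges.append((start, end, script))
    ranges.sort()
    return ranges


def script_of(char, ranges):
    """Return the script name for a one-character string, or "Unknown"."""
    cp = ord(char)
    # 0x110000 is above every valid code point, so this finds the last
    # range whose start is <= cp.
    i = bisect.bisect_right(ranges, (cp, 0x110000, "")) - 1
    if i >= 0 and ranges[i][0] <= cp <= ranges[i][1]:
        return ranges[i][2]
    return "Unknown"


RANGES = load_scripts(SAMPLE)

for ch in "3aξ":
    print("U+%04X" % ord(ch), script_of(ch, RANGES))
```

With the full Scripts.txt read from disk in place of SAMPLE, the same two 
functions should cover every assigned code point.  Note that the file 
spells the names "Common", "Latin", "Greek" rather than in upper case, and 
that ScriptExtensions.txt would need separate handling for characters used 
in more than one script.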

-- 
Terry Jan Reedy




