Unicode script

eryk sun eryksun at gmail.com
Thu Dec 15 13:01:58 EST 2016


On Thu, Dec 15, 2016 at 4:53 PM, Steve D'Aprano
<steve+python at pearwood.info> wrote:
> Suppose I have a Unicode character, and I want to determine the script or
> scripts it belongs to.
>
> For example:
>
> U+0033 DIGIT THREE "3" belongs to the script "COMMON";
> U+0061 LATIN SMALL LETTER A "a" belongs to the script "LATIN";
> U+03BE GREEK SMALL LETTER XI "ξ" belongs to the script "GREEK".
>
> Is this information available from Python?

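CPython's Tools/unicode/makeunicodedata.py, which generates the tables
behind the unicodedata module, doesn't include data from "Scripts.txt",
so unicodedata can't tell you a character's script. If adding an
external dependency is ok, then you can use PyICU. For example: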

    >>> import icu
    >>> icu.Script.getScript('\u0033').getName()
    'Common'
    >>> icu.Script.getScript('\u0061').getName()
    'Latin'
    >>> icu.Script.getScript('\u03be').getName()
    'Greek'

There isn't documentation specific to Python, so you'll have to figure
things out experimentally, using the ICU C API reference as a guide:

http://icu-project.org/apiref/icu4c
http://icu-project.org/apiref/icu4c/uscript_8h.html
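If an external dependency isn't ok, a minimal stdlib-only sketch is to
parse Scripts.txt yourself and look characters up by binary search over
its code point ranges. This assumes you have a copy of Scripts.txt from
the Unicode Character Database; the class name ScriptTable and the
SAMPLE lines are hypothetical and only illustrate the file's format
(they are not copied verbatim from the real file). Code points the file
doesn't list get the default value "Unknown".

```python
import bisect
from typing import Iterable, List, Tuple

# Hypothetical sample in the Scripts.txt format ("range ; Script # comment").
# The real file covers every assigned code point.
SAMPLE = """\
# Scripts.txt-style sample
0030..0039    ; Common # Nd  [10] DIGIT ZERO..DIGIT NINE
0061..007A    ; Latin # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
00AA          ; Latin # Lo       FEMININE ORDINAL INDICATOR
00D7          ; Common # Sm       MULTIPLICATION SIGN
03A3..03E1    ; Greek # L&  [63] GREEK CAPITAL LETTER SIGMA..GREEK SMALL LETTER SAMPI
"""


class ScriptTable:
    """Hypothetical helper: maps code points to script names via sorted ranges."""

    def __init__(self, lines: Iterable[str]) -> None:
        ranges: List[Tuple[int, int, str]] = []
        for line in lines:
            data = line.split('#', 1)[0].strip()  # drop the trailing comment
            if not data:
                continue  # blank or comment-only line
            codepoints, script = (field.strip() for field in data.split(';'))
            first, _, last = codepoints.partition('..')  # "XXXX" or "XXXX..YYYY"
            ranges.append((int(first, 16), int(last or first, 16), script))
        ranges.sort()
        self._starts = [start for start, _, _ in ranges]
        self._ranges = ranges

    def script(self, ch: str) -> str:
        """Return the script of one character, or 'Unknown' if it isn't listed."""
        cp = ord(ch)
        # Index of the last range whose start is <= cp.
        i = bisect.bisect_right(self._starts, cp) - 1
        if i >= 0 and cp <= self._ranges[i][1]:
            return self._ranges[i][2]
        return 'Unknown'


table = ScriptTable(SAMPLE.splitlines())
for ch in '3a\u03be':
    print(f'U+{ord(ch):04X} {table.script(ch)}')
# prints U+0033 Common, U+0061 Latin, U+03BE Greek (one per line)
```

With the real file you'd build the table from an open file object, e.g.
ScriptTable(open('Scripts.txt', encoding='utf-8')). Note that this only
gives the single Script property; characters shared by several scripts
are listed separately in the UCD's ScriptExtensions.txt, which this
sketch doesn't read.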
