Diacretical incensitive search

jmfauth wxjmfauth at gmail.com
Fri May 17 13:31:25 EDT 2013


--------


The handling of diacriticals is especially a nice case
study. One can use it to toy with some specific features of
Unicode, normalisation, decomposition, ...

... and also to show how Unicode can be badly implemented.

First and quick example that came to my mind (Py325 and Py332):

>>> timeit.repeat("ud.normalize('NFKC', ud.normalize('NFKD', 'ᶑḗḖḕḹ'))", "import unicodedata as ud")
[2.929404406789672, 2.923327801150208, 2.923659417064755]

>>> timeit.repeat("ud.normalize('NFKC', ud.normalize('NFKD', 'ᶑḗḖḕḹ'))", "import unicodedata as ud")
[3.8437222586746884, 3.829490737203514, 3.819266963414293]

jmf



More information about the Python-list mailing list