Bytes indexing returns an int

Wed Jan 8 12:19:03 EST 2014
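The subject line refers to a Python 3 behavior that often surprises people coming from Python 2, where indexing a byte string returned a one-character string. The following is a minimal sketch, assuming Python 3. The variable names are illustrative only.

```python
# Python 3: indexing a bytes object returns an int (the byte's value),
# while slicing returns a bytes object.
data = b'abc'
print(data[0])     # 97, i.e. ord('a')
print(data[0:1])   # b'a'

# Indexing a str, by contrast, returns a str of length 1.
text = 'abc'
print(text[0])     # a

# Encoded text behaves the same way: each index yields one byte as an int.
encoded = '\u01df'.encode('utf-8')  # 'ǟ' takes two bytes in UTF-8
print(list(encoded))  # [199, 159]
```

To get a one-byte `bytes` object rather than an int, slice (`data[i:i+1]`) instead of indexing.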

On 1/8/14 11:08 AM, wxjmfauth at gmail.com wrote:
> Byte strings (encoded code points) or native unicode is one
> thing.
>
> But on the other side, the problem is elsewhere. These very
> talented ascii narrow minded, unicode illiterate devs only
> succeeded in producing this (I, really, do not wish to be rude).

If you don't want to be rude, you are failing.  You've been told a 
number of times that your obscure micro-benchmarks are meaningless.  Now 
you've taken to calling the core devs narrow-minded and Unicode 
illiterate.  They are neither of these things.

Continuing to post these comments with no interest in learning is rude. 
Other recent threads have contained detailed rebuttals of your views,
which you have ignored.  This is rude. Please stop.

--Ned.

>
>>>> import sys
>>>> import timeit
>>>> import unicodedata
>>>> unicodedata.name('ǟ')
> 'LATIN SMALL LETTER A WITH DIAERESIS AND MACRON'
>>>> sys.getsizeof('a')
> 26
>>>> sys.getsizeof('ǟ')
> 40
>>>> timeit.timeit("unicodedata.normalize('NFKD', 'ǟ')", "import unicodedata")
> 0.8040018888575129
>>>> timeit.timeit("unicodedata.normalize('NFKD', 'zzz')", "import unicodedata")
> 0.3073749330963995
>>>> timeit.timeit("unicodedata.normalize('NFKD', 'z')", "import unicodedata")
> 0.2874013282653962
>>>>
>>>> timeit.timeit("len(unicodedata.normalize('NFKD', 'zzz'))", "import unicodedata")
> 0.3803570633857589
>>>> timeit.timeit("len(unicodedata.normalize('NFKD', 'ǟ'))", "import unicodedata")
> 0.9359970320201683
>
> pdf, typography, linguistic, scripts, ... in mind, in other words the real
> *unicode* world.
>
> jmf
>
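The numbers in the quoted session come from one machine, but the differences behind them can be checked directly. The following is a minimal sketch, assuming CPython 3.3 or later with the PEP 393 flexible string representation. Absolute `getsizeof` values vary between versions, so only their ordering is compared.

```python
import sys
import unicodedata

ascii_char = 'a'
latin_char = '\u01df'  # the same character as 'ǟ' in the session above

# PEP 393: a str stores every code point in 1, 2 or 4 bytes, chosen by its
# largest code point. U+01DF is above U+00FF, so it needs 2 bytes per code
# point and a larger object header than a pure-ASCII string.
print(sys.getsizeof(ascii_char) < sys.getsizeof(latin_char))  # True

# NFKD decomposes U+01DF into a base letter plus two combining marks,
# so normalizing it builds a new, longer string.
decomposed = unicodedata.normalize('NFKD', latin_char)
print([unicodedata.name(c) for c in decomposed])
# ['LATIN SMALL LETTER A', 'COMBINING DIAERESIS', 'COMBINING MACRON']
print(len(decomposed))  # 3

# 'z' has no decomposition, so NFKD returns an equal string.
print(unicodedata.normalize('NFKD', 'z') == 'z')  # True
```

One plausible reading is that the two calls are not doing the same work. Normalizing 'ǟ' produces three code points, while 'z' passes through unchanged. A timing of a single character says little about how a real text-processing workload would behave.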

-- 
Ned Batchelder, http://nedbatchelder.com