RE Module Performance

Antoon Pardon antoon.pardon at rece.vub.ac.be
Wed Jul 31 04:11:10 EDT 2013


On 30-07-13 21:09, wxjmfauth at gmail.com wrote:
> Mutable, immutable, copying + xxx, buffering, O(n) ....
> Yes, but conceptually the reencoding happens sometime, somewhere.

Which is a far cry from your previous claim that it happened
every time you enter a char.

This of course makes your case harder to argue, because the
impact of something that happens sometime, somewhere is
vastly less than that of something that happens every time
you enter a char.

> The internal "ucs-2" will never automagically be transformed
> into "ucs-4" (eg).

It will just start producing wrong results when someone starts
using characters that don't fit into ucs-2.
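
A minimal sketch of that point, assuming Python 3.3 or later (where
len() counts code points). The helper names fits_in_ucs2 and
utf16_units are made up for this illustration and are not part of
any library:

```python
def fits_in_ucs2(text):
    """Return True if every code point fits in a single 16-bit unit."""
    return all(ord(ch) <= 0xFFFF for ch in text)


def utf16_units(text):
    """Count the 16-bit code units needed to store text as UTF-16."""
    return len(text.encode('utf-16-le')) // 2


bmp = 'a\u20ac'        # 'a' and the euro sign, both inside the BMP
astral = '\U00010100'  # one code point outside the BMP

for sample in (bmp, astral):
    # A mismatch between len() and the unit count means a pure
    # 16-bit representation cannot hold the string one unit per char.
    print(ascii(sample), len(sample), utf16_units(sample),
          fits_in_ucs2(sample))
```

For the astral sample the string is one character long but needs two
16-bit units, so a representation fixed at 16 bits per character has
no correct way to store it.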


> >>> timeit.timeit("'a'*10000 +'€'")
> 7.087220684719967
> >>> timeit.timeit("'a'*10000 +'z'")
> 1.5685214234430873
> >>> timeit.timeit("z = 'a'*10000; z = z +'€'")
> 7.169538866162213
> >>> timeit.timeit("z = 'a'*10000; z = z +'z'")
> 1.5815893830557286
> >>> timeit.timeit("z = 'a'*10000; z += 'z'")
> 1.606955741596181
> >>> timeit.timeit("z = 'a'*10000; z += '€'")
> 7.160483334521416
> 
> 
> And do not forget, in a pure utf coding scheme, your
> char or a char will *never* be larger than 4 bytes.
> 
> >>> sys.getsizeof('a')
> 26
> >>> sys.getsizeof('\U000101000')
> 48

Nonsense.

>>> sys.getsizeof('a'.encode('utf-8'))
18
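
The reason the comparison is nonsense: sys.getsizeof() reports the
size of the whole object, header included, not the size of the
encoded character. A rough sketch that separates the two, assuming
CPython (the helper name payload_and_overhead is made up for this
illustration):

```python
import sys


def payload_and_overhead(ch):
    """Split the size of a UTF-8 bytes object into data and header."""
    encoded = ch.encode('utf-8')
    payload = len(encoded)                       # bytes of actual UTF-8 data
    overhead = sys.getsizeof(encoded) - payload  # object header and padding
    return payload, overhead


for ch in ('a', '\u20ac', '\U00010100'):
    payload, overhead = payload_and_overhead(ch)
    print(ascii(ch), 'payload:', payload, 'overhead:', overhead)
```

The payload never exceeds 4 bytes per character, while the overhead
is the same for every sample and dominates the getsizeof() figure for
short strings. The exact overhead depends on the interpreter build.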
