RE Module Performance

Antoon Pardon antoon.pardon at rece.vub.ac.be
Sun Jul 28 04:45:25 EDT 2013


On 27-07-13 20:21, wxjmfauth at gmail.com wrote:

> Quickly. sys.getsizeof() at the light of what I explained.
>
> 1) As this FSR works with multiple encodings, it has to keep
> track of the encoding. It puts it in the overhead of the str
> class (overhead = real overhead + encoding). In such
> an absurd way that a
>
>>>> sys.getsizeof('€')
> 40
>
> needs 14 bytes more than a
>
>>>> sys.getsizeof('z')
> 26
>
> You may vary the length of the str. The problem is
> still here. Not bad for a coding scheme.
>
> 2) Take a look at this. Get rid of the overhead.
>
>>>> sys.getsizeof('b'*1000000 + 'c')
> 1000026
>>>> sys.getsizeof('b'*1000000 + '€')
> 2000040
>
> What does it mean? It means that Python has to
> reencode a str every time it is necessary, because
> it works with multiple encodings.
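For context, the doubled figure comes from PEP 393's flexible string representation: CPython stores a string with the narrowest fixed width (1, 2, or 4 bytes per code point) that fits its widest character. A minimal sketch of that effect (exact byte counts vary by CPython version and platform, so only the relative sizes are checked):

```python
import sys

# All code points fit in Latin-1: stored at 1 byte per code point.
ascii_str = 'b' * 1000000 + 'c'
# One BMP character ('€') widens the whole string to 2 bytes per code point.
bmp_str = 'b' * 1000000 + '€'

# Appending a single non-Latin-1 character roughly doubles the storage.
assert sys.getsizeof(bmp_str) - sys.getsizeof(ascii_str) > 900000
```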

So? The same effect can be seen with other datatypes.

 >>> nr = 32767
 >>> sys.getsizeof(nr)
14
 >>> nr += 1
 >>> sys.getsizeof(nr)
16

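The same point can be made in a runnable form; a hedged sketch (absolute sizes differ between builds, so only the growth is asserted):

```python
import sys

# CPython stores ints as a variable number of internal "digits",
# so a larger magnitude can mean a larger object.
small = 32767
big = 2 ** 100
assert sys.getsizeof(big) > sys.getsizeof(small)
```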

>
> This FSR is not even a copy of the utf-8.
>>>> len(('b'*1000000 + '€').encode('utf-8'))
> 1000003

Why should it be? Why should a Unicode string be a copy
of its utf-8 encoding? That makes as much sense as expecting
a number to be a copy of its string representation.
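The analogy can be made concrete: CPython stores an int in binary digits, not as its decimal text, so the object's size and its string representation's length need not match. A small sketch (assuming only CPython's variable-width int storage):

```python
import sys

n = 10 ** 1000
# Binary storage is denser than decimal text: the object occupies
# fewer bytes than its 1001-character decimal representation.
assert sys.getsizeof(n) < len(str(n))
```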

>
> utf-8, or any utf, never needs to spend time
> reencoding.

So? That Python sometimes needs to do some kind of background
processing is not a problem, whether it is garbage collection,
allocating more memory, shuffling data blocks around, or reencoding a
string. If you have a real-world example where one of those things
noticeably slows your program down or makes it behave incorrectly,
then you have something worthy of attention.

Until then you are merely harboring a pet peeve.

-- 
Antoon Pardon
