RE Module Performance

Chris Angelico rosuav at gmail.com
Thu Jul 25 15:18:44 EDT 2013


On Fri, Jul 26, 2013 at 5:07 AM,  <wxjmfauth at gmail.com> wrote:
> Let's start with a simple string \textemdash or \textendash
>
> >>> sys.getsizeof('–')
> 40
> >>> sys.getsizeof('a')
> 26

Most of the cost is in those two apostrophes. Look:

>>> sys.getsizeof('a')
26
>>> sys.getsizeof(a)
8

Okay, that's slightly unfair (bonus points: figure out what I did to
make this work; there are at least two right answers) but still, look
at what an empty string costs:

>>> sys.getsizeof('')
25

Or look at the difference between one of these characters and two:

>>> sys.getsizeof('aa')-sys.getsizeof('a')
1
>>> sys.getsizeof('––')-sys.getsizeof('–')
2

That's what the characters really cost. The overhead is fixed. It is,
in fact, almost completely insignificant. The storage requirement for
a BMP-only string with at least one character above U+00FF converges
to two bytes per character as the string grows.
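
For anyone who wants to check this on their own build, here is a
minimal sketch. It assumes CPython 3.3 or later (the flexible string
representation from PEP 393); the helper names bytes_per_char and
average_cost are made up for illustration. Absolute sizes differ
between 32-bit and 64-bit builds and between versions, so it only
looks at differences and averages:

```python
import sys


def bytes_per_char(ch, n=10):
    """Marginal cost, in bytes, of one more copy of ch in a string."""
    # Both strings are built at run time, so neither is a cached
    # single-character object or a shared source constant.
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)


def average_cost(ch, n):
    """Total size divided by length, fixed header overhead included."""
    return sys.getsizeof(ch * n) / n


samples = [
    ("ASCII 'a'", "a"),
    ("EN DASH U+2013", "\u2013"),
    ("astral U+1F600", "\U0001F600"),
]

for label, ch in samples:
    # The average shrinks toward the marginal cost as the header
    # is spread over more and more characters.
    averages = [round(average_cost(ch, n), 3) for n in (10, 1000, 100000)]
    print(label, bytes_per_char(ch), averages)
```

The marginal cost should come out as 1, 2 and 4 bytes for the three
characters, and the average for the en dash should drift down toward
2 as n grows.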

ChrisA


