[Tutor] Why do sets use 6 times as much memory as lists?

Oscar Benjamin oscar.j.benjamin at gmail.com
Tue May 7 19:53:33 CEST 2013


On 7 May 2013 18:10, Bjorn Madsen <bjorn.h.madsen at googlemail.com> wrote:
>>>> import sys
>>>> L=[x for x in range(10000)]
>>>> sys.getsizeof(L)
> 43816
>>>> L={x for x in range(10000)}
>>>> sys.getsizeof(L)
> 262260

Firstly, these results may vary depending on operating system,
processor architecture and build options, so the numbers won't always
be exactly the same. I do, however, get the same numbers as you on
standard Python 2.7 on 32-bit Windows.

So why do sets use extra memory? Under the hood a list is an array
that stores one pointer per slot, with no gaps between the pointers.
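You can see the "one pointer per element" layout directly by comparing
the sizes of two lists of different lengths. This is just an
illustrative measurement, and the exact per-element figure depends on
your pointer size (4 bytes on a 32-bit build, 8 on 64-bit):

```python
import sys

# A list is a contiguous array of pointers, so adding 1000 elements
# should add roughly 1000 * sizeof(pointer) bytes. Note getsizeof
# measures only the list itself, not the int objects it points to.
small = list(range(1000))
big = list(range(2000))
per_element = (sys.getsizeof(big) - sys.getsizeof(small)) / 1000.0
print(per_element)  # close to the platform pointer size
```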

A set uses a hash table, which is an array that stores pointers at
pseudo-random positions and deliberately leaves a large fraction of
its slots empty (roughly half) so that lookups stay fast. The gaps
alone mean it needs about twice as much memory as a compact array. On
top of that, Python's sets use a resizable hash table, and to make
resizing efficient the hash of each object is stored alongside its
pointer. That essentially requires a whole extra array, so it doubles
the storage requirement again. Between these two effects I would
expect a set to be something like 4 times bigger, so the 6 times
bigger that you report seems reasonable to me (I may have missed
something else in this).
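The measurement from the original question can be reproduced in one
short snippet. The exact ratio varies with Python version and
platform (CPython's set implementation has changed over the years),
but it should land in the same ballpark:

```python
import sys

# Compare the container overhead of a set versus a list holding the
# same 10000 elements. getsizeof counts only each container's own
# storage (the pointer array / hash table), not the shared int objects.
n = 10000
lst = list(range(n))
st = set(range(n))
ratio = float(sys.getsizeof(st)) / sys.getsizeof(lst)
print(sys.getsizeof(lst), sys.getsizeof(st), ratio)
```

On the 32-bit Python 2.7 build quoted above this gives roughly
43816, 262260 and a ratio near 6; other builds give different
absolute numbers but a similar multiple.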


Oscar
