creating really big lists

Bruno Desthuilliers bdesth.quelquechose at free.quelquepart.fr
Sun Sep 9 10:53:58 EDT 2007


Dr Mephesto wrote:
> On Sep 8, 8:06 pm, Bruno Desthuilliers
> <bdesth.quelquech... at free.quelquepart.fr> wrote:
> 
>>Dr Mephesto wrote:
>>
>>
>>>Hi!
>>
>>>I would like to create a pretty big list of lists; a list 3,000,000
>>>long, each entry containing 5 empty lists. My application will append
>>>data to each of the 5 sublists, so they will be of varying lengths (so no
>>>arrays!).
>>
>>>Does anyone know the most efficient way to do this?
>>
>>Hem... Did you consider the fact that RAM is not an unlimited resource?
>>
>>Let's do some simple math (please someone correct me if I'm going off
>>the road): if an empty Python list object requires 256 bytes (judging
>>from some old post by GvR, it's probably more - 384 bytes at least.
>>Some Python guru around ?), you'd need (1 + (3000000 * 5)) * 256 bytes
>>just to build this list of lists. Which makes something around 3.8 GB.
>>Not counting all other needed memory...
>>
>>FWIW, run the following code:
>>
>># eatallramthenswap.py
>>d = {}
>>for i in xrange(3000000):
>>    d[i] = ([], [], [], [], [])
>>
>>And monitor what happens with top...
> 
> 
> Unused ram is wasted ram :)

Indeed. But when your app eats all RAM and swap and brings the system 
down, users are usually a bit unhappy !-)

> I tried using MySQL, and it was too slow.

Possibly.

> and I have 4gb anyway...

*You* have 4 GB. Yes, fine. But:

1/ please take time to re-read my post - that figure is based on a very 
optimistic estimate (256 bytes) of the size of an empty list. If you 
choose the (probably much closer to reality) estimate of 384 bytes, then 
you need (1 + (3000000 * 5)) * 384 bytes, which makes =~ 5.8 GB. More 
than what *you* have. BTW, please remember that your OS and the Python 
interpreter are going to eat some of these 4 GB, and that you intend to 
actually *store* something - object references - in these lists. Even if 
you do have a few shared references, this means that you'll need RAM 
space for *at least* 3000000 * 5 *more* Python objects (and that's just 
*1* object per list...). Which will *at minimum* use about the same 
amount of RAM as the list of lists itself. Which takes us to something 
like 10 GB... for *1* object per list. I of course suppose you plan to 
store much more than 1 object per list !-)
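FWIW, rather than guessing, the shallow size of an empty list can be 
measured directly with sys.getsizeof - a quick sketch (note that 
getsizeof only appeared in Python 2.6, slightly after this thread, and 
the exact numbers vary by Python version and platform):

```python
import sys

# Shallow size of one empty list object, in bytes.
empty = sys.getsizeof([])

# Outer list plus 3,000,000 * 5 empty sublists.
n_lists = 1 + 3000000 * 5

print("one empty list: %d bytes" % empty)
print("rough total: %.1f GB" % (n_lists * empty / 1e9))
```

Whatever the per-list figure turns out to be on your box, multiplying it 
by 15 million slots makes the order of magnitude of the problem obvious.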

2/ now ask yourself how many users of your application will have enough 
RAM to run it...

So IMVHO, the question is not how to build such a list in less than x 
minutes, but how to *not* build such a list. IOW, do you *really* need 
to store all that stuff in RAM ?
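If the answer is "not all at once", one way to avoid building the full 
structure up front is to create the 5 sublists lazily, only when a slot 
is actually touched - for instance with a collections.defaultdict (just 
a sketch, assuming most of the 3,000,000 slots never receive any data):

```python
from collections import defaultdict

# Sublists are only allocated when an index is first accessed,
# so memory grows with the data actually stored, not with the
# theoretical 3,000,000 * 5 upper bound.
table = defaultdict(lambda: ([], [], [], [], []))

table[42][0].append("some data")   # slot 42 springs into existence
table[42][3].append("more data")

print(len(table))   # → 1 : only the touched slot exists
```

Same append-to-the-nth-sublist interface, but the 2,999,999 slots you 
never use cost nothing.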



More information about the Python-list mailing list