creating really big lists
Bruno Desthuilliers
bdesth.quelquechose at free.quelquepart.fr
Sun Sep 9 10:53:58 EDT 2007
Dr Mephesto wrote:
> On Sep 8, 8:06 pm, Bruno Desthuilliers
> <bdesth.quelquech... at free.quelquepart.fr> wrote:
>
>>Dr Mephesto wrote:
>>
>>
>>>Hi!
>>
>>>I would like to create a pretty big list of lists; a list 3,000,000
>>>long, each entry containing 5 empty lists. My application will append
>>>data each of the 5 sublists, so they will be of varying lengths (so no
>>>arrays!).
>>
>>>Does anyone know the most efficient way to do this?
>>
>>Hem... Did you consider the fact that RAM is not an unlimited resource?
>>
>>Let's do some simple math (please, someone correct me if I'm going off
>>the road): if an empty Python list object requires 256 bytes (judging
>>from some old post by GvR, it's probably more - 384 bytes at least. Any
>>Python guru around?), you'd need (1 + (3000000 * 5)) * 256 bytes just
>>to build this list of lists. Which makes something over 3 GB. Not
>>counting all the other memory you'd need...
>>
>>FWIW, run the following code:
>>
>># eatallramthenswap.py
>>d = {}
>>for i in xrange(3000000):
>> d[i] = ([], [], [], [], [])
>>
>>And monitor what happens with top...
>
>
> Unused ram is wasted ram :)
Indeed. But when your app eats all RAM and swap and brings the system
down, users are usually a bit unhappy !-)
> I tried using MySQL, and it was too slow.
Possibly.
> and I have 4gb anyway...
*You* have 4 GB. Yes, fine. But:
1/ please take the time to re-read my post - the estimate above is based
on a very optimistic 256 bytes for the size of an empty list. If you
choose the (probably much closer to reality) estimate of 384 bytes, then
you need (1 + (3000000 * 5)) * 384 bytes, which makes =~ 5.7 GB. More
than what *you* have. BTW, please remember that your OS and the Python
interpreter are going to eat some of these 4 GB, and that you intend to
actually *store* something - object references - in these lists. Even if
you have a few shared references, this means you'll need RAM for *at
least* 3000000 * 5 *more* Python objects (and that's only *1* object per
sublist...). Which will, at a minimum, use about the same amount of RAM
as the list of lists itself. Which takes us to something like 10 GB...
for *1* object per sublist. I of course suppose you plan to store much
more than 1 object per list !-)
2/ now ask yourself how many users of your application will have enough
RAM to run it...
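You can check these back-of-the-envelope figures yourself rather than trust my estimates - a quick sketch, assuming Python 2.6+ (which added sys.getsizeof); the per-list size is CPython- and platform-specific, so your numbers will differ:

```python
import sys

# Base size of one empty list object, in bytes. CPython-specific;
# sys.getsizeof does NOT count objects referenced by the list.
empty_list_size = sys.getsizeof([])

# Rough lower bound for 3,000,000 entries of 5 empty sublists each:
# one outer list plus 15,000,000 inner lists, ignoring the containing
# tuples and anything you later append to the sublists.
n_entries = 3000000
total_bytes = (1 + n_entries * 5) * empty_list_size

print("one empty list: %d bytes" % empty_list_size)
print("estimated total: %.2f GB" % (total_bytes / 1e9))
```

On a modern 64-bit CPython an empty list is typically around 56-64 bytes, which already puts the bare structure near 1 GB before a single value is stored; with the larger per-object overheads assumed above, you land in the multi-GB range discussed.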
So IMVHO, the question is not how to build such a list in less than x
minutes, but how to *not* build such a list. IOW, do you *really* need
to store all that stuff in RAM ?
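One way to *not* build the whole thing up front - a sketch of my own, assuming entries are looked up by integer index - is to allocate each entry lazily with collections.defaultdict (Python 2.5+), so only the rows you actually touch ever exist:

```python
from collections import defaultdict

# Each entry (a tuple of 5 fresh sublists) is created only when its
# index is first touched, instead of allocating 15,000,001 list
# objects up front.
data = defaultdict(lambda: ([], [], [], [], []))

# Append to a couple of rows; only these two entries get allocated.
data[42][0].append("some value")
data[1999999][3].append("another value")

print(len(data))  # 2 rows exist, not 3,000,000
```

If most of the 3,000,000 entries stay empty, this keeps memory proportional to the data you actually have - and if they don't stay empty, you're back to needing a disk-backed store anyway.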