[Tutor] managing memory: large dictionaries in python

Dwight Hutto dwightdhutto at gmail.com
Wed Oct 17 03:30:43 CEST 2012


On Tue, Oct 16, 2012 at 12:57 PM, Abhishek Pratap
<abhishek.vit at gmail.com> wrote:
> Hi Guys
>
> For my problem I need to store 400-800 million 20-character keys in a
> dictionary and do counting. This data structure takes about 60-100 GB
> of RAM.
> I am wondering if there are slick ways to map the dictionary to a file
> on disk and not store it in memory, but still access it as a dictionary
> object. Speed is not the main concern in this problem, and persistence
> is not needed, as the counting will only be done once on the data. We
> want the script to run on smaller-memory machines if possible.
>
> I did think about databases for this, but intuitively it looks like
> overkill, because for each key you first have to check whether it is
> already present and increase the count by 1, and if not, insert
> the key into the database.
>
> Just want to take your opinion on this.
>
> Thanks!
> -Abhi

My inexperienced advice would be to start with how the data is stored
on disk. If you write the dicts out as text, begin by eliminating
unneeded whitespace, turning:

x = {'one_entry' : 1}

into

x = {'one_entry':1}
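
If the dicts are written out as JSON, that trimming can be done
automatically; a small sketch, with a made-up file name:

import json

counts = {'one_entry': 1}
with open('counts.txt', 'w') as f:
    # separators=(',', ':') drops the spaces json.dumps inserts by
    # default, and the whole dict lands on a single line
    f.write(json.dumps(counts, separators=(',', ':')))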

To map the data to disk, you could split it across several db files,
each covering a certain key range: entries 0-1000 in the first file,
and so on.
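
A minimal sketch of that idea, using a hash of the key instead of a
literal 0-1000 range (the bucket count, paths, and names here are all
made up):

import os
import zlib
from collections import Counter

NUM_BUCKETS = 1000

def bucket_file(key, root='buckets'):
    # zlib.crc32 sends a key to the same bucket on every run, unlike
    # hash(), which Python may randomize between processes
    idx = zlib.crc32(key.encode()) % NUM_BUCKETS
    return os.path.join(root, 'bucket_%04d.txt' % idx)

def partition(keys, root='buckets'):
    # Append each key to its bucket file; slow, but speed is not the
    # main concern here, and no bucket has to hold the whole key set
    if not os.path.isdir(root):
        os.makedirs(root)
    for key in keys:
        with open(bucket_file(key, root), 'a') as f:
            f.write(key + '\n')

def count_bucket(path):
    # One bucket at a time fits in RAM even when the full set does not
    with open(path) as f:
        return Counter(line.strip() for line in f)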

Then os.walk the directory, find the file covering the range your key
falls in, and go straight to the entry you need.
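
Building on the bucket_file() sketch above, that lookup might look
like:

import os

def find_count(key, root='buckets'):
    # Walk the tree, locate the one file this key maps to, and scan
    # only that file for the entry
    target = os.path.basename(bucket_file(key, root))
    for dirpath, dirnames, filenames in os.walk(root):
        if target in filenames:
            with open(os.path.join(dirpath, target)) as f:
                return sum(1 for line in f if line.strip() == key)
    return 0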

Write each dict as one long line and you also eliminate the \n newline
characters (the json.dumps sketch above already does this).

I could do better with more time, but that seems like a good solution
at this point.
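
One more standard-library option worth a look: shelve gives the
dict-like, on-disk access asked about, with no file scheme of our own
to maintain. A minimal sketch, where the path and the keys iterable
are placeholders:

import shelve

def count_on_disk(keys, path='counts.db'):
    # A dbm-backed dict: the counts live on disk, not in a 60-100 GB
    # in-memory dictionary; .get() does the check-then-increment
    counts = shelve.open(path)
    for key in keys:
        counts[key] = counts.get(key, 0) + 1
    counts.close()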

Best Regards,
David Hutto
CEO: http://www.hitwebdevelopment.com

