[Tutor] managing memory large dictionaries in python
Dwight Hutto
dwightdhutto at gmail.com
Wed Oct 17 03:30:43 CEST 2012
On Tue, Oct 16, 2012 at 12:57 PM, Abhishek Pratap
<abhishek.vit at gmail.com> wrote:
> Hi Guys
>
> For my problem I need to store 400-800 million 20-character keys in a
> dictionary and do counting. This data structure takes about 60-100 GB
> of RAM.
> I am wondering if there are slick ways to map the dictionary to a file
> on disk and not store it in memory but still access it as dictionary
> object. Speed is not the main concern in this problem and persistence
> is not needed as the counting will only be done once on the data. We
> want the script to run on smaller memory machines if possible.
>
> I did think about databases for this but intuitively it looks like
> overkill, because for each key you have to first check whether it is
> already present and increase the count by 1, and if not, insert
> the key into the database.
>
> Just want to take your opinion on this.
>
> Thanks!
> -Abhi
> _______________________________________________
> Tutor maillist - Tutor at python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
My inexperienced advice would be to begin with how the data is stored.
I would start by trimming anything unnecessary from the stored entries,
for example the extra whitespace in:
x = {'one_entry' : 1}
versus
x = {'one_entry':1}
To map the data, you could split it across several db files, each
covering a certain range of keys: entries 0-1000 in the first file, and
so on. Then os.walk the directory, find the file whose range covers the
key you want, and go straight to the entry needed.
If you write each dict out as one long line, you can also eliminate the
\n newline chars.
I could do better with more time, but that seems like a reasonable
starting point.
Best Regards,
David Hutto
CEO: http://www.hitwebdevelopment.com