[Tutor] managing memory large dictionaries in python

Abhishek Pratap abhishek.vit at gmail.com
Tue Oct 16 18:57:24 CEST 2012


Hi Guys

For my problem I need to store 400-800 million 20 characters keys in a
dictionary and do counting. This data structure takes about 60-100 Gb
of RAM.
I am wondering if there are slick ways to map the dictionary to a file
on disk and not store it in memory but still access it as dictionary
object. Speed is not the main concern in this problem and persistence
is not needed as the counting will only be done once on the data. We
want the script to run on smaller memory machines if possible.

I did think about databases for this but intuitively it looks like a
overkill coz for each key you have to first check whether it is
already present and increase the count by 1  and if not then insert
the key into dbase.

Just want to take your opinion on this.

Thanks!
-Abhi


More information about the Tutor mailing list