Optimizing size of very large dictionaries
M.-A. Lemburg
mal at egenix.com
Thu Jul 31 06:24:57 EDT 2008
On 2008-07-31 02:29, python at bdurham.com wrote:
> Are there any techniques I can use to strip a dictionary data
> structure down to the smallest memory overhead possible?
>
> I'm working on a project where my available RAM is limited to 2G
> and I would like to use very large dictionaries vs. a traditional
> database.
>
> Background: I'm trying to identify duplicate records in very
> large text-based transaction logs. I'm detecting duplicate
> records by creating a SHA1 checksum of each record and using this
> checksum as a dictionary key. This works great, except for several
> files so large that their checksum dictionaries exceed my
> workstation's 2G of RAM.
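For reference, the approach you describe boils down to something
like this minimal sketch (the one-record-per-line iteration is an
assumption; note that a plain set of raw 20-byte .digest() values
is about the leanest in-memory variant, since a set carries no
values and raw digests are half the size of 40-character hexdigest
strings):

import hashlib

def find_duplicates(logfile):
    seen = set()        # a set has no values, leaner than a dict
    duplicates = []
    for record in open(logfile, 'rb'):  # assumes one record per line
        # raw 20-byte digest, not the 40-char hexdigest string
        key = hashlib.sha1(record).digest()
        if key in seen:
            duplicates.append(record)
        else:
            seen.add(key)
    return duplicates

That still keeps every checksum in RAM, though, so it only delays
the problem for the biggest files.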
If you don't mind taking a small performance hit, I'd suggest
having a look at mxBeeBase, which is an on-disk dictionary
implementation:
http://www.egenix.com/products/python/mxBase/mxBeeBase/
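As a rough illustration of the on-disk dictionary idea (using the
standard library's anydbm module rather than mxBeeBase itself; see
the URL above for the real API), the lookup loop stays the same but
the mapping lives on disk, so RAM use stays flat. The file name is
illustrative:

import anydbm           # named "dbm" on Python 3
import hashlib

db = anydbm.open('seen.db', 'c')    # 'c': create if missing

def is_duplicate(record, db=db):
    key = hashlib.sha1(record).hexdigest()  # dbm keys must be strings
    try:
        db[key]                     # lookup hits the disk, not RAM
        return True
    except KeyError:
        db[key] = ''                # value unused; presence is the point
        return False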
Of course, you could also use a database table for this. Together
with a proper index on the checksum column, that should work as
well (though it's likely slower than mxBeeBase).
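For example, with sqlite3 (in the standard library since Python 2.5)
a primary key on the checksum column does the duplicate detection
for you. Table and file names here are just illustrative:

import hashlib
import sqlite3

conn = sqlite3.connect('seen_sqlite.db')
conn.execute('CREATE TABLE IF NOT EXISTS seen (sha1 BLOB PRIMARY KEY)')

def is_duplicate(record, conn=conn):
    key = sqlite3.Binary(hashlib.sha1(record).digest())
    try:
        # the PRIMARY KEY index rejects an already-seen checksum
        conn.execute('INSERT INTO seen VALUES (?)', (key,))
        return False
    except sqlite3.IntegrityError:
        return True

# Call conn.commit() periodically (and at the end) to persist the keys;
# batching inserts inside a transaction keeps this reasonably fast.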
--
Marc-Andre Lemburg
eGenix.com