Optimizing size of very large dictionaries
Miles
semanticist at gmail.com
Wed Jul 30 23:21:15 EDT 2008
On Wed, Jul 30, 2008 at 8:29 PM, <python at bdurham.com> wrote:
> Background: I'm trying to identify duplicate records in very large text
> based transaction logs. I'm detecting duplicate records by creating a SHA1
> checksum of each record and using this checksum as a dictionary key. This
> works great except for several files whose size is such that their
> associated checksum dictionaries are too big for my workstation's 2G of RAM.
What are the values of this dictionary?
You can save memory by representing the checksums as long integers, if
you're currently using strings.
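For example, something like this sketch (the record values and the `checksum_key` helper are just illustrations, assuming your records are available as byte strings):

```python
import hashlib

def checksum_key(record):
    # Parse the 40-character hex digest into a single integer.
    # On CPython an int holding 160 bits is typically noticeably
    # smaller than the 40-character string object it replaces.
    return int(hashlib.sha1(record).hexdigest(), 16)

seen = {}
for record in [b"alpha", b"beta", b"alpha"]:
    key = checksum_key(record)
    if key in seen:
        print("duplicate record:", record)
    seen[key] = True
```

The dictionary logic is unchanged; only the key representation differs.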
-Miles