Dictionary/Hash question
Gabriel Genellina
gagsl-py at yahoo.com.ar
Tue Feb 6 23:14:34 EST 2007
On Wed, 07 Feb 2007 00:28:31 -0300, Sick Monkey <sickcodemonkey at gmail.com>
wrote:
> qualm after qualm. Before you read this, my OS is Linux, up2date, and
> minimal RAM (512).
And Python 2.3 or earlier, I presume, else you would have the builtin set
type.
> The files that my script needs to read in and interpret can contain
> anywhere from 5 million lines to 65 million lines
>
> I have attached 2 versions of code for you to analyze.
> =================
> I am having issues with performance.
>
> Instance 1: dict_compare.py {which is attached}
> Is awesome, in that I have read a file and stored it into a hash table,
> but if you run it, the program decides to stall after writing all of the
> data.
> <NOTE: once you receive the statement "finished comparing 2 lists." the
> file has actually finished processing within 1 minute, but the script
> continues to run for additional minutes (10 additional minutes actually).
> <I don't know why>
This version reads both files fully into memory; the delay you see at the
end is probably the deallocation of those two huge lists.
> Instance 2: dictNew.py
> Runs great but it is a little slower than Instance 1 (dict_compare.py).
> BUT WHEN IT FINISHES, IT STOPS THE APPLICATION.... no additional
> minutes.....
> <NOTE: I was not yelling with the capitalization, but I am frustrated>
This version processes both files one line at a time, so the memory
requirements are a lot lower.
I think it's a bit slower because the Set class (from the sets module) is
implemented in pure Python; since Python 2.4, set is a builtin type.
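If the script has to run on both old and new interpreters, one common idiom is to fall back to sets.Set only when the builtin is missing. This is just a sketch; the addresses are made up for illustration:

```python
# Use the builtin set where it exists (Python 2.4 and later),
# otherwise fall back to the pure-Python Set class from the sets module.
try:
    set
except NameError:
    from sets import Set as set

# Hypothetical sample data, not taken from the original files.
mails_a = set(["a@example.com", "b@example.com"])
mails_b = set(["b@example.com", "c@example.com"])

# The & operator gives the intersection with either implementation.
both = mails_a & mails_b
```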
You could combine both versions: use the dict approach from version 1, and
process one line at a time as in version 2.
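A rough sketch of that combination follows. It assumes each file holds one address per line, and the function names load_keys and common_keys are mine, not from the attached scripts. Only the first input is held in memory (as dict keys); the second is streamed:

```python
def load_keys(lines):
    """Store every non-empty, stripped line as a dict key, one line at a time."""
    keys = {}
    for line in lines:
        key = line.strip()
        if key:
            keys[key] = True
    return keys


def common_keys(lines_a, lines_b):
    """Yield the entries of lines_b that also occur in lines_a.

    lines_a is loaded into a dict; lines_b is only iterated over,
    so it never has to fit in memory as a whole.
    """
    seen = load_keys(lines_a)
    for line in lines_b:
        key = line.strip()
        if key in seen:
            yield key
```

Open file objects can be passed directly, since iterating over a file yields one line at a time, for example common_keys(open("first.txt"), open("second.txt")) with whatever file names apply. Loading the smaller of the two files into the dict keeps the memory use down.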
You can get the mails that appear in both dictionaries like this:

    for key in dict1:
        if key in dict2:
            print key
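
On Python 2.4 or later, where set is builtin, the same lookup could also be written as a set intersection. A small sketch with made-up dictionaries standing in for dict1 and dict2:

```python
# Hypothetical contents; the real dictionaries would be built from the files.
dict1 = {"a@example.com": 1, "b@example.com": 1}
dict2 = {"b@example.com": 1, "c@example.com": 1}

# Iterating over a dict yields its keys, so set(d) is the set of keys.
common = set(dict1) & set(dict2)
```

This builds two extra sets in memory, so with tens of millions of keys the plain loop is probably the safer choice.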
--
Gabriel Genellina