Dictionary/Hash question

Sick Monkey sickcodemonkey at gmail.com
Tue Feb 6 22:28:31 EST 2007


Qualm after qualm.  Before you read this: my OS is Linux, up2date, and the
machine has minimal RAM (512 MB).
That is on purpose, because I want this app to run on anything.

I have 2 very good solutions to this problem (AND I WANT TO THANK 'Gabriel
Genellina' AND 'Don Morrison' for their help with comparing 2 LARGE files).
(LARGE means anywhere from 2MB to 800MB)

The files that my script needs to read in and interpret can contain anywhere
from 5 million lines to 65 million lines.

I have attached 2 versions of code for you to analyze.
=================
I am having issues with performance.

Instance 1:  dict_compare.py {which is attached}
Is awesome, in that it reads a file and stores it in a hash table, but
if you run it, the program stalls after writing all of the data.
<NOTE:  once you receive the statement "finished comparing 2 lists." the
file has actually finished processing within 1 minute, but the script
continues to run for additional minutes (10 additional minutes actually).
I don't know why.>
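
For readers without the attachments, the general shape of the hash-table
approach described above can be sketched as follows. This is only an
illustration under my own assumptions, not the code in dict_compare.py: the
function names load_lines and lines_missing_from are hypothetical, and I am
assuming the comparison is "which lines of File2 do not appear in File1".

```python
def load_lines(path):
    """Store each line of the file (newline stripped) as a dictionary key."""
    seen = {}
    finput = open(path, "r")
    try:
        for line in finput:
            seen[line.rstrip("\n")] = True
    finally:
        finput.close()
    return seen


def lines_missing_from(path, seen):
    """Stream the second file and yield the lines that are not keys of `seen`."""
    finput = open(path, "r")
    try:
        for line in finput:
            key = line.rstrip("\n")
            if key not in seen:
                yield key
    finally:
        finput.close()
```

Only the first file is held in memory here; the second is read one line at a
time, which matters on a machine with little RAM.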

Instance 2: dictNew.py
Runs great but it is a little slower than Instance 1 (dict_compare.py).  BUT
WHEN IT FINISHES, IT STOPS THE APPLICATION.... no  additional minutes.....
<NOTE: I was not yelling with the capitalization, but I am frustrated>

Can anyone tell me why Instance 1 takes so long to finish?  I looooove both
methods, but I cannot understand the difference in run time.
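
One way to narrow this down might be to time each phase separately,
including the moment the big dictionary is released. The sketch below assumes
(and this is only a guess, nothing in this thread confirms it) that the extra
minutes are spent freeing the table rather than doing real work. The names
timed, build_table and release are hypothetical, and build_table is a small
stand-in for the real loading step.

```python
import time


def timed(label, func, *args):
    """Run func(*args), print the elapsed wall-clock time, return (result, seconds)."""
    start = time.time()
    result = func(*args)
    elapsed = time.time() - start
    print("%s: %.2f s" % (label, elapsed))
    return result, elapsed


def build_table(n):
    """Hypothetical stand-in for the loading step: a dict with n string keys."""
    return dict(("line %d" % i, True) for i in range(n))


def release(table):
    """Empty the dict in place so its entries are freed now, not at exit."""
    table.clear()


if __name__ == "__main__":
    table, _ = timed("build", build_table, 100000)
    timed("release", release, table)
```

If the "release" phase accounts for the missing minutes on the real data, the
stall is in tearing down the table; if it does not, the time is going
somewhere else and this guess can be dropped.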

HELP!!!!!!!!!!!
========================
Output Test1:
[user at SickCodeMonkey hash]# date
Tue Feb  6 21:23:52 EST 2007
[user at SickCodeMonkey hash]# python dict_compare.py
date
starting list 2
finished storing information in lists.
storing File1 in dictionary.
finished comparing 2 lists.
Stopped processing
done
[user at SickCodeMonkey hash]# date
Tue Feb  6 21:36:14 EST 2007
Total:   Over 10 minutes
------------------------------------
Output Test2:
Tue Feb  6 21:38:55 EST 2007
[user at SickCodeMonkey hash]# python dictNew.py
date
finished comparing 2 lists.
Stopped processing
done
[user at SickCodeMonkey hash]# date
Tue Feb  6 21:40:36 EST 2007
Total: Less than 2 minutes

On 2/6/07, Gabriel Genellina <gagsl-py at yahoo.com.ar> wrote:
>
> En Tue, 06 Feb 2007 22:18:07 -0300, Sick Monkey <sickcodemonkey at gmail.com>
> escribió:
>
> > I have never seen this "with open(fname,'r') as finput:"
> >
> > It is actually throwing an error .  Do I have to import a special
> > library to
> > use this?
> >
> >  File "dictNew.py", line 23
> >     with open(fname,'r') as finput:
> >             ^
> > SyntaxError: invalid syntax
>
> Oh, sorry. You need two things:
> - Python 2.5
> - include this line at the very beginning of your script:
>   from __future__ import with_statement
>
> If you're using an earlier version, you can write:
>
>    finput = open(fname,'r')
>    try:
>      ...
>    finally:
>      finput.close()
>
> (Or just omit the try/finally and rely on the garbage collector, but that
> is not the recommended practice, especially when external resources are
> involved, like files).
>
> --
> Gabriel Genellina
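
To make the two forms above concrete, here is a small side-by-side sketch.
The function names count_lines_with and count_lines_try_finally are
hypothetical and counting lines is just a placeholder task; the point is only
that both versions close the file even if the body raises.

```python
# On Python 2.5, the first statement of the file would also need to be:
#   from __future__ import with_statement


def count_lines_with(fname):
    """Count lines using the with statement; the file is closed on exit."""
    with open(fname, "r") as finput:
        return sum(1 for _ in finput)


def count_lines_try_finally(fname):
    """Same result using try/finally, which also works before Python 2.5."""
    finput = open(fname, "r")
    try:
        return sum(1 for _ in finput)
    finally:
        finput.close()
```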
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dictNew.py
Type: application/x-python
Size: 2502 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-list/attachments/20070206/bb1c43c1/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dict_compare.py
Type: application/x-python
Size: 2691 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-list/attachments/20070206/bb1c43c1/attachment-0001.bin>

