Error with long running web spider

Josh Volz jdvolz at gmail.com
Wed Aug 22 13:58:16 EDT 2007


Hi everyone:

I have a spider that is relatively long-running (somewhere between 12
and 24 hours).  My problem is that the program keeps appearing to
freeze.  Once this happens, its activity drops to zero.  No exception
is thrown or caught; the program simply stops doing anything, and it
even stops printing its activity to stdout.  The program itself appears
to run in about 14 MB of memory.  Basically, it looks up pages on a
particular website, reads the HTML of those pages, parses it (using
lots of long regular expressions), and saves the extracted information
to an object, which is later translated to SQL and written to a file.
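
To make the description concrete, here is a minimal sketch of the kind
of loop involved.  It is not the actual spider: the function names, the
single title pattern, the "pages" table and the 30-second timeout are
all placeholders, and it assumes a current Python 3 with only the
standard library.  The one thing it does that the description above
does not mention is pass an explicit timeout to the download call, on
the assumption that a read blocking forever on a stalled connection is
one way a loop like this can go quiet without raising anything.

```python
import re
import urllib.request

# Hypothetical pattern; the real spider uses many longer expressions.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def fetch(url, timeout=30):
    """Download one page, giving up after `timeout` seconds of no response."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse(html):
    """Pull fields out of the HTML; a dict stands in for the record object."""
    match = TITLE_RE.search(html)
    return {"title": match.group(1).strip() if match else None}


def to_sql(record, table="pages"):
    """Render one record as an INSERT statement, doubling single quotes."""
    title = record["title"]
    value = "NULL" if title is None else "'" + title.replace("'", "''") + "'"
    return "INSERT INTO %s (title) VALUES (%s);" % (table, value)


def crawl(urls, out, fetch=fetch):
    """Fetch, parse and write SQL for each URL, logging progress as it goes."""
    for url in urls:
        # flush=True so the last line on stdout shows where a stall happened.
        print("fetching", url, flush=True)
        try:
            html = fetch(url)
        except OSError as exc:  # covers URLError and socket timeouts
            print("skipping", url, exc, flush=True)
            continue
        out.write(to_sql(parse(html)) + "\n")
```

With this shape, calling crawl(list_of_urls, open("out.sql", "w")) logs
each URL before it is requested, so if the process does stall, the last
line printed identifies the page it was working on.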

I've actually had this same problem with several long-running Python
programs.  Any ideas?

Thanks in advance.



