Error with long running web spider

Nick Craig-Wood nick at craig-wood.com
Wed Aug 22 17:30:06 EDT 2007


Josh Volz <jdvolz at gmail.com> wrote:
>  I have a spider that is relatively long running (somewhere between
>  12-24 hours).  My problem is that I keep having an issue where the
>  program appears to freeze.  Once this freezing happens the activity of
>  the program drops to zero.  No exception is thrown or caught.  The
>  program simply stops doing anything.  It even stops printing out its
>  activity to stdout.  The program itself appears to run in about 14
>  megs of memory.  Basically, the program looks up pages on a particular
>  website, and then reads the HTML of those pages, parses it (lots of
>  long regular expressions are used), and saves the found information to
>  an object (which is later translated to SQL and the SQL is written to
>  a file).
> 
>  I've actually had this same problem with several long running Python
>  programs.  Any ideas?

If you were running under Unix I'd suggest you "strace" the process to
see what it is doing.  There are Windows strace programs (which I've
never tried) too!
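
If strace isn't to hand, a Python-level view of the same question
("where is it stuck?") can be had from the standard faulthandler
module.  This is only a sketch and assumes a Python 3 interpreter;
enable_hang_dumps and the 60 second interval are names and values
made up for illustration.

```python
import faulthandler
import sys


def enable_hang_dumps(interval=60.0, out=sys.stderr):
    """Dump every thread's traceback to `out` each `interval` seconds.

    Hypothetical helper: the name and the default interval are
    assumptions.  `out` must be a real file with a file descriptor and
    must stay open for as long as the dumps are enabled.
    """
    faulthandler.dump_traceback_later(interval, repeat=True, file=out)


def disable_hang_dumps():
    """Stop the periodic dumps started by enable_hang_dumps()."""
    faulthandler.cancel_dump_traceback_later()
```

Call enable_hang_dumps() once at startup.  When the spider freezes,
the last few dumps will all show the same line, which is where it is
waiting.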

You'll probably find it is wedged in TCP socket code.
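
If that is the case, the usual cure is a socket timeout, since by
default a blocking socket will wait forever for a server that has
stopped answering.  A minimal sketch, assuming Python 3 and urllib;
the fetch helper and the 30 second value are illustrative and not
taken from the spider in question.

```python
import socket
import urllib.request

# Assumption: 30 seconds is an illustrative value, tune it to the site.
DEFAULT_TIMEOUT = 30.0


def install_default_timeout(seconds=DEFAULT_TIMEOUT):
    """Make every socket created from now on time out after `seconds`."""
    socket.setdefaulttimeout(seconds)


def fetch(url, timeout=DEFAULT_TIMEOUT):
    """Return the page body, or None if the request fails or stalls.

    Hypothetical helper.  URLError and socket timeouts are both
    subclasses of OSError, so one except clause covers them.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except OSError as exc:
        print("giving up on %s: %s" % (url, exc))
        return None
```

With this in place a stalled connection raises an error after the
timeout, which the spider can log before moving on to the next page.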

-- 
Nick Craig-Wood <nick at craig-wood.com> -- http://www.craig-wood.com/nick


