sgmlParser infinite loop? How to empty and re-use parser object?

Bernard Yue bernie at 3captus.com
Fri Mar 22 19:17:22 EST 2002


Nick Arnett wrote:
> 
> Anyone know of circumstances under which sgmlParser will hang, presumably in
> an infinite (well, exceeding my patience, anyway) loop?  I don't seem to be
> able to reliably reproduce this, but occasionally during processing of a
> large number of pages, I seem to get stuck in it.  I'm doing very simple
> parsing, basically just extracting the contents of tables.  I'll re-try the
> same set of documents and it'll hang in a different spot.  If it weren't so
> unpredictable and infrequent, I'd dig into it with the debugger...
>

My record is 1300 web pages, and I have not run into the problem you
describe.

> Still fairly new to Python... I'm wondering if I should be re-using a parser
> object for each document I'm processing in a loop -- and wondering if the
> fact that I'm not is causing these freezes.  But if I call it without
> re-instantiating it, I get the same text parsed again... and I can't see how
> to tell it to not do that.   Calling reset doesn't seem to do the trick,
> even though I seem to have the appropriate reset method that calls the
> parent reset.
> 

I am not reusing the parser object in my script either (but you can).
You should consider posting the program fragment you used.
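
For reference, here is a minimal sketch of reusing one parser object
across several documents.  It rests on some assumptions: sgmllib is gone
from Python 3, so the sketch uses html.parser.HTMLParser as a stand-in
with the same reset()/feed()/close() pattern, and the names CellCollector
and extract_cells are hypothetical, not taken from your script.  The point
it tries to show is that the base-class reset() only clears the parser's
own buffer.  Anything the subclass accumulates has to be cleared in the
subclass's reset() too, which is one plausible reason the same text seems
to be parsed again.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell (hypothetical example class)."""

    def reset(self):
        # HTMLParser.__init__ calls reset(), so per-document state can be
        # created here instead of in __init__.
        super().reset()
        self.cells = []        # text of each cell seen so far
        self._in_cell = False  # True while inside a <td> element

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data


def extract_cells(parser, html):
    """Parse one document with an existing parser and return its cells."""
    parser.reset()   # drop the buffer and the cells from the previous page
    parser.feed(html)
    parser.close()   # flush anything still buffered
    return [cell.strip() for cell in parser.cells]


parser = CellCollector()
pages = [
    "<table><tr><td>a</td><td>b</td></tr></table>",
    "<table><tr><td>c</td></tr></table>",
]
results = [extract_cells(parser, page) for page in pages]
```

With the two sample pages, each entry in results holds only the cells of
its own page, so nothing carries over from one document to the next.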

> Thanks for tips.
> 
> Nick
> --
> narnett at mccmedia.com
> (408) 904-7198


Bernie

-- 
There are three schools of magic.  One:  State a tautology, then ring
the changes on its corollaries; that's philosophy.  Two:  Record many
facts.  Try to find a pattern.  Then make a wrong guess at the next
fact; that's science.  Three:  Be aware that you live in a malevolent
Universe controlled by Murphy's Law, sometimes offset by Brewster's
Factor; that's engineering.
                -- Robert A. Heinlein


