HTMLParser tag contents

Grant Griffin g2 at seebelow.org
Wed May 10 16:31:29 EDT 2000


Paul Prescod wrote:
> 
> Grant Griffin wrote:
> >
> > Therefore, for Python 1.6, I would like to recommend that SGMLParser be
> > modified to provide a method called "get_tag_contents" (or whatever)
> > which can be called at the point of any "end_xxx" to convey the tag's
> > contents (which would include not only text but contained tags and their
> > text.)  (The reason SGMLParser has to be modified is that its index into
> > its "rawdata" array is local to its parser routine.)
> 
> You could be parsing a 100MB HTML/SGML document 1 K at a time. I don't
> think you want SGMLLIB to keep around the entire 100MB "just in case"
> you ask for the contents of the BODY tag.
> 

I guess some sort of size limit could be put on the buffered contents to
cover all but extreme cases.
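
Here is a rough sketch of that size-limit idea. It assumes the present-day
html.parser.HTMLParser rather than the 1.6-era SGMLParser, and the names
TagContentsParser, max_chars, contents and truncated are made up for
illustration; none of them are part of the standard library. The parser
only buffers while it is inside the watched tag, and gives up buffering
once the cap is hit:

```python
from html.parser import HTMLParser


class TagContentsParser(HTMLParser):
    """Collect the contents of one kind of tag, up to a size cap.

    Hypothetical helper: not part of html.parser or sgmllib.
    """

    def __init__(self, tag, max_chars=65536):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.tag = tag
        self.max_chars = max_chars
        self.depth = 0          # nesting depth inside the watched tag
        self.parts = []         # pieces buffered for the current tag
        self.size = 0           # characters buffered so far
        self.truncated = False  # True if the most recent tag hit the cap
        self.contents = []      # one string per completed tag

    def _keep(self, text):
        # Buffer only while inside the watched tag and under the cap.
        if self.depth == 0 or self.truncated:
            return
        if self.size + len(text) > self.max_chars:
            self.truncated = True
            return
        self.parts.append(text)
        self.size += len(text)

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            if self.depth == 0:
                # Outermost watched tag: start a fresh buffer.
                self.depth = 1
                self.parts = []
                self.size = 0
                self.truncated = False
                return
            self.depth += 1
        self._keep(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are kept once, as written.
        self._keep(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.contents.append("".join(self.parts))
                return
        self._keep("</%s>" % tag)

    def handle_data(self, data):
        self._keep(data)

    def handle_entityref(self, name):
        self._keep("&%s;" % name)

    def handle_charref(self, name):
        self._keep("&#%s;" % name)


parser = TagContentsParser("p", max_chars=1000)
parser.feed("<body><p>Hello <b>big</b> world</p></body>")
parser.close()
print(parser.contents)
```

Since feed() can be called repeatedly, the document can still arrive 1 K
at a time; memory use is bounded by max_chars rather than by the size of
the whole document. End tags are rebuilt from the tag name, so the result
is a normalized copy of the contents, not a byte-for-byte slice of the
raw input.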

Then again, maybe you're right: maybe the solution I had posted was
best. ;-)

when-the-exception-doesn't-prove-the-rule-it-must-prove-the
   -exception-ly y'rs,

=g2
p.s.  BTW, how long do you 'spose it would take for somebody to _read_ a
web page containing 100MB of HTML?!  (Lemme see...carry the six...that
would take...waitaminute...about... ;-)
-- 
_____________________________________________________________________

Grant R. Griffin                                       g2 at dspguru.com
Publisher of dspGuru                           http://www.dspguru.com
Iowegian International Corporation             http://www.iowegian.com


