HTMLParser tag contents

Paul Prescod paul at prescod.net
Tue May 9 16:25:49 EDT 2000


Grant Griffin wrote:
> 
> Therefore, for Python 1.6, I would like to recommend that SGMLParser be
> modified to provide a method called "get_tag_contents" (or whatever)
> which can be called at the point of any "end_xxx" to convey the tag's
> contents (which would include not only text but contained tags and their
> text.)  (The reason SGMLParser has to be modified is that its index into
> its "rawdata" array is local to its parser routine.)

You could be parsing a 100MB HTML/SGML document 1 KB at a time. I don't
think you want sgmllib to keep the entire 100MB around "just in case"
you ask for the contents of the BODY tag.
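
One way to get tag contents without holding the whole document is to
buffer only while inside a tag you asked for in advance. The following
is a rough sketch, not a proposed sgmllib patch: it assumes the
html.parser module from Python 3 rather than the 1.6-era sgmllib, and
TagContentCollector and its attributes are hypothetical names made up
for illustration. End tags are rebuilt from the (lowercased) tag name,
so the result approximates the original markup rather than copying it
byte for byte.

```python
from html.parser import HTMLParser


class TagContentCollector(HTMLParser):
    """Buffer markup only while inside the one tag the caller asked for."""

    def __init__(self, wanted_tag):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.wanted_tag = wanted_tag
        self.depth = 0      # nesting depth of wanted_tag
        self.parts = []     # pieces buffered for the currently open element
        self.contents = []  # finished contents, one string per element

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.parts.append(self.get_starttag_text())
        if tag == self.wanted_tag:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are kept as written.
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == self.wanted_tag and self.depth:
            self.depth -= 1
            if self.depth == 0:
                # Outermost wanted element closed: emit and drop the buffer.
                self.contents.append("".join(self.parts))
                self.parts = []
                return
        if self.depth:
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth:
            self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        if self.depth:
            self.parts.append("&#%s;" % name)


doc = "<html><body><p>Hello <b>big</b> world</p><p>x</p></body></html>"
parser = TagContentCollector("p")
for i in range(0, len(doc), 8):  # feed the document in small chunks
    parser.feed(doc[i:i + 8])
parser.close()
print(parser.contents)
```

Memory use is then bounded by the largest element you asked about, not
by the document. Of course, if the tag you ask for is BODY, you are
back to buffering nearly everything, which is the original objection.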

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
Art is always at peril in universities, where there are so many people, 
young and old, who love art less than argument, and dote upon a text 
that provides the nutritious pemmican on which scholars love to chew. 
				-- Robertson Davies in "The Cunning Man"



