HTMLParser tag contents

Grant Griffin g2 at seebelow.org
Fri May 5 13:42:14 EDT 2000


In article <Pine.LNX.4.21.0005051436470.24093-100000 at fep132.fep.ru>, Oleg
says...
>
>On Sat, 6 May 2000, Grant Griffin wrote:
>> I've been trying to figure out how to use HTMLParser.  My immediate need
>> is to extract the entire <BODY> of a file.  (I could do that with 're',
>> but I'm trying to learn HTMLParser.)  Sure, HTMLParser will return a
>> tag's _attributes_, but I can't figure out how to get to the tag's
>> _contents_.  Can it do that?
>
>   Do not use HTMLParser; use SGMLParser. HTMLParser is meant for a
>different kind of parsing, more for HTML-to-text conversions...

Perhaps I misspoke.  I agree that the solution would probably have to occur at
the level of SGMLParser, but I guess my question remains: can it do that?  If
so, how?

Looking at the SGMLParser source code, I don't see any mechanism for capturing
the contents of a tag.  I'm kind of new to Python, so am I missing something
here?  If not, it seems the thing to do is to keep a little stack of each
tag's begin/end indices into its "rawdata" array, then provide a routine that
extracts the data lying between a tag's begin and end indices.
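
A minimal sketch of that stack-of-indices idea, assuming modern Python 3.
sgmllib (home of SGMLParser) and the old htmllib were both removed in
Python 3, so this uses html.parser.HTMLParser, a different class that only
shares the name.  Rather than indexing "rawdata", which is internal and gets
trimmed as input is consumed, it converts getpos() (line, column) into an
offset into the original string and uses get_starttag_text() to find where
the contents begin.  TagContentParser and get_contents are hypothetical
names, not part of any library.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TagContentParser(HTMLParser):
    """Hypothetical helper: record (tag, start, end) spans of element contents.

    A sketch of the "stack of begin/end indices" idea using Python 3's
    html.parser instead of the removed sgmllib.SGMLParser.
    """

    def __init__(self, document):
        super().__init__()
        self.document = document
        # Absolute offset at which each line starts; getpos() reports
        # (1-based line, 0-based column), so we need this to convert.
        self.line_starts = [0]
        for i, ch in enumerate(document):
            if ch == "\n":
                self.line_starts.append(i + 1)
        self.open_tags = defaultdict(list)  # tag name -> stack of content starts
        self.spans = []                     # (tag, content_start, content_end)

    def _offset(self):
        # Position of the tag currently being handled, as an index into document.
        lineno, col = self.getpos()
        return self.line_starts[lineno - 1] + col

    def handle_starttag(self, tag, attrs):
        # Contents begin right after the raw start-tag text, e.g. "<BODY ...>".
        start = self._offset() + len(self.get_starttag_text())
        self.open_tags[tag].append(start)

    def handle_endtag(self, tag):
        # Pop the innermost open tag with the same name, if any.  Tags that
        # are never closed (such as <br>) just stay on their own stacks.
        if self.open_tags[tag]:
            start = self.open_tags[tag].pop()
            self.spans.append((tag, start, self._offset()))

    def contents(self, tag):
        """Raw text inside every <tag>...</tag> seen so far (lowercase name)."""
        return [self.document[s:e] for t, s, e in self.spans if t == tag]


def get_contents(document, tag):
    """Hypothetical convenience wrapper: parse document, return tag contents."""
    parser = TagContentParser(document)
    parser.feed(document)
    parser.close()
    return parser.contents(tag)
```

For example, get_contents(page, "body") returns a list holding the raw markup
between <BODY ...> and </BODY>; html.parser lowercases tag names, so ask for
"body" even if the file says BODY.  Keeping one stack per tag name means an
unclosed <br> does not derail matching of other tags, though badly mis-nested
markup can still produce odd spans.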

(even-though-i'm-new-i-already-figured-out-how-to-write-these
   -silly-salutations,-which-lately-i've-been-subjecting
   -the-poor-comp.dsp-newsgroup-to-<wink>)-ly

=g2

_____________________________________________________________________

Grant R. Griffin                                       g2 at dspguru.com
Publisher of dspGuru                           http://www.dspguru.com
Iowegian International Corporation            http://www.iowegian.com



