HTMLParser tag contents
Grant Griffin
g2 at seebelow.org
Fri May 5 13:42:14 EDT 2000
In article <Pine.LNX.4.21.0005051436470.24093-100000 at fep132.fep.ru>, Oleg
says...
>
>On Sat, 6 May 2000, Grant Griffin wrote:
>> I've been trying to figure out how to use HTMLParser. My immediate need
>> is to extract the entire <BODY> of a file. (I could do that with 're',
>> but I'm trying to learn HTMLParser.) Sure, HTMLParser will return a
>> tag's _attributes_, but I can't figure out how to get to the tag's
>> _contents_. Can it do that?
>
> Do not use HTMLParser - use SGMLParser. HTMLParser is for different
>parsing - more for HTML-to-text conversions...
Perhaps I misspoke. I agree that the solution would probably have to occur at
the level of SGMLParser, but I guess my question remains: can it do that? If so,
how?
In looking at the SGMLParser source code, it doesn't appear to have any
mechanism to capture the contents of a tag. I'm kinda new to Python, so am I
missing something here? If not, it seems like the thing to do is to keep a
little stack of tag begin/end indices into its "rawdata" array, then provide a
routine to extract the data using the tag's begin/end indices.
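The index-stack idea might look roughly like the sketch below. It is only an
illustration, and it rests on some assumptions: it is written against the
html.parser.HTMLParser class found in current Python rather than the
SGMLParser discussed here, it uses that class's getpos() and
get_starttag_text() methods to locate tags instead of touching "rawdata"
directly, and the names ContentGrabber and parse are made up for the example.

```python
from html.parser import HTMLParser


class ContentGrabber(HTMLParser):
    """Hypothetical helper: collect the raw source text inside one kind of tag."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted.lower()
        self.source = ""
        self.line_starts = [0]
        self.stack = []      # (tag name, index just past its start tag)
        self.contents = []   # raw text of each completed wanted element

    def parse(self, text):
        self.source = text
        # Index at which each line begins, so the (line, column) pair
        # from getpos() can be turned into an index into the source.
        self.line_starts = [0] + [i + 1 for i, c in enumerate(text) if c == "\n"]
        self.feed(text)
        self.close()
        return self.contents

    def _index(self):
        line, column = self.getpos()   # line is 1-based, column is 0-based
        return self.line_starts[line - 1] + column

    def handle_starttag(self, tag, attrs):
        # Contents begin right after the text of the start tag itself.
        begin = self._index() + len(self.get_starttag_text())
        self.stack.append((tag, begin))

    def handle_startendtag(self, tag, attrs):
        pass  # tags written as <br/> have no contents to record

    def handle_endtag(self, tag):
        end = self._index()            # index of the "<" of the end tag
        # Look down the stack for the matching start tag; anything above
        # it (an unclosed <p> or <br>, say) is dropped along with it.
        for depth in range(len(self.stack) - 1, -1, -1):
            if self.stack[depth][0] == tag:
                begin = self.stack[depth][1]
                del self.stack[depth:]
                if tag == self.wanted:
                    self.contents.append(self.source[begin:end])
                break


if __name__ == "__main__":
    page = "<html><body>\n<p>Hello <b>world</b>\n</body></html>"
    print(ContentGrabber("body").parse(page))
```

The whole document is passed to parse() in one call so that the saved indices
stay valid for the stored source string. Because the slice is taken from the
source text, nested markup inside the tag comes back untouched.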
(even-though-i'm-new-i-already-figured-out-how-to-write-these
-silly-salutations,-which-lately-i've-been-subjecting
-the-poor-comp.dsp-newsgroup-to-<wink>)-ly
=g2
_____________________________________________________________________
Grant R. Griffin g2 at dspguru.com
Publisher of dspGuru http://www.dspguru.com
Iowegian International Corporation http://www.iowegian.com