Subclass SGMLParser or HTMLParser?

Sam Peterson skpeternospam at ucdavis.edu
Mon Sep 30 00:19:39 EDT 2002


Hello, I've just started doing some python programming with
htmllib.HTMLParser to spider a website of mine and grab all of the
images and download them to disk, as well as collecting reference
counts for my hyperlinks.  It works pretty well, except on a few web
pages that were generated with Word.  Most of these pages don't
contain images or anchor tags, and I imagine the HTMLParser module,
meant for XHTML documents, will handle those just fine once I get
around to playing with it.
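
For reference, the kind of parser described above (collect image URLs and
count references to each hyperlink) might look roughly like the sketch
below.  This is not the original script: htmllib and sgmllib no longer
exist in Python 3, so the sketch assumes Python 3 and uses
html.parser.HTMLParser, the descendant of the HTMLParser module mentioned
above.  The names LinkImageParser and scan are made up for illustration,
and the download step is left out.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkImageParser(HTMLParser):
    """Hypothetical parser: collects <img> URLs and counts <a href> targets."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url      # used to resolve relative URLs
        self.images = []              # image URLs in document order
        self.link_counts = Counter()  # hyperlink target -> reference count

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; tag is lowercased.
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "a" and attrs.get("href"):
            self.link_counts[urljoin(self.base_url, attrs["href"])] += 1

    # Self-closing tags such as <img ... /> also reach handle_starttag,
    # because the default handle_startendtag delegates to it.


def scan(html, base_url=""):
    """Parse one page and return (image_urls, link_counts)."""
    parser = LinkImageParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.images, parser.link_counts
```

Calling scan(page_text, page_url) on each fetched page gives the list of
images to download and a Counter of link targets; the Counters from
several pages can be added together to get site-wide reference counts.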

My question is this: after having looked around on the web for examples,
I've noticed that most people seem to use sgmllib.SGMLParser instead.
I know that htmllib.HTMLParser is just a subclass of SGMLParser, so
I was wondering what the pros and cons are of using one or the
other.  Any recommendations?  Thanks in advance.

-- 
Sam
"Da Man"
s/nospam/son/ -- to email me.


