HTML data extraction?
Dave Kuhlman
dkuhlman at rexx.com
Mon Dec 22 13:29:49 EST 2003
I recently read an article by Jon Udell about extracting data from
Web pages as a poor person's Web services. So, I have a question:
Is there any Python support for finding and extracting information
from HTML documents?
I'd like something that would do things like the following:
- return the data which is inside a <b> tag which is inside a
<li> tag.
- return the data which is inside a <a> tag that has attribute
href="http://www.python.org".
- Etc.
It would be a sort of structured grep for HTML.
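To make the request concrete, here is a minimal sketch of what such a
"structured grep" might look like on top of the standard-library parser.
It assumes Python 3, where the HTMLParser module is named html.parser.
The TagPathExtractor class, the extract() helper, and the sample markup
are hypothetical names made up for illustration, not an existing API.

```python
from html.parser import HTMLParser


class TagPathExtractor(HTMLParser):
    """Collect the text inside a chain of nested tags (hypothetical helper)."""

    def __init__(self, path, attrs=None):
        super().__init__()
        self.path = list(path)          # e.g. ["li", "b"]: a <b> inside an <li>
        self.attrs = dict(attrs or {})  # attributes required on the innermost tag
        self.stack = []                 # (tag, attrs) for each currently open tag
        self.results = []
        self._buffer = None             # text pieces of the match being captured
        self._depth = None              # stack depth at which the match opened

    def handle_starttag(self, tag, attrs):
        self.stack.append((tag, dict(attrs)))
        if self._buffer is None and self._matches():
            self._buffer = []
            self._depth = len(self.stack)

    def handle_endtag(self, tag):
        # Pop back to the nearest open tag of this name, which also
        # discards unclosed tags such as <br> sitting above it.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break
        else:
            return  # stray end tag with no matching open tag
        if self._buffer is not None and len(self.stack) < self._depth:
            self.results.append("".join(self._buffer).strip())
            self._buffer = None

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def _matches(self):
        tag, attrs = self.stack[-1]
        if tag != self.path[-1]:
            return False
        if any(attrs.get(k) != v for k, v in self.attrs.items()):
            return False
        # The remaining path names must appear, in order, among the ancestors.
        ancestors = iter(t for t, _ in self.stack[:-1])
        return all(name in ancestors for name in self.path[:-1])


def extract(html, path, attrs=None):
    """Return the text of every element matching the tag path and attributes."""
    parser = TagPathExtractor(path, attrs)
    parser.feed(html)
    parser.close()
    return parser.results


sample = """
<ul>
  <li><b>first</b> item</li>
  <li>second <b>item</b></li>
</ul>
<p><b>not in a list</b></p>
<a href="http://www.python.org">Python home</a>
<a href="http://www.example.com">elsewhere</a>
"""

print(extract(sample, ["li", "b"]))  # ['first', 'item']
print(extract(sample, ["a"], {"href": "http://www.python.org"}))  # ['Python home']
```

The sketch only tracks a stack of open tags, so it copes with simple
pages but makes no attempt to repair badly broken markup.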
I've found the HTMLParser and htmllib modules in the Python
standard library, but I'm wondering if there is anything at a
higher level.
Web searches did not turn up anything interesting.
Thanks for any help.
Dave
--
http://www.rexx.com/~dkuhlman
dkuhlman at rexx.com