need start point for getting html info from web

Mike Meyer mwm at mired.org
Sun Oct 30 21:36:49 EST 2005


nephish at xit.net writes:
> I have a small app that needs to get information from a few tables
> on different websites. I have looked at urllib and httplib. The sites
> I need to get data from mostly have this data in tables, which I
> think would make it easier. Can anyone suggest a good starting
> point for me to find out how to do this, or know of a link to a good
> how-to?

I don't have a link to a how-to, but you're halfway there. urllib (and
urllib2) will fetch the HTML text from the websites. How you pull data
out of it depends on the nature of the HTML. If it's well-structured
XHTML, you can use your favorite XML library. If it's well-structured
HTML, you can try htmllib, but it's pretty primitive. If it's not
well-structured, you can use BeautifulSoup; I've used it to pull data
from tables. The problem with any of these approaches is that your code
depends on the structure (or lack thereof) of the HTML you're
scraping. If the site changes its markup, your code breaks.
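
As a rough illustration of the fetch-then-parse approach for the
reasonably well-structured case, here is a minimal sketch. It assumes
a current Python 3, where urllib2 has been folded into urllib.request
and htmllib is gone, so it uses the standard library's html.parser in
its place. The names TableExtractor, extract_tables and fetch_tables
are made up for this example, and the sketch assumes that tables are
not nested and that rows are explicitly closed.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TableExtractor(HTMLParser):
    """Hypothetical helper: collect cell text grouped by row and table."""

    def __init__(self):
        super().__init__()
        self.tables = []   # finished tables: lists of rows of cell strings
        self._rows = None  # rows of the table being read, None outside one
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def _close_cell(self):
        # Store the current cell, if any, in the current row.
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr" and self._row is not None:
            self._close_cell()
            self._rows.append(self._row)
            self._row = None
        elif tag == "table" and self._rows is not None:
            self.tables.append(self._rows)
            self._rows = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_tables(html):
    """Return every table in the HTML text as a list of rows."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


def fetch_tables(url):
    """Download a page and extract its tables (assumes UTF-8 if undeclared)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_tables(html)


if __name__ == "__main__":
    sample = (
        "<table><tr><th>Name</th><th>Qty</th></tr>"
        "<tr><td>apple</td><td> 3 </td></tr></table>"
    )
    for row in extract_tables(sample)[0]:
        print(row)
```

You would call fetch_tables() with the URL of the page you care about
and then index into the result by table, row and column. That indexing
is exactly where the fragility shows up: if the site adds a column or
wraps the data in another table, the positions shift and the code
needs updating. For markup that is not well-structured, a more
forgiving parser such as BeautifulSoup is likely the better choice.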

          <mike
-- 
Mike Meyer <mwm at mired.org>			http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.


