web crawling.

Fuzzyman fuzzyman at gmail.com
Thu Jan 19 04:40:58 EST 2006


Use BeautifulSoup to get all the image tags out of the HTML.

The 'src' of an image is often a relative reference, so you'll need to
join each one to the URL of the page it came from (urlparse.urljoin, off
the top of my head). The BeautifulSoup documentation shows how to get the
'src' attribute of each image tag.
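
Something along these lines should do it. This is only a rough sketch, not
tested against a real site. It assumes Python 3 (where urljoin lives in
urllib.parse rather than urlparse) and the BeautifulSoup 4 package (bs4);
if bs4 isn't installed it falls back to the standard library's HTMLParser.
The function name image_urls and the example.com addresses are made up for
illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin  # this was urlparse.urljoin in Python 2

try:
    from bs4 import BeautifulSoup  # third-party package: beautifulsoup4
except ImportError:
    BeautifulSoup = None  # fall back to the standard library below


class _ImgCollector(HTMLParser):
    """Stdlib fallback: collects the 'src' of every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.srcs.append(dict(attrs).get("src"))


def image_urls(html, page_url):
    """Return the absolute URL of each <img> in html (hypothetical helper)."""
    if BeautifulSoup is not None:
        soup = BeautifulSoup(html, "html.parser")
        srcs = [img.get("src") for img in soup.find_all("img")]
    else:
        collector = _ImgCollector()
        collector.feed(html)
        srcs = collector.srcs
    # Skip tags with no 'src', then resolve each one against the page URL.
    return [urljoin(page_url, src) for src in srcs if src]


if __name__ == "__main__":
    page = '<p><img src="a.png"><img src="/img/b.gif"></p>'
    for url in image_urls(page, "http://example.com/gallery/index.html"):
        print(url)
```

Fetching the page itself (and then each image) is a separate step that I've
left out here.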

All the best,

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml



