extract news article from web

Fuzzyman fuzzyman at gmail.com
Thu Dec 23 10:06:47 EST 2004


If you have a reliably structured page, then you can write a custom
parser. As Steve points out - BeautifulSoup would be a very good place
to start.
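
As a rough illustration of the custom parser idea, here is a minimal
sketch. It uses the standard library's html.parser as a stand-in rather
than BeautifulSoup, so that it runs without extra installs. The page
layout (one <h1> headline followed by <p> paragraphs), the sample page,
and the names ArticleParser and extract_article are all assumptions made
up for the example; a real site will need its own rules.

```python
from html.parser import HTMLParser


class ArticleParser(HTMLParser):
    """Collect the text of the first <h1> and of every <p> on a page."""

    def __init__(self):
        super().__init__()
        self.headline = ""
        self.paragraphs = []
        self._current = None  # tag whose text is being captured, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # Start capturing on any <p>, or on <h1> if no headline was seen yet.
        if tag == "p" or (tag == "h1" and not self.headline):
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        # Inline tags such as </b> inside a paragraph are ignored here.
        if tag != self._current:
            return
        # Join the captured pieces and collapse runs of whitespace.
        text = " ".join("".join(self._buffer).split())
        if tag == "h1":
            self.headline = text
        elif text:
            self.paragraphs.append(text)
        self._current = None


def extract_article(html):
    """Return (headline, list_of_paragraphs) for an HTML string."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    return parser.headline, parser.paragraphs


# Hypothetical page, invented for this example.
SAMPLE_PAGE = """
<html><body>
<h1>Snow closes   roads</h1>
<p>Heavy snow fell <b>overnight</b>.</p>
<p>Roads reopen on Friday.</p>
</body></html>
"""

headline, paragraphs = extract_article(SAMPLE_PAGE)
print(headline)
for paragraph in paragraphs:
    print(paragraph)
```

This only works while the site keeps the same layout, which is why it
suits a reliably structured page and not much else.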

This is the problem that RSS was designed to solve. Many news sites will
supply exactly the information you want as an RSS feed. You should then
use Universal Feed Parser to process the feed.
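
To show what a feed gives you, here is a small sketch that pulls the
title, link and description out of each item in an RSS 2.0 document. It
uses the standard library's xml.etree.ElementTree purely as a
self-contained stand-in; Universal Feed Parser copes with far more feed
formats and with malformed feeds. The sample feed, its example.com URLs
and the function name read_items are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, invented for this example.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>Snow closes roads</title>
      <link>http://example.com/snow</link>
      <description>Heavy snow fell overnight.</description>
    </item>
    <item>
      <title>Roads reopen</title>
      <link>http://example.com/reopen</link>
      <description>Traffic is moving again.</description>
    </item>
  </channel>
</rss>
"""


def read_items(feed_xml):
    """Return one dict per <item> element in an RSS 2.0 string."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            # findtext returns the default when a child element is missing.
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "summary": item.findtext("description", default=""),
        })
    return items


items = read_items(SAMPLE_FEED)
for entry in items:
    print(entry["title"], "-", entry["link"])
```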

The module you need for fetching the webpages (in case you didn't know)
is urllib2. There is a great article on fetching webpages in the
current issue of pyzine. See http://www.pyzine.com :-)
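
A minimal sketch of the fetching step follows. One assumption to flag:
it is written for Python 3, where urllib2's urlopen lives in
urllib.request. The helper name fetch_page and the 10 second timeout are
arbitrary choices for the example, and the demo uses a data: URL so it
runs without a network connection; pass an http:// address for a real
page.

```python
import urllib.request


def fetch_page(url, timeout=10):
    """Fetch a URL and return its body decoded to text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Use the charset from the Content-Type header, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# A data: URL stands in for a real web address in this demo.
page = fetch_page("data:text/html;charset=utf-8,%3Ch1%3EHello%3C%2Fh1%3E")
print(page)
```

Real requests can fail, so in practice you would wrap the call in a
try/except for urllib.error.URLError.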

Regards,

Fuzzy
http://www.voidspace.org.uk/python/index.shtml



