Agnostic fetching

Michael Torrie torriem at gmail.com
Sat Aug 2 18:59:03 EDT 2008


jorpheus wrote:
> OK, that sounds stupid. Anyway, I've been learning Python for some
> time now, and am currently having fun with the urllib and urllib2
> modules, but have run into a problem(?) - is there any way to fetch
> (urllib.urlretrieve) files from a server without knowing the filenames?
> For instance, there is something like folder/spam.egg, folder/
> unpredictable.egg and so on. If not, perhaps some kind of glob to
> create a list of existing files? I'd really appreciate some help,
> since I'm really out of my (newb) depth here.

If you happen to have a URL that simply lists the files (a directory
index page), then what you have to do is relatively simple.  Just fetch
the HTML from the folder URL, then parse it and look for the anchor
(<a>) tags.  You can then fetch whichever of those anchor URLs interest
you.  BeautifulSoup can help out with this: it should be able to list
all the anchor tags in an HTML string in just one line of code.  Combine
urllib2 and BeautifulSoup and you'll have a winner.
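A minimal sketch of that approach, written for modern Python 3, where
urllib2's urlopen now lives in urllib.request.  To stay dependency-free
it swaps in the standard library's html.parser for BeautifulSoup.  It
assumes the server returns an auto-generated index page (such as
Apache's) and that the folder URL ends with a slash.  The names
LinkCollector, list_files and fetch_all are hypothetical.

```python
import os
import posixpath
from html.parser import HTMLParser
from urllib.parse import unquote, urljoin, urlparse
from urllib.request import urlopen, urlretrieve


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def list_files(html, base_url, suffix=""):
    """Return absolute URLs of files linked from an index page.

    base_url is assumed to end with "/" so relative links resolve
    inside the folder.  Only links whose path ends with suffix are kept.
    """
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    found = []
    for href in parser.hrefs:
        url = urljoin(base_url, href)
        parts = urlparse(url)
        # Skip query links (e.g. Apache's column-sort links) and anything
        # outside this folder, such as the "Parent Directory" link.
        if parts.query or not url.startswith(base_url):
            continue
        # Skip subfolders (and the folder itself) plus unwanted extensions.
        if parts.path.endswith("/") or not parts.path.endswith(suffix):
            continue
        if url not in found:
            found.append(url)
    return found


def fetch_all(folder_url, suffix="", dest="."):
    """Download every matching file listed at folder_url into dest."""
    with urlopen(folder_url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    for url in list_files(html, folder_url, suffix):
        # Use the last path segment, percent-decoded, as the local name.
        filename = unquote(posixpath.basename(urlparse(url).path))
        urlretrieve(url, os.path.join(dest, filename))
```

For the example in the question, something like
fetch_all("http://example.com/folder/", suffix=".egg") would grab
spam.egg, unpredictable.egg and any other .egg file the index lists
(the host is a placeholder).  If the server does not generate such a
listing, this approach has nothing to parse.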


