Get directory listing from an HTTP web site

Sat Aug 6 16:45:55 EDT 2005

rock69 wrote:
> Hi all :)
> 
> I was wondering if there's some neat and easy way to get the entire
> contents of a directory at a specific web url address.
> 
> I have the following link:
> 
> http://www.infomedia.it/immagini/riviste/covers/cp
> 
> and as you can see it's just a list containing all the files (images)
> that I need. Is it possible to retrieve this list (not the physical
> files) and have it stored in a variable of type list or something?

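BeautifulSoup and urllib do this easily: fetch the page, parse it, and
collect the anchor tags. Each file in the listing is the href of an <a>
element. Note that the first few anchors are the column-sort links and
the parent directory, not files: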

 >>> from BeautifulSoup import BeautifulSoup
 >>> import urllib
 >>> data = urllib.urlopen('http://www.infomedia.it/immagini/riviste/covers/cp/').read()
 >>> soup = BeautifulSoup(data)
 >>> anchors = soup.fetch('a')
 >>> len(anchors)
164
 >>> for a in anchors[:10]:
 ...   print a['href'], a.string
 ...
?N=D Name
?M=A Last modified
?S=A Size
?D=A Description
/immagini/riviste/covers/ Parent Directory
cp100.jpg cp100.jpg
cp100sm.jpg cp100sm.jpg
cp101.jpg cp101.jpg
cp101sm.jpg cp101sm.jpg
cp102.jpg cp102.jpg

http://www.crummy.com/software/BeautifulSoup/
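
If you would rather not install anything, a rough sketch along the same
lines using only the standard library (Python 3 module names) might look
like the one below. LinkCollector, list_files and fetch_listing are names
made up for this example, and the sample HTML is hypothetical. The filter
(skip hrefs that start with '?' or contain '/') is an assumption based on
the output shown above, so adjust it if your listing looks different.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag that is fed in."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; attrs is a list of (name, value).
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def list_files(html):
    """Return the hrefs in a listing page that look like plain files."""
    parser = LinkCollector()
    parser.feed(html)
    # Assumption: sort links start with '?', and the parent directory or
    # any subdirectory contains a '/'. Everything else is treated as a file.
    return [h for h in parser.hrefs if not h.startswith("?") and "/" not in h]


def fetch_listing(url):
    """Download a listing page and return its file names (needs network)."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return list_files(html)


# Hypothetical sample resembling the first few anchors of the listing.
sample = """
<a href="?N=D">Name</a>
<a href="?M=A">Last modified</a>
<a href="/immagini/riviste/covers/">Parent Directory</a>
<a href="cp100.jpg">cp100.jpg</a>
<a href="cp100sm.jpg">cp100sm.jpg</a>
"""

print(list_files(sample))  # ['cp100.jpg', 'cp100sm.jpg']
```

Calling fetch_listing() with the directory URL should then give you a
plain Python list of names, which is what the original question asked
for. The parsing is kept separate from the download so that it can be
tried on a saved page first.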

Kent