downloading from links within a webpage

Shiva shivaji_tn at yahoo.com
Tue Oct 14 10:42:21 EDT 2014


Hi,

Here is a small script I wrote that downloads images from a given webpage
URL (you can limit how many images it downloads). I would now like to extend
it so that it also follows the external links found on that page and
downloads up to the same number of images from each linked page, with a
limit on how deep it can go.

Any ideas?  (I am using Python 3.4 & I am a beginner)

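import os
import re
import urllib.request

url = "http://www.abc.com"

# Decode the response bytes to text; str() on bytes would give "b'...'".
with urllib.request.urlopen(url) as page:
    html = page.read().decode("utf-8", errors="replace")

# Group the alternation so that a bare "jpeg" cannot match on its own,
# and stop the match at quotes or angle brackets.
matches = re.findall(r'http://[^\s"\'<>]+?\.(?:jpg|jpeg)', html)

for urltodownload in matches[:50]:
    # Name the file after the last path component of the image URL.
    imagename = os.path.basename(urltodownload)
    urllib.request.urlretrieve(urltodownload, imagename)

print('Done!')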
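
A rough sketch of the link-following part described above, using only the
standard library. All the names here (LinkParser, extract_links, find_images,
fetch_page, crawl) are made up for illustration, and it assumes pages can be
decoded as UTF-8. It follows every http(s) link it finds and does not yet
tell external links from internal ones, so treat it as a starting point
rather than a finished answer.

```python
import re
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical helpers for illustration; none of these names come from
# the original script.
IMAGE_RE = re.compile(r'https?://[^\s"\'<>]+?\.(?:jpg|jpeg)', re.IGNORECASE)


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html, base_url):
    """Return the absolute http(s) links found in html."""
    parser = LinkParser()
    parser.feed(html)
    absolute = (urljoin(base_url, link) for link in parser.links)
    return [u for u in absolute if urlparse(u).scheme in ("http", "https")]


def find_images(html, limit):
    """Return at most `limit` image URLs found in html."""
    return IMAGE_RE.findall(html)[:limit]


def fetch_page(url):
    """Download a page and return it as text (UTF-8 is an assumption)."""
    with urllib.request.urlopen(url) as page:
        return page.read().decode("utf-8", errors="replace")


def crawl(start_url, max_depth=1, per_page=50, fetch=fetch_page):
    """Breadth-first walk; returns {page_url: [image_urls]}."""
    seen = {start_url}
    queue = [(start_url, 0)]
    found = {}
    while queue:
        url, depth = queue.pop(0)
        try:
            html = fetch(url)
        except (OSError, ValueError):
            continue  # skip pages that cannot be fetched
        found[url] = find_images(html, per_page)
        # Only queue further links while below the depth limit.
        if depth < max_depth:
            for link in extract_links(html, url):
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return found
```

The dictionary returned by crawl() could then be walked with
urllib.request.urlretrieve, the same way the loop above does for a single
page. The `seen` set is there so that pages linking to each other are not
fetched twice.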
 
Thanks,
Shiva



