[Tutor] ex-ftp
Mal Wanstall
m.wanstall at gmail.com
Tue Sep 22 07:57:15 CEST 2009
On Tue, Sep 22, 2009 at 2:39 PM, prasad rao <prasadaraon50 at gmail.com> wrote:
> Hello friends,
> I am trying to write a class to save a web page from a URL, but it is
> not working. It saves the HTML page but does not fetch the images, and
> I am unable to show the list (object.links).
> Please take a look at it and show me how to fix it.
>
> import urllib2,ftplib,re
> class Collect:
>     def __init__(self,parent):
>         self.parent=parent
>         self.links=[]
>         self.ims=[]
>         s=urllib2.urlopen(self.parent)
>         data=s.read()
>         self.data=data
>         a=re.compile ('<[aA].*[\'"](.*)[\'"].*>')
>         b=re.compile('<src img[\'"](.+)[\'"].*')
>         try:
>             z=re.search(a,self.data).group(1)
>             self.links.extend(z)
>         except:pass
>         try:
>             y=re.search(b,self.data).group(1)
>             self.ims.extend(y)
>         except:pass
>         return
>
>     def save(self,data):
>         d=open('C:/%s .html'%self.parent[10:15],'w')
>         d.write(data)
>         return
>     def bring(self):
>         ftp=ftplib.FTP(self.parent)
>         ftp.login()
>         for x in self.ims:
>             data=ftp.retlines(x)
>             d=open('C:/%s'%x,'w')
>             d.write(data)
>         return
>
>     def show(self,z):
>         for x in z:
>             print x
>         return
>
>
> c=Collect('http://www.asstr.org')
> c.save(c.data)
> c.bring()
> #c.show(c.ims)
> c.links
>
> Thanks in advance.
Might be nice to mention to all who access these emails from work that
the site this script is scraping is not safe for work.
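
On the code itself, one likely reason the lists stay empty or look odd is
that re.search(...).group(1) returns only the first match as a single
string, and list.extend() on a string adds it one character at a time. A
minimal sketch of an alternative, assuming the goal is to collect every
href and src value, is below. The two patterns and the extract() helper are
my own illustration and not part of the original script, and a regex is only
a rough approximation of HTML parsing.

```python
import re

# Hypothetical patterns (not from the original post): capture the quoted
# value of href in <a> tags and of src in <img> tags, ignoring case.
LINK_RE = re.compile(r'<a\b[^>]*?\bhref\s*=\s*[\'"]([^\'"]*)[\'"]', re.IGNORECASE)
IMG_RE = re.compile(r'<img\b[^>]*?\bsrc\s*=\s*[\'"]([^\'"]*)[\'"]', re.IGNORECASE)


def extract(html):
    """Return (links, images) found in an HTML string."""
    # findall returns every captured group as a list of strings, so the
    # results can be stored directly without extend().
    links = LINK_RE.findall(html)
    images = IMG_RE.findall(html)
    return links, images


# Small in-memory sample so the sketch runs without touching the network.
sample = ('<a href="page2.html">next</a> '
          '<IMG alt="x" src=\'pic.jpg\'> '
          '<A HREF="/top">top</A>')
links, images = extract(sample)
```

Images referenced from an HTTP page would normally be fetched over HTTP
as well (for example with urllib2.urlopen on each image URL), since
ftplib.FTP expects an FTP host name rather than an http:// URL.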
-Mal