[Tutor] ex-ftp

Mal Wanstall m.wanstall at gmail.com
Tue Sep 22 07:57:15 CEST 2009


On Tue, Sep 22, 2009 at 2:39 PM, prasad rao <prasadaraon50 at gmail.com> wrote:
> Hello friends,
>   I am trying to write a class to save a web page,
> but it is not working. It saves the HTML page but does not get
> the images, and I am unable to show the list (object.links).
> Please take a look at it and show me how to rectify it.
>
> import urllib2,ftplib,re
> class Collect:
>     def __init__(self,parent):
>         self.parent=parent
>         self.links=[]
>         self.ims=[]
>         s=urllib2.urlopen(self.parent)
>         data=s.read()
>         self.data=data
>         a=re.compile ('<[aA].*[\'"](.*)[\'"].*>'); b=re.compile('<src img[\'"](.+)[\'"].*')
>         try:
>          z=re.search(a,self.data).group(1)
>          self.links.extend(z)
>         except:pass
>         try:
>          y=re.search(b,self.data).group(1)
>          self.ims.extend(y)
>         except:pass
>         return
>
>     def save(self,data):
>             d=open('C:/%s .html'%self.parent[10:15],'w')
>             d.write(data)
>             return
>     def bring(self):
>         ftp=ftplib.FTP(self.parent)
>         ftp.login()
>         for x in self.ims:
>             data=ftp.retlines(x)
>             d=open('C:/%s'%x,'w')
>             d.write(data)
>             return
>
>     def show(self,z):
>         for x in z:
>             print x
>         return
>
>
> c=Collect('http://www.asstr.org')
> c.save(c.data)
> c.bring()
> #c.show(c.ims)
> c.links
>
> Thanks in advance.
>
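Several things in `__init__` would explain the empty or odd lists. `re.search` returns only the first match. When nothing matches it returns `None`, so `.group(1)` raises `AttributeError`, which the bare `except: pass` silently swallows. When it does match, `list.extend` with a string appends that string one character at a time. The image pattern also starts with `<src img`, which does not match the usual `<img src="...">` order. Below is a minimal sketch that collects every link and image with the standard library's `html.parser` instead of regular expressions. It assumes Python 3 (the original is Python 2, where the module names differ). The class name `LinkCollector` and the example URLs are hypothetical:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href=...> and <img src=...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # page URL, used to resolve relative links
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        # tag and attribute names arrive lowercased; attrs is (name, value) pairs
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))


page = '<A HREF="/about.html">About</A><img alt="x" src="pics/logo.png">'
collector = LinkCollector("http://example.com/index.html")
collector.feed(page)
collector.close()
print(collector.links)   # ['http://example.com/about.html']
print(collector.images)  # ['http://example.com/pics/logo.png']
```

To run it on a real page, one would fetch the bytes with `urllib.request.urlopen(url).read()`, decode them (UTF-8 is an assumption; the page may declare another charset) and pass the text to `feed()`. Separately, a bare `c.links` as the last line of a script displays nothing. It only echoes at the interactive prompt, so a script needs an explicit `print`.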
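`bring` has separate problems. `ftplib.FTP` expects the host name of an FTP server, not an `http://` URL. The method is `retrlines` (there is no `retlines`), and it is meant for text transfers. The `return` inside the loop stops after the first image. Image data written in text mode (`'w'`) can also be corrupted on Windows. Since the images come from a web server, fetching them over HTTP and writing bytes in `'wb'` mode may be closer to what was intended. The sketch below does this with `urllib.request.urlopen`. Again it assumes Python 3, and the function `save_images` and its `dest_dir` parameter are hypothetical:

```python
import os
import posixpath
from urllib.parse import urlparse
from urllib.request import urlopen


def save_images(image_urls, dest_dir):
    """Download each URL into dest_dir and return the list of saved paths."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in image_urls:
        # name the file after the last component of the URL's path
        name = posixpath.basename(urlparse(url).path) or "index"
        path = os.path.join(dest_dir, name)
        # 'wb': image data is bytes, so no newline translation may happen
        with urlopen(url) as response, open(path, "wb") as out:
            out.write(response.read())
        saved.append(path)
    # return after the loop, so every URL is processed
    return saved
```

In this sketch, files whose URLs share the same last path component overwrite each other; a fuller version would need to de-duplicate names.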

It might be worth warning everyone who reads these emails at work that
the site this script scrapes is not safe for work.
-Mal

