Recursive functions

Steve Holden sholden at holdenweb.com
Sat Oct 20 10:20:32 EDT 2001


"Nachiket" <nachiket_gole at hotmail.com> wrote in message
news:602c058e.0110191750.6bcd8367 at posting.google.com...
> I have this code to create a map of webpages.
> Can I put it in a recursive function that calls itself a specific
> number of times instead of writing multiple loops.
> I tried to put it in a method called say handlelinks but when I called
> it it gives me a attribute error.  the code is as follows

Rule 1: When asking for help, ALWAYS include the exact error message.

Rule 2: Try to reduce the code to something small that still demonstrates
the error.

Anyway, be that as it may ... here we go. Note: I haven't subjected the
program to any kind of global analysis; I just looked through it for stuff
that obviously needs fixing.

> #!/usr/bin/env python^M

Watch out if you try to run this program under Unix: the ^M (a carriage
return left over from DOS line endings) becomes part of the interpreter
name, so the system can't find your interpreter binary.
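
One way to get rid of the carriage returns is to rewrite the file with Unix
line endings. A minimal sketch, assuming the script has been read as bytes;
the helper name strip_carriage_returns is made up for illustration:

```python
def strip_carriage_returns(data):
    """Convert DOS (CRLF) line endings to Unix (LF) line endings."""
    return data.replace(b"\r\n", b"\n")


# A two-line script as it would look when saved with DOS line endings.
dos_script = b"#!/usr/bin/env python\r\nimport sys\r\n"
unix_script = strip_carriage_returns(dos_script)
```

Writing unix_script back to disk in binary mode keeps the line endings from
being translated again.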

> import sys
> from os.path import walk
> from htmllib import HTMLParser
> from formatter import NullFormatter
> import urllib
>
> #---read in URL as first argument on command line
> try:
>   inf = urllib.urlopen(sys.argv[1])
> except IndexError:
>   inf = sys.stdin
> #---extracts all the links from html pages
>
> counter = 3
> class myparser(HTMLParser):
>   def __init__(self):
>     HTMLParser.__init__(self,NullFormatter())
>     self.linktext = {}
>     self.lasturl = None
>   def start_a(self,attr):
>     for key,val in attr:
>      if key == 'href': self.lasturl = val
>        #self.linktext.append(val)
>     HTMLParser.save_bgn(self)
>   def end_a(self):
>     self.linktext[self.lasturl] = HTMLParser.save_end(self)
>
> p = myparser()
> p.feed(inf.read())
> depth = 2
> #-----------------------------------------------------------
> print inf
> for link in p.linktext.keys():
>   if(link[:8] == 'https://'):
>      print 'Secure Website found...cannot open'
>   elif(link[:7] == 'http://'):
>     print link[:20],'-->',(p.linktext[link[:25]])
>     temp = urllib.urlopen(link)
>     depth-1

What's this supposed to do? What it actually does is take the value of
depth, subtract one from it and ... nothing: the result is thrown away. In
particular, it does not modify the value bound to depth. Perhaps you mean

    depth = depth - 1

or

    depth -= 1
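
The difference is easy to check for yourself. A small sketch (the name
unchanged is only there to capture the intermediate value):

```python
depth = 2

depth - 1            # expression statement: the result is computed, then discarded
unchanged = depth    # depth is still bound to 2 here

depth -= 1           # augmented assignment: rebinds depth to 1
```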

>     if depth == 0:
>       print 'exit'

Also, note that printing "exit" doesn't actually alter the flow of control.
Perhaps you actually want to do something that exits here, as well?
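
For instance, a break statement leaves the enclosing loop once the budget
is used up. This is only a sketch: the function visit and its arguments are
hypothetical, not part of the posted program.

```python
def visit(links, depth):
    """Return the links processed before the depth budget runs out."""
    visited = []
    for link in links:
        if depth == 0:
            break            # unlike print, this really leaves the loop
        visited.append(link)
        depth -= 1
    return visited


first_two = visit(["a", "b", "c"], 2)
```

If the whole program should stop rather than just the loop, sys.exit()
would be the thing to call instead of break.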

>     p.feed(temp.read())
>     for link in p.linktext.keys():
>       if(link[:8] == 'https://'):
>         print 'Secure Website found...cannot open'
>       elif(link[:7] == 'http://'):
>        print '','|',link[:25],'-->',p.linktext[link]
>        temp1 = urllib.urlopen(link)
>        depth-1
>        if depth == 0:
>          print 'exit'

Comments above apply here too.

>        p.feed(temp1.read())
>    else:return 'unknown format'
>        for link in p.linktext.keys():
>         if(link[:8] == 'https://'):
>          print 'Secure Website found...cannot open'
>         elif(link[:7] == 'http://'):
>          print ' ','|','+',link[:25],'-->',p.linktext[link]
>          temp1 = urllib.urlopen(link)
>          depth-1
>          if depth == 0:
>           print 'exit'

And here, too.

>          p.feed(temp1.read())
>          for link in p.linktext.keys():
>           if(link[:8] == 'https://'):
>            print 'Secure Website found...cannot open'
>           elif(link[:7] == 'http://'):
>            print '   ','|','-',link[:25],'-->',p.linktext[link]
>            temp1 = urllib.urlopen(link)
>            depth-1
>            if depth == 0:
>             print 'exit'
>            p.feed(temp1.read())
>      elif(link[:8] == 'https://'):
>        print 'Secure Website'
>   elif(link[:7] == 'https:/'):
>     print 'Secure Website found...cannot open'
>   else:return 'unknow

We'll ignore the partial program, for now ...
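
As for the original question: yes, the nested loops can probably be folded
into one function that calls itself with a smaller depth each time. What
follows is a rough sketch, not a fix of the posted program. It assumes a
current Python, where htmllib is gone and html.parser replaces it. The names
LinkParser, map_links, PAGES and fake_fetch are invented for the example, and
fake_fetch stands in for urllib so the sketch runs without a network.

```python
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, val in attrs:
                if key == "href" and val:
                    self.links.append(val)


def map_links(url, fetch, depth, indent=0, seen=None, out=None):
    """Return indented lines for url and the pages reachable within depth hops."""
    if seen is None:
        seen = set()
    if out is None:
        out = []
    out.append(" " * indent + url)
    if depth == 0 or url in seen:
        return out               # base case: budget used up, or page already expanded
    seen.add(url)
    parser = LinkParser()        # fresh parser per page, so links do not accumulate
    parser.feed(fetch(url))
    for link in parser.links:
        if link.startswith("http://"):   # https links are skipped, as in the post
            map_links(link, fetch, depth - 1, indent + 2, seen, out)
    return out


# Hypothetical pages standing in for the real web.
PAGES = {
    "http://a/": '<a href="http://b/">B</a> <a href="https://s/">S</a>',
    "http://b/": '<a href="http://c/">C</a>',
    "http://c/": '<a href="http://a/">A</a>',
}


def fake_fetch(url):
    """Return the HTML for url, or an empty page if it is unknown."""
    return PAGES.get(url, "")


site_map = map_links("http://a/", fake_fetch, depth=2)
```

Passing depth as an argument means each call gets its own copy, so there is
no global counter to forget to decrement, and the seen set keeps pages that
link back to each other from being expanded twice.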

Take a look, if you have it, at the "webchecker" utility in the "tools"
directory of your Python distribution.

regards
 Steve
--
http://www.holdenweb.com/