[Tutor] html links

Alan Gauld alan.gauld at btinternet.com
Tue May 15 09:37:40 CEST 2007


"max ." <dos.fool at gmail.com> wrote

> does anyone know of a tutorial for finding links in a web site with 
> python.
>
BeautifulSoup has been mentioned, but it's not part of the standard
Python library.

Here is an example using the standard library parser, HTMLParser.
It prints the text of the second <h1> heading:

html = '''
<html><head><title>Test page</title></head>
<body>
<center>
<h1>Here is the first heading</h1>
</center>
<p>A short paragraph
<h1>A second heading</h1>
<p>A paragraph containing a
<a href="www.python.org">hyperlink to python</a>
</body></html>
'''

from HTMLParser import HTMLParser

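class H1Parser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.h1_count = 0       # number of <h1> start tags seen so far
        self.isHeading = False  # True while we are inside an <h1> element

    # called by feed() for every opening tag
    def handle_starttag(self,tag,attributes=None):
        if tag == 'h1':
            self.h1_count += 1
            self.isHeading = True

    # called by feed() for every closing tag
    def handle_endtag(self,tag):
        if tag == 'h1':
            self.isHeading = False

    # called by feed() for the text between tags
    def handle_data(self,data):
        if self.isHeading and self.h1_count == 2:
            print "Second Header contained: ", data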

parser = H1Parser()
parser.feed(html)
parser.close()
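
Since the question was about links, here is a rough sketch of the same
idea applied to <a> tags. The names LinkParser and sample_html are made
up for this illustration, and the second link in the sample page is
invented too. The try/except import is there only because the module is
spelled html.parser in Python 3; treat this as a starting point rather
than a finished tool.

```python
try:
    from html.parser import HTMLParser   # Python 3 spelling
except ImportError:
    from HTMLParser import HTMLParser    # Python 2 spelling

# made-up test page for this sketch
sample_html = '''
<html><head><title>Test page</title></head>
<body>
<p>A paragraph containing a
<a href="www.python.org">hyperlink to python</a>
<p>Another paragraph with a
<a href="http://docs.python.org/">second link</a>
and an <a name="top">anchor without an href</a>
</body></html>
'''

class LinkParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []   # href values in the order they appear

    def handle_starttag(self, tag, attributes):
        # attributes arrives as a list of (name, value) pairs
        if tag == 'a':
            for name, value in attributes:
                if name == 'href' and value is not None:
                    self.links.append(value)

parser = LinkParser()
parser.feed(sample_html)
parser.close()
print(parser.links)
```

The hrefs are collected exactly as written in the page, so relative
links would still need to be joined to the page's address before you
could fetch them.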

> or creating files and asking ware to create a file.

I'm not sure what you mean here. Do you mean fetching a file
from a remote server? There is an ftp module (ftplib) if it's from
an FTP site...


-- 
Alan Gauld
Author of the Learn to Program web site
http://www.freenetpages.co.uk/hp/alan.gauld 



