[Tutor] html links
Alan Gauld
alan.gauld at btinternet.com
Tue May 15 09:37:40 CEST 2007
"max ." <dos.fool at gmail.com> wrote
> does anyone know of a tutorial for finding links in a web site with
> python.
>
BeautifulSoup has been mentioned, but it's not part of the standard Python library.
Here is an example using the standard library parser:
html = '''
<html><head><title>Test page</title></head>
<body>
<center>
<h1>Here is the first heading</h1>
</center>
<p>A short paragraph
<h1>A second heading</h1>
<p>A paragraph containing a
<a href="www.python.org">hyperlink to python</a>
</body></html>
'''
from HTMLParser import HTMLParser

class H1Parser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.h1_count = 0
        self.isHeading = False

    def handle_starttag(self,tag,attributes=None):
        if tag == 'h1':
            self.h1_count += 1
            self.isHeading = True

    def handle_endtag(self,tag):
        if tag == 'h1':
            self.isHeading = False

    def handle_data(self,data):
        if self.isHeading and self.h1_count == 2:
            print "Second Header contained: ", data

parser = H1Parser()
parser.feed(html)
parser.close()
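
The same approach can be adapted to the original question about links. What follows is only a minimal sketch, assuming Python 3 (where the module is html.parser rather than HTMLParser). LinkParser and find_links are names made up for illustration; they are not part of the standard library.

```python
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attributes):
        # attributes arrives as a list of (name, value) pairs
        if tag == 'a':
            for name, value in attributes:
                if name == 'href' and value is not None:
                    self.links.append(value)


def find_links(html_text):
    """Hypothetical helper: parse html_text and return the hrefs found."""
    parser = LinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.links


sample = '<p>A paragraph containing a <a href="www.python.org">hyperlink to python</a></p>'
print(find_links(sample))
```

Note that this only reports the href values as written in the page; relative links would still need to be resolved against the page's address.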
> or creating files and asking where to create a file.
I'm not sure what you mean here. Do you mean fetching a file
from a remote server? There is the ftplib module if it's from an FTP
site...
--
Alan Gauld
Author of the Learn to Program web site
http://www.freenetpages.co.uk/hp/alan.gauld