SimplePrograms challenge

Rob Wolfe rw at smsnet.pl
Tue Jun 12 08:06:08 EDT 2007


Steve Howell wrote:
> Hi, I'm offering a challenge to extend the following
> page by one good example:
>
> http://wiki.python.org/moin/SimplePrograms

What about simple HTML parsing? Admittedly this is not a language
concept, but it shows the power of the Python standard library.
Besides, it's a very popular problem among newbies. This program,
for example, prints all the linked URLs in an HTML document:

<code>
from HTMLParser import HTMLParser

page = '''
<html><head><title>URLs</title></head>
<body>
<ul>
<li><a href="http://domain1/page1">some page1</a></li>
<li><a href="http://domain2/page2">some page2</a></li>
</ul>
</body></html>
'''

class URLLister(HTMLParser):
    def reset(self):
        HTMLParser.reset(self)
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # look up a handler named after the tag, e.g. self.start_a;
        # using a default avoids hiding AttributeErrors raised inside it
        handler = getattr(self, "start_%s" % tag, None)
        if handler is not None:
            handler(attrs)

    def start_a(self, attrs):
        href = [v for k, v in attrs if k == "href"]
        if href:
            self.urls.extend(href)

parser = URLLister()
parser.feed(page)
parser.close()
for url in parser.urls:
    print url
</code>
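
For anyone on Python 3, here is a rough sketch of the same idea. It assumes the parser class is imported from `html.parser` (the Python 3 name of the module) and wraps the parsing in a helper called `list_urls`, which is just a name made up for this example and not part of the standard library.

```python
from html.parser import HTMLParser

PAGE = '''
<html><head><title>URLs</title></head>
<body>
<ul>
<li><a href="http://domain1/page1">some page1</a></li>
<li><a href="http://domain2/page2">some page2</a></li>
</ul>
</body></html>
'''

class URLLister(HTMLParser):
    def reset(self):
        # HTMLParser.__init__ calls reset(), so urls is set up on creation
        super().reset()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # dispatch to a method named after the tag, e.g. self.start_a
        handler = getattr(self, "start_%s" % tag, None)
        if handler is not None:
            handler(attrs)

    def start_a(self, attrs):
        # attrs is a list of (name, value) pairs; value is None for
        # attributes written without a value
        self.urls.extend(v for k, v in attrs if k == "href" and v is not None)

def list_urls(html):
    # hypothetical helper: feed the document and return the collected URLs
    parser = URLLister()
    parser.feed(html)
    parser.close()
    return parser.urls

for url in list_urls(PAGE):
    print(url)
```

Returning the list from a function instead of printing inside the class makes the result easy to reuse, but that is only one way to arrange it.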

--
Regards,
Rob



