Splitting on a word

Bernhard Holzmayer Holzmayer.Bernhard at deadspam.com
Thu Jul 14 06:53:43 EDT 2005


qwweeeit at yahoo.it wrote:

> Hi all,
> I am writing a script to visualize (and print)
> the web references hidden in the html files as:
> '<a href="web reference"> underlined reference</a>'
> Optimizing my code, I found that an essential step is:
> splitting on a word (in this case 'href').
> 
> I am asking if there is some alternative (more pythonic...):

Sure. The htmllib module provides an HTMLParser class.
Here's an example. Run it with your HTML file as the argument
and you'll see a list of all the hrefs in the document.

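#------------------------------------------------
#!/usr/bin/python
# Print every link target (href) found in an HTML file.
# Usage: pass the HTML file as the first command-line argument.
import sys
import formatter
import htmllib

def test():
    # Read the whole HTML file named on the command line.
    filename = sys.argv[1]
    f = open(filename, 'r')
    data = f.read()
    f.close()

    # NullFormatter discards the rendered text; only the links matter here.
    parser = htmllib.HTMLParser(formatter.NullFormatter())
    parser.feed(data)
    parser.close()

    # The parser collects the href of each <a> tag in anchorlist.
    for a_link in parser.anchorlist:
        print a_link

test()
#------------------------------------------------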

I'm sure that this is far more Pythonic!
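
For anyone on Python 3, where htmllib is no longer available, roughly
the same idea can be sketched with html.parser from the standard library.
This is only a minimal sketch: the names LinkCollector and extract_links
are made up for illustration, and it collects hrefs from <a> tags only.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):  # hypothetical helper name
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased;
        # attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):  # hypothetical helper name
    """Return the href targets found in html_text, in document order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Small demonstration on an inline string instead of a file.
sample = '<a href="http://www.python.org">Python</a>'
for link in extract_links(sample):
    print(link)
```

To use it on a file, read the file's contents into a string and pass
that to extract_links() in place of the sample.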

Bernhard


