Splitting on a word
Bernhard Holzmayer
Holzmayer.Bernhard at deadspam.com
Thu Jul 14 06:53:43 EDT 2005
qwweeeit at yahoo.it wrote:
> Hi all,
> I am writing a script to visualize (and print)
> the web references hidden in the html files as:
> '<a href="web reference"> underlined reference</a>'
> Optimizing my code, I found that an essential step is:
> splitting on a word (in this case 'href').
>
> I am asking if there is some alternative (more pythonic...):
Sure. The htmllib module provides the HTMLParser class.
Here's an example: run it with your HTML file as its argument
and you'll see a list of all the hrefs in the document.
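#------------------------------------------------
#!/usr/bin/python
import sys
import formatter
import htmllib

def test():
    # Read the whole HTML file named on the command line
    filename = sys.argv[1]
    f = open(filename, 'r')
    data = f.read()
    f.close()

    # NullFormatter discards the rendered text; only the links matter here
    fmt = formatter.NullFormatter()
    p = htmllib.HTMLParser(fmt)
    p.feed(data)
    # close() flushes any buffered input, so every anchor gets recorded
    p.close()

    # anchorlist holds the href of every <a> tag, in document order
    for a_link in p.anchorlist:
        print a_link

if __name__ == '__main__':
    test()
#------------------------------------------------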
I'm sure that this is far more Pythonic!
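Note that htmllib (like formatter) no longer exists in Python 3. A rough
equivalent, sketched here under the assumption of Python 3 and a UTF-8
encoded input file, subclasses html.parser.HTMLParser and collects the
href of each <a> tag itself. The names LinkCollector and extract_links are
purely illustrative, and the anchorlist attribute only mimics the htmllib
one; it is not part of html.parser.

```python
#!/usr/bin/env python3
# Sketch (assumes Python 3): collect every href from <a> tags with the
# standard-library html.parser module, since htmllib was removed.
import sys
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: records the href of each <a> tag, in order."""

    def __init__(self):
        super().__init__()
        # Named after htmllib's attribute for familiarity only
        self.anchorlist = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names;
        # attrs is a list of (name, value) pairs, value may be None
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value is not None:
                    self.anchorlist.append(value)


def extract_links(html_text):
    """Hypothetical helper: return the list of hrefs found in html_text."""
    parser = LinkCollector()
    parser.feed(html_text)
    # close() forces processing of any buffered input
    parser.close()
    return parser.anchorlist


if __name__ == '__main__' and len(sys.argv) > 1:
    # Usage: python3 links.py page.html  (UTF-8 encoding is an assumption)
    with open(sys.argv[1], encoding='utf-8') as f:
        for link in extract_links(f.read()):
            print(link)
```

Unlike splitting the raw text on 'href', the parser copes with upper-case
tags, single-quoted values and <a> tags that carry no href at all.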
Bernhard
More information about the Python-list mailing list