SimplePrograms challenge
Rob Wolfe
rw at smsnet.pl
Tue Jun 12 17:37:59 EDT 2007
Steven Bethard <steven.bethard at gmail.com> writes:
> I'd hate to steer a potential new Python developer to a clumsier
"clumsier"???
Try to parse this with your program:
page2 = '''
<html><head><title>URLs</title></head>
<body>
<ul>
<li><a href="http://domain1/page1">some page1</a></li>
<li><a href="http://domain2/page2">some page2</a></li>
</body></html>
'''
> library when Python 2.5 includes ElementTree::
>
> import xml.etree.ElementTree as etree
>
> page = '''
> <html><head><title>URLs</title></head>
> <body>
> <ul>
> <li><a href="http://domain1/page1">some page1</a></li>
> <li><a href="http://domain2/page2">some page2</a></li>
> </ul>
> </body></html>
> '''
>
> tree = etree.fromstring(page)
> for a_node in tree.getiterator('a'):
>     url = a_node.get('href')
>     if url is not None:
>         print url
It might even be a one-liner:
print "\n".join((url.get('href', '') for url in tree.findall(".//a")))
But as far as HTML (not XML) is concerned, this is not a very realistic solution.
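
To make the difference concrete, here is a minimal sketch. It is written in
Python 3 syntax (an assumption; the code in this thread is Python 2) and uses
the standard library's html.parser module. The names LinkCollector and
xml_parse_ok are hypothetical helpers made up for illustration. It feeds the
page2 string from above, with its unclosed <ul>, to both parsers:

```python
import xml.etree.ElementTree as etree
from html.parser import HTMLParser

# Same malformed page as page2 above: the <ul> element is never closed.
page2 = '''
<html><head><title>URLs</title></head>
<body>
<ul>
<li><a href="http://domain1/page1">some page1</a></li>
<li><a href="http://domain2/page2">some page2</a></li>
</body></html>
'''


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == 'a':
            href = dict(attrs).get('href')
            if href is not None:
                self.urls.append(href)


def xml_parse_ok(text):
    """Hypothetical helper: True if ElementTree accepts text as XML."""
    try:
        etree.fromstring(text)
        return True
    except etree.ParseError:
        return False


collector = LinkCollector()
collector.feed(page2)
collector.close()

print(xml_parse_ok(page2))        # the strict XML parser rejects the page
print("\n".join(collector.urls))  # the event-based HTML parser still finds the links
```

The XML parser stops at the mismatched </body> tag, while the HTML parser
only reports start tags as it sees them and does not care that the list is
never closed.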
>
> I know that the wiki page is supposed to be Python 2.4 only, but I'd
> rather have no example than an outdated one.
This example is by no means "outdated".
--
Regards,
Rob