SimplePrograms challenge

Tue Jun 12 17:37:59 EDT 2007

Steven Bethard <steven.bethard at gmail.com> writes:

> I'd hate to steer a potential new Python developer to a clumsier

"clumsier"???
Try to parse this with your program:

page2 = '''
     <html><head><title>URLs</title></head>
     <body>
     <ul>
     <li><a href="http://domain1/page1">some page1</a></li>
     <li><a href="http://domain2/page2">some page2</a></li>
     </body></html>
     '''
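
For illustration, here is a minimal sketch of what happens when this markup goes through the ElementTree approach quoted below. It is written for Python 3, which is an assumption on my part (the thread itself is about Python 2.4/2.5); there the parser error is exposed as etree.ParseError. The helper name try_parse is hypothetical.

```python
import xml.etree.ElementTree as etree

# Same markup as page2 above: the <ul> element is never closed.
page2 = '''
     <html><head><title>URLs</title></head>
     <body>
     <ul>
     <li><a href="http://domain1/page1">some page1</a></li>
     <li><a href="http://domain2/page2">some page2</a></li>
     </body></html>
     '''

def try_parse(markup):
    """Return the list of hrefs, or an error string if the XML parser rejects the input."""
    try:
        tree = etree.fromstring(markup)
    except etree.ParseError as exc:
        # A strict XML parser refuses markup with unbalanced tags.
        return "parse failed: %s" % exc
    return [a.get('href') for a in tree.iter('a')]

result = try_parse(page2)
print(result)
```

The missing </ul> makes the document ill-formed XML, so the parser stops with an error instead of returning any links.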

> library when Python 2.5 includes ElementTree::
>
>     import xml.etree.ElementTree as etree
>
>     page = '''
>     <html><head><title>URLs</title></head>
>     <body>
>     <ul>
>     <li><a href="http://domain1/page1">some page1</a></li>
>     <li><a href="http://domain2/page2">some page2</a></li>
>     </ul>
>     </body></html>
>     '''
>
>     tree = etree.fromstring(page)
>     for a_node in tree.getiterator('a'):
>         url = a_node.get('href')
>         if url is not None:
>             print url

It might even be a one-liner:
print "\n".join((url.get('href', '') for url in tree.findall(".//a")))

But as far as HTML (not XML) is concerned, this is not a very realistic solution.
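
To make the point concrete, here is a rough sketch of a more forgiving approach. It uses the standard library's html.parser module under Python 3 (in Python 2 the module is called HTMLParser); this is my own choice for illustration and not necessarily what the wiki example does. The class name LinkCollector is hypothetical.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href values from <a> start tags; tag balance is never checked."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the start tag.
        if tag == 'a':
            href = dict(attrs).get('href')
            if href is not None:
                self.urls.append(href)

# Markup with an unclosed <ul>, as in page2 above.
page2 = '''
     <html><head><title>URLs</title></head>
     <body>
     <ul>
     <li><a href="http://domain1/page1">some page1</a></li>
     <li><a href="http://domain2/page2">some page2</a></li>
     </body></html>
     '''

collector = LinkCollector()
collector.feed(page2)
collector.close()
print("\n".join(collector.urls))
```

Because this parser only reports tags as it meets them and does not build a tree, the unclosed <ul> does not stop it from finding both links.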

>
> I know that the wiki page is supposed to be Python 2.4 only, but I'd
> rather have no example than an outdated one.

This example is by no means "outdated".

-- 
Regards,
Rob