Extract Information from Tables in html
Stefan Behnel
stefan_ml at behnel.de
Fri Sep 5 12:05:20 EDT 2008
Hi,
Jackie Wang wrote:
> Here is some HTML code:
>
> <td valign="top" headers="col4">
>
> Premier Community Bank of Southwest Florida
> <br />
> Fort Myers, FL
>
> </td>
>
> My question is how I can extract the strings and get the results:
> Premier Community Bank of Southwest Florida; Fort Myers, FL
Use lxml.html. Something like this should do what you want:
>>> from lxml import html
>>> tree = html.parse("http://server.org/thefile.html")
>>> all_tds = tree.findall(".//td")   # every <td> below the root
>>> for td in all_tds:
...     print(td.xpath("normalize-space()"))
...
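If you want to try this without a server, here is a minimal, self-contained sketch that parses the snippet from the question straight from a string with html.fromstring(). The wrapper table in SNIPPET and the column_texts() helper are made up for illustration, and selecting cells by headers="col4" assumes that is the column you are after. As written, it prints "Premier Community Bank of Southwest Florida Fort Myers, FL": normalize-space() joins the two lines with a plain space, not with the semicolon from the question.

```python
from lxml import html

# The cell from the question, wrapped in a minimal table so it parses as-is.
SNIPPET = """<table><tr>
<td valign="top" headers="col4">
Premier Community Bank of Southwest Florida
<br />
Fort Myers, FL
</td>
</tr></table>"""


def column_texts(markup, column="col4"):
    """Return the whitespace-normalised text of every <td headers=column>."""
    root = html.fromstring(markup)
    # XPath attribute test with an lxml XPath variable ($col), so only the
    # cells of that column are picked up.
    cells = root.xpath("//td[@headers=$col]", col=column)
    # normalize-space() strips both ends and collapses every whitespace run,
    # including the line breaks around <br />, into a single space.
    return [td.xpath("normalize-space()") for td in cells]


for text in column_texts(SNIPPET):
    print(text)
```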
Tweak as you see fit; tree iteration is at your service if you need more.
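One such tweak, sketched here under the assumption that each separate text piece in a cell is its own field: walk the cell with itertext() and join the non-blank pieces with "; ". For the cell in the question this gives exactly the requested "Premier Community Bank of Southwest Florida; Fort Myers, FL". The cell_parts() helper is hypothetical, and note that inline markup such as <b> inside a cell would be treated as a field boundary too.

```python
from lxml import html

# The cell from the question, wrapped in a minimal table so it parses as-is.
SNIPPET = """<table><tr>
<td valign="top" headers="col4">
Premier Community Bank of Southwest Florida
<br />
Fort Myers, FL
</td>
</tr></table>"""


def cell_parts(td, sep="; "):
    """Join the non-blank text pieces of a cell with sep (hypothetical helper)."""
    # itertext() yields the cell's own text and the text/tail of its
    # descendants in document order; here that is the text before <br />
    # and the tail text after it.
    pieces = (" ".join(text.split()) for text in td.itertext())
    return sep.join(piece for piece in pieces if piece)


root = html.fromstring(SNIPPET)
for td in root.iter("td"):
    print(cell_parts(td))
```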
http://codespeak.net/lxml/
Stefan
More information about the Python-list mailing list