Extract Information from Tables in html
Peter Pearson
ppearson at nowhere.invalid
Fri Sep 5 11:03:29 EDT 2008
On Fri, 5 Sep 2008 11:35:14 -0300, Walter Cruz <walter.php at gmail.com> wrote:
> On Fri, Sep 5, 2008 at 11:29 AM, Jackie Wang <jackie.python at gmail.com> wrote:
>> Here is a html code:
>>
>> <td valign="top" headers="col4">
>>
>> Premier Community Bank of Southwest Florida
>> <br />
>> Fort Myers, FL
>>
>> </td>
>>
>> My question is how I can extract the strings and get the results:
>> Premier Community Bank of Southwest Florida; Fort Myers, FL
>
> Use BeautifulSoup.
I agree, BeautifulSoup is wonderful. Here are snippets of
code that I recently used to locate (in each of many HTML
files) the table that contained a particular heading:
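
    from BeautifulSoup import BeautifulSoup
    import re
    ...
    # ifd is presumably an open file object for one HTML file
    # (its setup is in the part elided above).
    inlines = ifd.readlines()
    soup = BeautifulSoup( " ".join( inlines ) )
    # Find the text node holding the heading, then climb to the enclosing table.
    x = soup.findAll( text = re.compile( "Technical Requirements - General" ) )
    x = x[0].parent
    while x.name != "table":
        x = x.parent
    # Take only the table's direct <tr> children.
    tr_list = x.findAll( "tr", recursive = False )
    print "Table has %d rows." % len( tr_list )
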
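
For the cell in the original question, here is a minimal sketch that needs only the standard library's html.parser rather than BeautifulSoup. The class name CellTextExtractor and the function extract_cells are made up for illustration, and matching on headers="col4" is an assumption about how the cells of interest are marked. It is written for Python 3, unlike the snippet above.

```python
from html.parser import HTMLParser


class CellTextExtractor(HTMLParser):
    """Collect the text fragments of each <td> whose headers attribute matches.

    Hypothetical helper for illustration; it is not part of any library.
    """

    def __init__(self, headers_value):
        super().__init__()
        self.headers_value = headers_value
        self.in_cell = False
        self.fragments = []  # fragments of the cell currently being read
        self.cells = []      # one list of fragments per matching cell

    def handle_starttag(self, tag, attrs):
        if tag == "td" and dict(attrs).get("headers") == self.headers_value:
            self.in_cell = True
            self.fragments = []

    def handle_endtag(self, tag):
        if tag == "td" and self.in_cell:
            self.in_cell = False
            self.cells.append(self.fragments)

    def handle_data(self, data):
        # Collapse internal whitespace and skip whitespace-only runs.
        text = " ".join(data.split())
        if self.in_cell and text:
            self.fragments.append(text)


def extract_cells(html, headers_value="col4"):
    """Return one "; "-joined string per matching cell."""
    parser = CellTextExtractor(headers_value)
    parser.feed(html)
    parser.close()
    return ["; ".join(cell) for cell in parser.cells]


sample = """
<td valign="top" headers="col4">

Premier Community Bank of Southwest Florida
<br />
Fort Myers, FL

</td>
"""

print(extract_cells(sample))
```

On the sample cell this prints a one-element list holding "Premier Community Bank of Southwest Florida; Fort Myers, FL". The <br /> tag simply splits the text into two fragments, which are then joined with "; ". Nested tables or cells without a headers attribute would need a different matching rule.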
--
To email me, substitute nowhere->spamcop, invalid->net.