parsing tables with beautiful soup?

Duncan Booth duncan.booth at invalid.invalid
Thu Mar 22 05:58:19 EDT 2007


"cjl" <cjlesh at gmail.com> wrote:

> This works:
> 
> for row in soup.find("table",{"class": "class_name"}):
>      for cell in row:
>           print cell.contents[0]
> 
> Is there a better way to do this?
> 

It may work for the page you are testing against, but it is fragile. You
are assuming that the TR elements are direct children of the TABLE, but
HTML places every TR inside a THEAD, TBODY or TFOOT element. The TBODY
tags may be omitted from the source, but once anyone writes them out (or
otherwise corrects the HTML) the rows are no longer direct children of
the TABLE and your code will break.
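To see the difference between looping over direct children and searching
all descendants without needing Beautiful Soup installed, here is a
minimal sketch using Python 3's standard-library xml.etree.ElementTree.
This is a swapped-in parser, not Beautiful Soup, and the markup is a
hypothetical, well-formed sample. ElementTree's findall("tr") only looks
at direct children, much like iterating over the table, while iter("tr")
walks every descendant, much like Beautiful Soup's findAll("tr").

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed table whose rows sit inside THEAD and TBODY.
markup = """<table class="class_name">
  <thead><tr><th>Name</th></tr></thead>
  <tbody><tr><td>spam</td></tr><tr><td>eggs</td></tr></tbody>
</table>"""

table = ET.fromstring(markup)

# findall("tr") matches direct children only, so the wrapped rows are missed.
direct_rows = table.findall("tr")
# iter("tr") searches all descendants, so THEAD/TBODY wrappers don't matter.
all_rows = list(table.iter("tr"))

print(len(direct_rows))  # 0
print(len(all_rows))     # 3
```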

Something like this (untested) ought to work and be reasonably robust:

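table = soup.find("table", {"class": "class_name"})
# findAll searches all descendants, so it finds the TR elements
# even when they sit inside THEAD, TBODY or TFOOT
for row in table.findAll("tr"):
	for cell in row.findAll("td"):
		# text=True returns every text node inside the cell
		print cell.findAll(text=True)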
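If Beautiful Soup isn't available, the same row-by-row idea can be
sketched with Python 3's standard-library html.parser module. This is a
swapped-in technique, not Beautiful Soup's API: the class name
TableTextParser is hypothetical, it matches the table's class attribute
exactly (so class="class_name other" would not match), and it assumes
the table contains no nested tables. Because it only tracks TR, TD and TH
tags, THEAD, TBODY or TFOOT wrappers make no difference, and it also
copes with omitted </td> end tags.

```python
from html.parser import HTMLParser


class TableTextParser(HTMLParser):
    """Hypothetical helper: collect the text of each TD/TH cell, row by row,
    from the first <table> whose class attribute equals target_class.
    Assumes no nested tables."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.in_table = False
        self.done = False
        self.rows = []    # list of rows; each row is a list of cell strings
        self.cell = None  # text fragments of the cell being read, or None

    def _close_cell(self):
        # Finish the current cell, if any; this also covers omitted </td>.
        if self.cell is not None:
            self.rows[-1].append("".join(self.cell).strip())
            self.cell = None

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if not self.in_table:
            if tag == "table" and dict(attrs).get("class") == self.target_class:
                self.in_table = True
        elif tag == "tr":
            self._close_cell()
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._close_cell()
            self.cell = []

    def handle_endtag(self, tag):
        if not self.in_table:
            return
        if tag in ("td", "th", "tr"):
            self._close_cell()
        elif tag == "table":
            self._close_cell()
            self.in_table = False
            self.done = True

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self.cell is not None:
            self.cell.append(data)


html = """<table class="class_name">
<thead><tr><th>Name</th><th>Qty</th></tr></thead>
<tbody><tr><td>spam</td><td>3</td></tr>
<tr><td><b>eggs</b></td><td>12</td></tr></tbody>
</table>"""

parser = TableTextParser("class_name")
parser.feed(html)
parser.close()
print(parser.rows)  # [['Name', 'Qty'], ['spam', '3'], ['eggs', '12']]
```

Text inside inline markup such as <b> is still collected, because
handle_data sees every text node while a cell is open, which is roughly
what findAll(text=True) gives you in Beautiful Soup.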




More information about the Python-list mailing list