Beautiful Soup iterator question....

Steve Holden steve at holdenweb.com
Fri Apr 20 15:05:38 EDT 2007


cjl wrote:
> P:
> 
> I am screen-scraping a table. The table has an unknown number of rows,
> but each row has exactly 8 cells.  I would like to extract the data
> from the cells, but the first three cells in each row have their data
> nested inside other tags.
> 
> So I have the following code:
> 
> for row in table.findAll("tr"):
>     for cell in row.findAll("td"):
>         print cell.contents[0]
> 
> This code prints out all the data, but of course the first three cells
> still contain their unwanted tags.
> 
> I would like to do something like this:
> 
> for cell1, cell2, cell3, cell4, cell5, cell6, cell7, cell8 in
> row.findAll("td"):
> 
> Then treat each cell differently.
> 
> I can't figure this out. Can anyone point me in the right direction?
> 
Did you try something like this (untested)?

cell1, cell2, cell3, cell4, cell5, \
		cell6, cell7, cell8 = row.findAll("td")

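There is no need for the "for" if you want to handle each cell differently,
since you won't be iterating over them. And, as you saw, the "for" version
doesn't work: it unpacks each element of row.findAll(...) into eight names,
so it would only succeed if findAll returned a sequence of eight-item
containers.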
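A minimal sketch of the difference between the two forms, assuming
row.findAll("td") returns a list-like sequence of eight cells. Plain strings
stand in for the Beautiful Soup Tag objects, and the helper names
unpacks_into_eight and loop_unpacks_into_eight are hypothetical, made up for
illustration.

```python
# Hypothetical stand-in for row.findAll("td"); with Beautiful Soup
# these would be Tag objects rather than strings.
row_cells = ["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"]

# Sequence unpacking: one name per cell, no "for" loop required.
cell1, cell2, cell3, cell4, \
    cell5, cell6, cell7, cell8 = row_cells


def unpacks_into_eight(seq):
    """True if seq itself unpacks into exactly eight names."""
    try:
        c1, c2, c3, c4, c5, c6, c7, c8 = seq
    except ValueError:  # too few or too many items
        return False
    return True


def loop_unpacks_into_eight(seq):
    """True if EVERY element of seq unpacks into eight names,
    which is what the for-loop form in the question requires."""
    try:
        for c1, c2, c3, c4, c5, c6, c7, c8 in seq:
            pass
    except (ValueError, TypeError):  # wrong length, or not iterable
        return False
    return True


print(unpacks_into_eight(row_cells))       # True
print(loop_unpacks_into_eight(row_cells))  # False: each "cN" has only 2 chars
print(unpacks_into_eight(row_cells[:7]))   # False: only seven cells
```

Note that direct unpacking raises ValueError whenever a row has other than
eight cells, so if the table might contain short or header rows it may be
worth checking len() before unpacking.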

regards
  Steve
-- 
Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd          http://www.holdenweb.com
Skype: holdenweb     http://del.icio.us/steve.holden
Recent Ramblings       http://holdenweb.blogspot.com




More information about the Python-list mailing list