Beautiful Soup iterator question....
Steve Holden
steve at holdenweb.com
Fri Apr 20 15:05:38 EDT 2007
cjl wrote:
> P:
>
> I am screen-scraping a table. The table has an unknown number of rows,
> but each row has exactly 8 cells. I would like to extract the data
> from the cells, but the first three cells in each row have their data
> nested inside other tags.
>
> So I have the following code:
>
> for row in table.findAll("tr"):
>     for cell in row.findAll("td"):
>         print cell.contents[0]
>
> This code prints out all the data, but of course the first three cells
> still contain their unwanted tags.
>
> I would like to do something like this:
>
> for cell1, cell2, cell3, cell4, cell5, cell6, cell7, cell8 in
> row.findAll("td"):
>
> Then treat each cell differently.
>
> I can't figure this out. Can anyone point me in the right direction?
>
Did you try something like this (untested)?

    cell1, cell2, cell3, cell4, cell5, \
        cell6, cell7, cell8 = row.findAll("td")

No need for the "for" if you want to handle each cell differently: you
won't be iterating over them. And, as you saw, the "for" version doesn't
work unless row.findAll(...) returns a sequence of eight-item containers.
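
Here is a minimal sketch of the difference. It uses a plain list of made-up values as a stand-in for whatever row.findAll("td") returns, and the helper name unpack_in_loop is hypothetical:

```python
# Stand-in for row.findAll("td"): a plain list of eight cell values.
# (Hypothetical data; the real call would return tag objects, not strings.)
cells = ["a", "b", "c", "d", "e", "f", "g", "h"]

# Plain sequence unpacking: one name per cell, no loop needed.
cell1, cell2, cell3, cell4, cell5, cell6, cell7, cell8 = cells


def unpack_in_loop(items):
    """Mimic the "for" form: unpack each *element* into eight names."""
    result = []
    for c1, c2, c3, c4, c5, c6, c7, c8 in items:
        result.append((c1, c8))
    return result


# Each element of `cells` is a single value, not an eight-item sequence,
# so the loop form fails with ValueError on the first iteration.
try:
    unpack_in_loop(cells)
    loop_error = None
except ValueError as exc:
    loop_error = exc

# The loop form only works on a sequence of eight-item containers,
# for example a list of rows.
rows_ok = unpack_in_loop([cells])
```

The plain unpacking also raises ValueError if a row does not have exactly eight cells, so it doubles as a check on the table's shape.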
regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
Recent Ramblings http://holdenweb.blogspot.com