[Tutor] BeautifulSoup - getting cells without new line characters

jonasmg at softhome.net jonasmg at softhome.net
Fri Mar 31 23:08:13 CEST 2006


Kent Johnson writes: 

> jonasmg at softhome.net wrote: 
> 
>> List of states:
>> http://en.wikipedia.org/wiki/U.S._state  
>> 
>> : soup = BeautifulSoup(html)
>> : # Get the second table (list of states).
>> : table = soup.first('table').findNext('table')
>> : print table  
>> 
>> ...
>> <tr>
>> <td>WY</td>
>> <td>Wyo.</td>
>> <td><a href="/wiki/Wyoming" title="Wyoming">Wyoming</a></td>
>> <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne, Wyoming">Cheyenne</a></td>
>> <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne, Wyoming">Cheyenne</a></td>
>> <td><a href="/wiki/Image:Flag_of_Wyoming.svg" class="image" title=""><img src="http://upload.wikimedia.org/wikipedia/commons/thumb/b/bc/Flag_of_Wyoming.svg/45px-Flag_of_Wyoming.svg.png" width="45" alt="" height="30" longdesc="/wiki/Image:Flag_of_Wyoming.svg" /></a></td>
>> </tr>
>> </table>  
>> 
>> From each row (tr), I want to get cells (td) 1, 3 and 4
>> (postal code, state, capital). But cells 3 and 4 contain anchors.
> 
> So dig into the cells and get the data from the anchor. 
> 
> cells = row('td')
> cells[0].string
> cells[2].a.string  # .a is the first <a> tag; cells[2]('a') would return a list
> cells[3].a.string
> 
> Kent 
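Kent's point is that the visible text of cells 3 and 4 sits inside the `<a>` element, so it has to be read from there. As an alternative sketch that does not use BeautifulSoup at all (a swapped-in technique built only on the standard library's `html.parser`), one can collect all text inside each `<td>`, including text nested in anchors, so the anchors stop mattering. The class name `TdTextCollector` is hypothetical, and `SAMPLE` is a trimmed copy of the Wyoming row quoted above.

```python
from html.parser import HTMLParser


class TdTextCollector(HTMLParser):
    """Group the text of <td> cells by <tr>, including text nested in <a>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one list of cell strings per <tr>
        self._cell = None   # text fragments of the <td> being read, else None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            if self.rows:
                self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        # Called for all text, including text inside <a>; keep it only inside a <td>.
        if self._cell is not None:
            self._cell.append(data)


SAMPLE = """
<table>
<tr>
<td>WY</td>
<td>Wyo.</td>
<td><a href="/wiki/Wyoming" title="Wyoming">Wyoming</a></td>
<td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne, Wyoming">Cheyenne</a></td>
</tr>
</table>
"""

parser = TdTextCollector()
parser.feed(SAMPLE)
parser.close()
for cells in parser.rows:
    # Cells 1, 3 and 4 in the post's 1-based numbering: postal code, state, capital.
    print(cells[0], cells[2], cells[3])  # prints: WY Wyoming Cheyenne
```

Because the text is gathered per cell rather than per tag, a cell with or without an anchor is handled the same way.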

for row in table('tr'):
    cells = row('td')
    print cells[0]

IndexError: list index out of range 
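A likely cause of this IndexError (an assumption about the table's markup, not confirmed in this message) is a row that has no `<td>` cells at all, such as a header row built from `<th>` cells: `row('td')` then returns an empty list and `cells[0]` fails. The sketch below shows the guard on plain lists of cell strings; the function name `pick_columns` and the sample `rows` are hypothetical.

```python
def pick_columns(rows, wanted=(0, 2, 3)):
    """Yield the wanted cells from each row, skipping rows that are too short.

    `rows` holds one list of <td> texts per <tr>; a row with no <td>
    (for example a header row of <th> cells) is an empty list.
    """
    needed = max(wanted) + 1
    for cells in rows:
        if len(cells) < needed:
            # Indexing this row would raise IndexError, so skip it instead.
            continue
        yield tuple(cells[i] for i in wanted)


# Hypothetical input: an empty header row followed by the Wyoming row
# (the last cell holds only the flag image, so its text is empty).
rows = [[], ["WY", "Wyo.", "Wyoming", "Cheyenne", "Cheyenne", ""]]
print(list(pick_columns(rows)))  # prints: [('WY', 'Wyoming', 'Cheyenne')]
```

In the loop from the post, the same guard would be `if len(cells) < 4: continue` placed before `print cells[0]`.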

