[Tutor] BeautifulSoup - getting cells without new line characters
jonasmg at softhome.net
Sat Apr 1 14:22:23 CEST 2006
Kent Johnson writes:
> jonasmg at softhome.net wrote:
>> Kent Johnson writes:
>>
>>
>>>jonasmg at softhome.net wrote:
>>>
>>>
>>>>List of states:
>>>>http://en.wikipedia.org/wiki/U.S._state
>>>>
>>>>: soup = BeautifulSoup(html)
>>>>: # Get the second table (list of states).
>>>>: table = soup.first('table').findNext('table')
>>>>: print table
>>>>
>>>>...
>>>><tr>
>>>><td>WY</td>
>>>><td>Wyo.</td>
>>>><td><a href="/wiki/Wyoming" title="Wyoming">Wyoming</a></td>
>>>><td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne,
>>>>Wyoming">Cheyenne</a></td>
>>>><td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne,
>>>>Wyoming">Cheyenne</a></td>
>>>><td><a href="/wiki/Image:Flag_of_Wyoming.svg" class="image" title=""><img
>>>>src="http://upload.wikimedia.org/wikipedia/commons/thumb/b/bc/Flag_of_Wyomin
>>>>g.svg/45px-Flag_of_Wyoming.svg.png" width="45" alt="" height="30"
>>>>longdesc="/wiki/Image:Flag_of_Wyoming.svg" /></a></td>
>>>></tr>
>>>></table>
>>>>
>>>>Of each row (tr), I want to get the cells (td): 1,3,4
>>>>(postal,state,capital). But cells 3 and 4 have anchors.
>>>
>>>So dig into the cells and get the data from the anchor.
>>>
>>>cells = row('td')
>>>cells[0].string
>>>cells[2].a.string
>>>cells[3].a.string
>>>
>>>Kent
>>>
>>>_______________________________________________
>>>Tutor maillist - Tutor at python.org
>>>http://mail.python.org/mailman/listinfo/tutor
>>
>>
>> for row in table('tr'):
>>     cells = row('td')
>>     print cells[0]
>>
>> IndexError: list index out of range
>
> It works for me:
>
>
> In [1]: from BeautifulSoup import BeautifulSoup as bs
>
> In [2]: soup=bs('''<tr>
> ...: <td>WY</td>
> ...: <td>Wyo.</td>
> ...: <td><a href="/wiki/Wyoming" title="Wyoming">Wyoming</a></td>
> ...: <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne,
> ...: Wyoming">Cheyenne</a></td>
> ...: <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne,
> ...: Wyoming">Cheyenne</a></td>
> ...: <td><a href="/wiki/Image:Flag_of_Wyoming.svg" class="image"
> title=""><img
> ...:
> src="http://upload.wikimedia.org/wikipedia/commons/thumb/b/bc/Flag_of_Wyomin
> ...: g.svg/45px-Flag_of_Wyoming.svg.png" width="45" alt="" height="30"
> ...: longdesc="/wiki/Image:Flag_of_Wyoming.svg" /></a></td>
> ...: </tr>
> ...: </table> '''
> ...:
> ...:
> ...:
> ...: )
>
> In [18]: rows=soup('tr')
>
> In [19]: rows
> Out[19]:
> [<tr>
> <td>WY</td>
> <td>Wyo.</td>
> <td><a href="/wiki/Wyoming" title="Wyoming">Wyoming</a></td>
> <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne,
> Wyoming">Cheyenne</a></td>
> <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne,
> Wyoming">Cheyenne</a></td>
> <td><a href="/wiki/Image:Flag_of_Wyoming.svg" class="image"
> title=""><img src="http://upload.
>
> g.svg/45px-Flag_of_Wyoming.svg.png" width="45" alt="" height="30"
> longdesc="/wiki/Image:Flag_
> </tr>]
>
> In [21]: cells=rows[0]('td')
>
> In [22]: cells
> Out[22]:
> [<td>WY</td>,
> <td>Wyo.</td>,
> <td><a href="/wiki/Wyoming" title="Wyoming">Wyoming</a></td>,
> <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne,
> Wyoming">Cheyenne</a></td>,
> <td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne,
> Wyoming">Cheyenne</a></td>,
> <td><a href="/wiki/Image:Flag_of_Wyoming.svg" class="image"
> title=""><img src="http://upload
>
> g.svg/45px-Flag_of_Wyoming.svg.png" width="45" alt="" height="30"
> longdesc="/wiki/Image:Flag_
>
> In [23]: cells[0].string
> Out[23]: 'WY'
>
> In [24]: cells[2].a.string
> Out[24]: 'Wyoming'
>
> In [25]: cells[3].a.string
>
>
> Kent
>
Yes, OK. But that way it is only possible to get data from one row (rows[0]):
cells=rows[0]('td')
And I want to get data from all rows. I have tried several 'for' statements,
but I cannot get it to work.
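
A minimal sketch of looping over every row, offered as one possible approach rather than a confirmed fix. It assumes the modern bs4 package on Python 3 (the thread itself uses BeautifulSoup 3 on Python 2), and it assumes the IndexError comes from rows that contain no td cells, such as a header row built from th cells; the thread does not confirm that. The sample HTML and the helper name state_rows are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical sample standing in for the Wikipedia table. The first row
# uses <th> cells, so it yields an empty list of <td> cells.
html = """
<table>
<tr><th>Postal</th><th>Abbr.</th><th>State</th><th>Capital</th></tr>
<tr><td>WI</td><td>Wis.</td><td><a href="/wiki/Wisconsin">Wisconsin</a></td>
<td><a href="/wiki/Madison%2C_Wisconsin" title="Madison,
Wisconsin">Madison</a></td></tr>
<tr><td>WY</td><td>Wyo.</td><td><a href="/wiki/Wyoming">Wyoming</a></td>
<td><a href="/wiki/Cheyenne%2C_Wyoming" title="Cheyenne,
Wyoming">Cheyenne</a></td></tr>
</table>
"""


def state_rows(table):
    """Return (postal, state, capital) tuples for every data row."""
    results = []
    for row in table.find_all('tr'):
        cells = row.find_all('td')
        # Skip rows without enough <td> cells (for example a header row)
        # instead of indexing into a short or empty list.
        if len(cells) < 4:
            continue
        postal = cells[0].get_text(strip=True)
        state = cells[2].get_text(strip=True)    # text inside the anchor
        capital = cells[3].get_text(strip=True)  # text inside the anchor
        results.append((postal, state, capital))
    return results


soup = BeautifulSoup(html, 'html.parser')
states = state_rows(soup.find('table'))
for postal, state, capital in states:
    print(postal, state, capital)
```

Checking the number of cells before indexing lets the loop pass over header or separator rows, and get_text(strip=True) returns the cell text whether or not it is wrapped in an anchor.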