beautifulsoup

Peter Pearson ppearson at nowhere.invalid
Wed Oct 29 18:43:58 EDT 2008


On Wed, 29 Oct 2008 09:45:31 -0700 (PDT), luca72 <lucaberto at libero.it> wrote:
> Hello,
> I am trying to use BeautifulSoup.
> I have this:
> sito = urllib.urlopen('http://www.prova.com/')
> esamino = BeautifulSoup(sito)
> luca = esamino.findAll('tr', align='center')
>
> print luca[0]
>
[The following long string has been wrapped.]
>>><tr align="center"><th width="5%"><a onclick="t('Only|G|BoT|05','#1');" 
   href="#">#1</a></th><td width="10%">44.4MB</td>
   <td width="90%" align="left">
   <font color="orange"> Pc-prova.rar </font></td></tr>
>
> I need to get the following information:
> 1)Only|G|BoT|05
> 2)#1
> 3)44.4MB
> 4)Pc-prova.rar
> With print luca[0].a.string I get #1.
> With print luca[0].td.string I get 44.4MB.
> Can you explain how to get the other two values?

Like you, I struggle with BeautifulSoup; but perhaps this will help
while waiting for somebody smarter to join the thread:

>>> import BeautifulSoup
>>> soup = BeautifulSoup.BeautifulSoup(
... """<tr align="center"><th width="5%">"""
... """<a onclick="t('Only|G|BoT|05','#1');" href="#">#1</a>"""
... """</th><td width="10%">44.4MB</td><td width="90%" align="left">"""
... """<font color="orange"> Pc-prova.rar </font></td></tr>""" )
>>> tr = soup.findAll( 'tr' )
>>> tr[0].findAll( text = True )
[u'#1', u'44.4MB', u' Pc-prova.rar ']
>>> c = tr[0].findChild( attrs={"onclick": True} )
>>> print c[ "onclick" ]
t('Only|G|BoT|05','#1');
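
The onclick value still wraps the wanted name in a JavaScript call, and
the file name still carries its surrounding spaces. Here is a minimal
sketch of one way to finish the job with the standard re module. It
assumes the quoted arguments never contain escaped quotes; the names
onclick, texts, args, bot_name and cleaned are only illustrative, and the
two input values are copied from the session output rather than fetched
from the page.

```python
import re

# Value of the onclick attribute, copied from the session output.
onclick = "t('Only|G|BoT|05','#1');"

# Text nodes copied from the findAll(text=True) output.
texts = ['#1', '44.4MB', ' Pc-prova.rar ']

# Collect every single-quoted argument of the JavaScript call.
# Assumption: no escaped quotes appear inside the arguments.
args = re.findall(r"'([^']*)'", onclick)

bot_name = args[0]                     # first quoted argument
cleaned = [t.strip() for t in texts]   # drop padding around each text node

print(bot_name)
print(cleaned)
```

If the page ever passes arguments with embedded quotes, a plain regular
expression like this would probably need replacing with something more
careful.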


-- 
To email me, substitute nowhere->spamcop, invalid->net.


