how to scrape url out of href

Kent Johnson kent at kentsjohnson.com
Mon Jan 2 08:59:38 EST 2006


homepricemaps at gmail.com wrote:
> mike's code worked like a charm.  i have one more question.  i have an
> href which looks like this:
> 
> <td class="all">
>     <a class="btn" name="D1" href="http://www.cnn.com">
>         </a>
> 
> i thought i would use this code to get the href out but it fails, gives
> me a keyerror:
> 
> for incident in row('td', {'class':'all'}):
> 		n = incident.findNextSibling('a', {'class': 'btn'})
> 		link = incident.findNextSibling['href'] + "','"
> 
> 
> any idea what i'm doing wrong here with the syntax?  thanks in advance
> 

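ISTM that <a class="btn"> is a child of <td>, not a sibling, so
findNextSibling will not find it. Also, findNextSibling is a method, not
an indexable element, so incident.findNextSibling['href'] cannot work.
Calling a tag searches its children and returns a list of matches, so
take the first one. Try
   n = incident('a', {'class': 'btn'})[0]
   link = n['href'] + "','"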
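
To illustrate the parent/child point without any third-party dependency, here is a rough sketch that uses the standard library's html.parser module instead of BeautifulSoup. The class name BtnLinkParser, the SAMPLE markup (closing tags added so the fragment is complete), and the exact-match test on the class attribute are assumptions made for this example; they are not taken from the original code.

```python
from html.parser import HTMLParser

# Sample markup modeled on the snippet in the question; the closing
# </td>, </tr> and </table> tags are assumed here for completeness.
SAMPLE = """
<table><tr>
<td class="all">
    <a class="btn" name="D1" href="http://www.cnn.com">
        </a>
</td>
</tr></table>
"""


class BtnLinkParser(HTMLParser):
    """Collect href values of <a class="btn"> tags nested in <td class="all">."""

    def __init__(self):
        super().__init__()
        self.inside_td = False  # True between <td class="all"> and its </td>
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "td" and attrs.get("class") == "all":
            self.inside_td = True
        elif tag == "a" and self.inside_td and attrs.get("class") == "btn":
            # .get() returns None instead of raising KeyError
            # when an anchor has no href attribute.
            href = attrs.get("href")
            if href is not None:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "td":
            self.inside_td = False


parser = BtnLinkParser()
parser.feed(SAMPLE)
print(parser.links)  # ['http://www.cnn.com']
```

The anchor is only picked up while the parser is inside the matching <td>, which mirrors why a sibling search from the <td> comes back empty. This sketch does not handle nested tables or elements with several class names; a real scraper would likely need both.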

Kent


