how to find not the next sibling but the 2nd sibling, or find sibling "a" OR sibling "b"

localpricemaps at gmail.com localpricemaps at gmail.com
Wed Jan 18 14:19:59 EST 2006


I have some HTML that looks like this, and I want to scrape out the
href value (the http://www.cnn.com part):

<div class="noFood">Cheese</div>
<div class="food">Blue</div>
<a class="btn" href = "http://www.cnn.com">


So I wrote this code, which scrapes it perfectly:

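for incident in row('div', {'class': 'noFood'}):
    # second tag: the <div class="food"> that follows the noFood div
    b = incident.findNextSibling('div', {'class': 'food'})
    print b
    # third tag: the <a class="btn"> that carries the href
    n = b.findNextSibling('a', {'class': 'btn'})
    print n
    link = n['href'] + "','"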

The problem is that the 2nd tag, the <div class="food"> tag, does not
always have the class food; sometimes its class is drink. So sometimes
the HTML looks like this:

<div class="noFood">Cheese</div>
<div class="drink">Pepsi</div>
<a class="btn" href = "http://www.cnn.com">

How do I alter my script to take into account the fact that the class
name will sometimes be food and sometimes drink? Is there a way to say
"look for food or drink", or a way to say "look for this incident and
then find not the next sibling but the 2nd next sibling", if that makes
any sense?

thanks
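
For reference, both ideas in the question can be sketched roughly as
follows. This is only a minimal sketch and rests on several assumptions:
it uses the current bs4 package (with its find_next_sibling /
find_next_siblings spellings) rather than the BeautifulSoup release
available when this was posted, the <div id="row"> wrapper and the
second URL are made up for illustration, and the <a> tags are closed
here so that the parser keeps the three tags as siblings.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the snippets above. The <div id="row"> wrapper
# and the example.com link are hypothetical; the <a> tags are closed so the
# following tags stay siblings instead of being nested inside the link.
html = """
<div id="row">
<div class="noFood">Cheese</div>
<div class="food">Blue</div>
<a class="btn" href="http://www.cnn.com"></a>
<div class="noFood">Cola</div>
<div class="drink">Pepsi</div>
<a class="btn" href="http://www.example.com"></a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
row = soup.find("div", id="row")

links_by_class = []
links_by_position = []
for incident in row.find_all("div", class_="noFood"):
    # Option 1: a list of class names matches any one of them,
    # i.e. "look for food OR drink".
    b = incident.find_next_sibling("div", class_=["food", "drink"])
    n = b.find_next_sibling("a", class_="btn")
    links_by_class.append(n["href"])

    # Option 2: ignore the class name and take the 2nd following sibling
    # tag. Passing True matches any tag and skips whitespace strings.
    siblings = incident.find_next_siblings(True, limit=2)
    links_by_position.append(siblings[1]["href"])

print(links_by_class)     # ['http://www.cnn.com', 'http://www.example.com']
print(links_by_position)  # same two links
```

Option 1 keeps working if an unrelated tag is later inserted between the
two divs, while option 2 depends only on position, so it would also
accept a class name other than food or drink.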



