how to find not the next sibling but the 2nd sibling or find sibling "a" OR sibling "b"
Kent Johnson
kent at kentsjohnson.com
Thu Jan 19 06:07:23 EST 2006
localpricemaps at gmail.com wrote:
> i have some html which looks like this where i want to scrape out the
> href stuff (the www.cnn.com part)
>
> <div class="noFood">Cheese</div>
> <div class="food">Blue</div>
> <a class="btn" href = "http://www.cnn.com">
>
>
> so i wrote this code which scrapes it perfectly:
>
> for incident in row('div', {'class':'noFood'}):
>     b = incident.findNextSibling('div', {'class': 'food'})
>     print b
>     n = b.findNextSibling('a', {'class': 'btn'})
>     print n
>     link = n['href'] + "','"
>
> problem is that the 2nd tag, the <div class="food"> tag, is
> sometimes called food, sometimes called drink.
Apparently you are using Beautiful Soup. The value in the attribute
dictionary can be a callable; try this:
def isFoodOrDrink(attr):
    return attr in ['food', 'drink']

b = incident.findNextSibling('div', {'class': isFoodOrDrink})
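
For reference, here is a self-contained sketch of the same idea. It assumes the newer bs4 package, where the method is spelled find_next_sibling rather than findNextSibling, and the sample HTML, the second URL and the collect_links helper are made up for illustration:

```python
from bs4 import BeautifulSoup

# Sample markup adapted from the question; the second group is hypothetical.
HTML = """
<div class="noFood">Cheese</div>
<div class="food">Blue</div>
<a class="btn" href="http://www.cnn.com">first</a>
<div class="noFood">Water</div>
<div class="drink">Sparkling</div>
<a class="btn" href="http://www.example.com">second</a>
"""

def is_food_or_drink(value):
    # Called with the class value; value is None when the tag has no class.
    return value in ("food", "drink")

def collect_links(html):
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for incident in soup.find_all("div", {"class": "noFood"}):
        # The callable stands in for a fixed class name.
        b = incident.find_next_sibling("div", {"class": is_food_or_drink})
        if b is None:
            continue
        n = b.find_next_sibling("a", {"class": "btn"})
        if n is not None:
            links.append(n["href"])
    return links

links = collect_links(HTML)
print(links)
```

Checking for None before indexing avoids an exception when a row has no matching sibling.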
Alternately you could omit the class spec and check for it in code.
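
A rough sketch of that second approach, again assuming bs4 and made-up sample HTML (the "utensil" class and the links_checked_in_code helper are hypothetical):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second group has a class we do not want.
HTML = """
<div class="noFood">Cheese</div>
<div class="drink">Cola</div>
<a class="btn" href="http://www.cnn.com">first</a>
<div class="noFood">Fork</div>
<div class="utensil">Steel</div>
<a class="btn" href="http://www.example.com">second</a>
"""

WANTED = {"food", "drink"}

def links_checked_in_code(html):
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for incident in soup.find_all("div", {"class": "noFood"}):
        # No class spec here: take whatever div comes next.
        b = incident.find_next_sibling("div")
        if b is None:
            continue
        # With html.parser, bs4 returns the class attribute as a list.
        classes = b.get("class") or []
        if not WANTED.intersection(classes):
            continue
        n = b.find_next_sibling("a", {"class": "btn"})
        if n is not None:
            links.append(n["href"])
    return links

checked_links = links_checked_in_code(HTML)
print(checked_links)
```

Note that this version only looks at the very next div, whereas passing a callable keeps scanning later siblings until one matches; which behaviour you want depends on your data.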
Kent