scraping nested tables with BeautifulSoup

Kent Johnson kent at kentsjohnson.com
Tue Apr 4 12:09:50 EDT 2006


Gonzillaaa at gmail.com wrote:
> Hey Kent,
> 
> Thanks for your reply. How exactly did you save the file in Firefox? If
> I save the file locally I get the same error.

When Firefox saves the page it turns, among other things, all the funky
<!FOO> and <!/FOO> tags into comments. Here is a way to do the same thing
with BeautifulSoup (BS):

import re
from urllib import urlopen
from BeautifulSoup import BeautifulSoup, BeautifulStoneSoup

# This tells BS to turn <!FOO> into <!-- FOO --> which allows it
# to do a better job parsing this data
fixExclRe = re.compile(r'<!(?!--)([^>]+)>')
BeautifulStoneSoup.PARSER_MASSAGE.append( (fixExclRe, r'<!-- \1 -->') )

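# Fetch the page and parse it; the extra massage rule above is applied
# to the raw markup before parsing
data = urlopen('http://www.findaproperty.com/regi0018.html').read()
soup = BeautifulSoup(data)

# Find every table with exactly these attributes; the price guide is
# the second match
priceGuide = soup('table', dict(bgcolor="e0f0f8", border="0",
    cellpadding="2", cellspacing="2", width="150"))[1]
print priceGuide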
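
To see what the massage rule does on its own, here is a minimal sketch
that applies the same regular expression directly with re.sub, without
BeautifulSoup or a network fetch. The sample markup is made up for
illustration and is not taken from the real page.

```python
import re

# Same pattern as above: match <!...> declarations that are not already
# comments (the negative lookahead (?!--) skips anything starting "<!--").
fixExclRe = re.compile(r'<!(?!--)([^>]+)>')

# Hypothetical sample markup, invented for this example only.
sample = '<!FOO><table><tr><td>1</td></tr></table><!/FOO><!-- kept -->'

# Wrap the captured text in a proper comment, as the massage rule does.
fixed = fixExclRe.sub(r'<!-- \1 -->', sample)
print(fixed)
```

Real comments are left alone, while <!FOO> and <!/FOO> both become
comments. Note that the pattern would also match a <!DOCTYPE ...>
declaration and turn it into a comment.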


Kent


