How use XML parsing tools on this one specific URL?

Fredrik Lundh fredrik at pythonware.com
Mon Mar 5 03:55:01 EST 2007


skip at pobox.com wrote:
>
>    Chris> http://moneycentral.msn.com/companyreport?Symbol=BBBY
>
>    Chris> I can't validate it and xml.minidom.dom.parseString won't work on
>    Chris> it.
>
>    Chris> If this was just some teenager's web site I'd move on.  Is there
>    Chris> any hope avoiding regular expression hacks to extract the data
>    Chris> from this page?
>
> Tidy it perhaps or use BeautifulSoup?  ElementTree can use tidy if it's
> available.

ElementTree can also use BeautifulSoup:

    http://effbot.org/zone/element-soup.htm

As noted on that page, tidy is a bit too picky for this kind of use; it's better suited
for "normalizing" HTML that you're producing yourself than for parsing arbitrary
HTML.
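
To illustrate the general idea (a forgiving HTML parser feeding an ElementTree), here is a minimal sketch that uses only the standard library. It is not ElementSoup and not BeautifulSoup: it swaps in html.parser.HTMLParser as the lenient front end and xml.etree.ElementTree.TreeBuilder as the back end. The names LenientHTMLBuilder, parse_html and the synthetic "root" wrapper element are hypothetical, and the recovery rules (ignore stray end tags, auto-close whatever is still open) are simplifying assumptions, far cruder than what BeautifulSoup does.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never take an end tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class LenientHTMLBuilder(HTMLParser):
    """Hypothetical helper: builds an ElementTree from sloppy HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._builder = ET.TreeBuilder()
        self._open = []                   # names of currently open elements
        self._builder.start("root", {})   # synthetic wrapper (an assumption)

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. <input disabled>) arrive as None.
        self._builder.start(tag, {k: v or "" for k, v in attrs})
        if tag in VOID_TAGS:
            self._builder.end(tag)        # close void elements immediately
        else:
            self._open.append(tag)

    def handle_endtag(self, tag):
        if tag not in self._open:
            return                        # stray end tag: ignore it
        # Close everything left open inside the element being closed.
        while self._open:
            name = self._open.pop()
            self._builder.end(name)
            if name == tag:
                break

    def handle_data(self, data):
        self._builder.data(data)

    def finish(self):
        self.close()                      # flush any buffered input
        while self._open:                 # auto-close unclosed elements
            self._builder.end(self._open.pop())
        self._builder.end("root")
        return self._builder.close()


def parse_html(text):
    """Parse an HTML string and return the synthetic root Element."""
    parser = LenientHTMLBuilder()
    parser.feed(text)
    return parser.finish()
```

With made-up markup such as parse_html("<table><tr><td>Name<td>Value</tr></table>"), the usual ElementTree calls (root.iter("td"), root.find(...), and so on) then work on the result. Note that this sketch nests the second unclosed td inside the first rather than making them siblings; for real pages, BeautifulSoup's heuristics are the better choice.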

</F> 