Parsing HTML

John Nagle nagle at animats.com
Sat Feb 24 01:04:45 EST 2007


BeautifulSoup does parse HTML well, but there are a few issues:

    1.  It's rather slow; it can take seconds of CPU time to parse
        some larger web pages.

    2.  There's no error reporting.  It tries to do the right thing,
        but when it doesn't, you have no idea what went wrong.
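
To make both points concrete, here is a rough sketch, not anything
BeautifulSoup itself provides.  It uses only the standard library's
html.parser as a stand-in parser.  The names cpu_seconds,
TagBalanceChecker and check are made up for this illustration:
cpu_seconds measures the CPU time of any parse callable, and the
checker shows the kind of position-tagged complaint a parser could
emit instead of silently guessing.

```python
import time
from html.parser import HTMLParser

# Void elements never take a closing tag, so they are not tracked.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Hypothetical helper: records mismatched or unclosed tags."""

    def __init__(self):
        super().__init__()
        self.stack = []      # (tag, (line, column)) for each open element
        self.problems = []   # human-readable messages

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append((tag, self.getpos()))

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.stack and self.stack[-1][0] == tag:
            self.stack.pop()
        else:
            # Report the mismatch and leave the stack alone.
            line, col = self.getpos()
            self.problems.append(
                f"unexpected </{tag}> at line {line}, column {col}")

    def close(self):
        super().close()
        # Anything still on the stack was opened but never closed.
        for tag, (line, col) in self.stack:
            self.problems.append(
                f"<{tag}> opened at line {line}, column {col} never closed")
        self.stack = []


def check(html):
    """Parse html and return a list of problem messages."""
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


def cpu_seconds(parse, html):
    """Return the CPU time (not wall-clock time) spent in parse(html)."""
    start = time.process_time()
    parse(html)
    return time.process_time() - start
```

Passing a callable that builds a BeautifulSoup tree to cpu_seconds
would time that parser instead of the stand-in.  The checker is
deliberately naive: it treats every optional end tag as an error,
which real-world HTML does not.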

BeautifulSoup would be a good test case for the PyPy crowd to
work on.  It really needs the speedup.

				John Nagle

sofeng wrote:
> On Feb 8, 11:43 am, "metaperl" <metap... at gmail.com> wrote:
>>On Feb 8, 2:38 pm, "mtuller" <mitul... at gmail.com> wrote:
>>>I am trying to parse a webpage and extract information.
>>BeautifulSoup is a great Python module for this purpose:


