Parsing HTML
John Nagle
nagle at animats.com
Sat Feb 24 01:04:45 EST 2007
BeautifulSoup does parse HTML well, but there are two issues:
1. It's rather slow; it can take seconds of CPU time to parse
some larger web pages.
2. There's no error reporting. It tries to do the right thing,
but when it doesn't, you have no idea what went wrong.
BeautifulSoup would be a good test case for the PyPy crowd to
work on. It really needs the speedup.
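
For anyone who wants to check the CPU-time claim on their own pages, here is a minimal sketch of how it could be measured. The names cpu_seconds, count_tags and TagCounter are made up for illustration, the test page is synthetic, and the parser being timed is the standard library's HTMLParser standing in for BeautifulSoup so the script runs without extra installs.

```python
import time
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Stand-in parser (assumption): counts start tags with the stdlib only."""

    def __init__(self):
        super().__init__()
        self.tags = 0

    def handle_starttag(self, tag, attrs):
        self.tags += 1


def count_tags(html):
    """Parse html with the stand-in parser and return the start-tag count."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.tags


def cpu_seconds(parse, html):
    """Return (result, CPU seconds used) for one call of parse(html)."""
    start = time.process_time()  # process CPU time, not wall-clock time
    result = parse(html)
    return result, time.process_time() - start


# Synthetic "large page": 20,000 table rows (made-up data, not a real site).
page = "<html><body><table>" + "<tr><td>x</td></tr>" * 20000 + "</table></body></html>"

tags, elapsed = cpu_seconds(count_tags, page)
print(f"stdlib HTMLParser: {tags} start tags in {elapsed:.3f} CPU s")
```

To time BeautifulSoup itself, pass a callable that builds the soup instead of count_tags, for example `lambda html: BeautifulSoup(html, "html.parser")` with the current bs4 package (that import path is an assumption about your install; the 2007-era module was imported differently). The numbers depend heavily on the page and the machine, so treat them as rough.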
John Nagle
sofeng wrote:
> On Feb 8, 11:43 am, "metaperl" <metap... at gmail.com> wrote:
>> On Feb 8, 2:38 pm, "mtuller" <mitul... at gmail.com> wrote:
>>> I am trying to parse a webpage and extract information.
>> BeautifulSoup is a great Python module for this purpose: