Beautiful Soup - close tags more promptly?

Chris Angelico rosuav at gmail.com
Tue Oct 25 14:03:25 EDT 2022


On Wed, 26 Oct 2022 at 04:59, Tim Delaney <timothy.c.delaney at gmail.com> wrote:
>
> On Mon, 24 Oct 2022 at 19:03, Chris Angelico <rosuav at gmail.com> wrote:
>>
>>
>> Ah, cool. Thanks. I'm not entirely sure of the various advantages and
>> disadvantages of the different parsers; is there a tabulation
>> anywhere, or at least a list of recommendations on choosing a suitable
>> parser?
>
>
> Coming to this a bit late, but from my experience with BeautifulSoup and HTML produced by other people ...
>
> lxml is easily the fastest, but also the least forgiving.
> html.parser is middling on performance but, as you've seen, sometimes makes mistakes.
> html5lib is the slowest, but is most forgiving of malformed input and edge cases.
>
> I use html5lib - it's fast enough for what I do, and the most likely to return results matching what the author saw when they maybe tried it in a single web browser.

Cool cool. It sounds like html5lib should really be the recommended
parser for HTML, unless performance or dependency reduction is
important enough to change your plans. (But only for HTML. For XML,
lxml would still be the right choice.)
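
As a rough sketch of that rule of thumb (not an official recommendation):
choose_parser below is a hypothetical helper of my own naming, not part of
Beautiful Soup. It only returns one of the parser names that BeautifulSoup
accepts as its second argument, based on what is importable. It assumes
lxml is the only XML parser Beautiful Soup can use, which matches my
reading of the docs.

```python
from importlib.util import find_spec


def _installed(module_name):
    """Return True if the named top-level module can be imported here."""
    return find_spec(module_name) is not None


def choose_parser(xml=False, need_speed=False, installed=_installed):
    """Hypothetical helper: pick a parser name to hand to BeautifulSoup.

    The policy mirrors the thread: html5lib by default for HTML, lxml when
    speed matters or for XML, html.parser when no extra dependency exists.
    """
    if xml:
        # Assumption: Beautiful Soup needs lxml for XML ("lxml-xml").
        if not installed("lxml"):
            raise RuntimeError("parsing XML with Beautiful Soup needs lxml")
        return "lxml-xml"
    if need_speed and installed("lxml"):
        # Fastest option, but the least forgiving of broken markup.
        return "lxml"
    if installed("html5lib"):
        # Slowest, but the most forgiving of malformed input.
        return "html5lib"
    # No third-party parser available: use the one in the standard library.
    return "html.parser"


if __name__ == "__main__":
    try:
        from bs4 import BeautifulSoup
    except ImportError:
        print("bs4 is not installed; would have used", choose_parser())
    else:
        # Unclosed <p> tags, the sort of input where parsers can disagree.
        soup = BeautifulSoup("<p>first<p>second", choose_parser())
        print(soup.find_all("p"))
```

Passing a fake `installed` callable makes it easy to see what the policy
would pick on a machine with a different set of packages.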

ChrisA

