Beautiful Soup - close tags more promptly?

Chris Angelico rosuav at gmail.com
Mon Oct 24 12:09:33 EDT 2022


On Tue, 25 Oct 2022 at 02:45, Jon Ribbens via Python-list
<python-list at python.org> wrote:
>
> On 2022-10-24, Chris Angelico <rosuav at gmail.com> wrote:
> > On Mon, 24 Oct 2022 at 23:22, Peter J. Holzer <hjp-python at hjp.at> wrote:
> >> Yes, I got that. What I wanted to say was that this is indeed a bug in
> >> html.parser and not an error (or sloppiness, as you called it) in the
> >> input or ambiguity in the HTML standard.
> >
> > I described the HTML as "sloppy" for a number of reasons, but I was of
> > the understanding that it's generally recommended to have the closing
> > tags. Not that it matters much.
>
> Some elements don't need close tags, or even open tags. Unless you're
> using XHTML you don't need them and indeed for the case of void tags
> (e.g. <br>, <img>) you must not include the close tags.

Yep, I'm aware of void tags, but I'm talking about the container tags
- in this case, <li> and <p> - which, in a lot of older HTML pages,
are treated as "separator" tags. Consider this content:

<HTML>
Hello, world!
<P>
Paragraph 2
<P>
Hey look, a third paragraph!
</HTML>

Stick a doctype onto that and it should be valid HTML5, but as it is,
it's the exact sort of thing that was quite common in the 90s. (I'm
not sure when lowercase tags became more popular, but in any case (pun
intended), that won't affect validity.)

The <p> tag is not a void tag, but according to the spec, it's legal
to omit the </p> if the element is immediately followed by another <p>
element (or any of a specific set of others), or if there is no
further content in the parent element.
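
To make that rule concrete, here's a rough sketch using the standard
library's html.parser.HTMLParser. It only reports the tags it actually
sees, so any implied </p> has to be worked out by the caller. The names
ParagraphCollector and P_CLOSERS are made up for illustration, and the
set is a deliberately partial subset of the tags the spec lists, so
treat this as a toy rather than a real tree builder:

```python
from html.parser import HTMLParser

# Hypothetical, partial subset of the start tags that implicitly close an open <p>.
P_CLOSERS = {"p", "ul", "ol", "div", "h1", "h2", "h3", "table", "pre", "blockquote"}


class ParagraphCollector(HTMLParser):
    """Collect the text of each <p>, closing paragraphs whose </p> was omitted."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._current = None  # text chunks of the open <p>, or None if no <p> is open

    def _close_p(self):
        if self._current is not None:
            # Join the chunks and collapse runs of whitespace.
            self.paragraphs.append(" ".join("".join(self._current).split()))
            self._current = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <P> arrives here as "p".
        if tag in P_CLOSERS:
            self._close_p()
        if tag == "p":
            self._current = []

    def handle_endtag(self, tag):
        # An explicit </p>, or the end of the enclosing container, ends the paragraph.
        # Only <body> and <html> are treated as containers in this sketch.
        if tag in ("p", "body", "html"):
            self._close_p()

    def handle_data(self, data):
        # Text outside any <p> (like "Hello, world!" below) is ignored.
        if self._current is not None:
            self._current.append(data)

    def close(self):
        super().close()   # flush any buffered text first
        self._close_p()   # end of input also ends an open paragraph


doc = """<HTML>
Hello, world!
<P>
Paragraph 2
<P>
Hey look, a third paragraph!
</HTML>"""

collector = ParagraphCollector()
collector.feed(doc)
collector.close()
paragraphs = collector.paragraphs
print(paragraphs)  # ['Paragraph 2', 'Hey look, a third paragraph!']
```

A parser that follows the HTML5 algorithm does this bookkeeping itself;
the sketch just shows where the implied end tags fall.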

> Adding in the omitted <head>, </head>, <body>, </body>, and </html>
> would make no difference and there's no particular reason to recommend
> doing so as far as I'm aware.

And yet most people do it. Why? Are you saying that it's better to
omit them all?

More importantly: Would you omit all the </p> closing tags you can, or
would you include them?

ChrisA

