Beautiful Soup - close tags more promptly?

Jon Ribbens jon+usenet at unequivocal.eu
Mon Oct 24 13:01:00 EDT 2022


On 2022-10-24, Chris Angelico <rosuav at gmail.com> wrote:
> On Tue, 25 Oct 2022 at 02:45, Jon Ribbens via Python-list
><python-list at python.org> wrote:
>>
>> On 2022-10-24, Chris Angelico <rosuav at gmail.com> wrote:
>> > On Mon, 24 Oct 2022 at 23:22, Peter J. Holzer <hjp-python at hjp.at> wrote:
>> >> Yes, I got that. What I wanted to say was that this is indeed a bug in
>> >> html.parser and not an error (or sloppiness, as you called it) in the
>> >> input or ambiguity in the HTML standard.
>> >
>> > I described the HTML as "sloppy" for a number of reasons, but I was of
>> > the understanding that it's generally recommended to have the closing
>> > tags. Not that it matters much.
>>
>> Some elements don't need close tags, or even open tags. Unless you're
>> using XHTML you don't need them and indeed for the case of void tags
>> (e.g. <br>, <img>) you must not include the close tags.
>
> Yep, I'm aware of void tags, but I'm talking about the container tags
> - in this case, <li> and <p> - which, in a lot of older HTML pages,
> are treated as "separator" tags.

Yes, which is why I went on to talk about container tags.

> Consider this content:
>
><HTML>
> Hello, world!
><P>
> Paragraph 2
><P>
> Hey look, a third paragraph!
></HTML>
>
> Stick a doctype onto that and it should be valid HTML5,

Nope, it's missing a <title>.
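Here is a minimal sketch that logs every start and end tag event the standard-library tokenizer reports for Chris's example. It assumes CPython's html.parser, and SAMPLE, TagLogger and tag_events are illustrative names, not part of any library. HTMLParser lower-cases tag names and emits events only for tags that literally appear in the input. So there are no head, body, title or </p> events here, and the point where each unclosed <P> ends has to be inferred by whatever consumes these events.

```python
from html.parser import HTMLParser

# Chris's example document (no doctype, no <title>).
# SAMPLE, TagLogger and tag_events are illustrative names, not a library API.
SAMPLE = """<HTML>
Hello, world!
<P>
Paragraph 2
<P>
Hey look, a third paragraph!
</HTML>
"""


class TagLogger(HTMLParser):
    """Record start/end tag events exactly as html.parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes the tag name already lower-cased.
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def tag_events(markup):
    parser = TagLogger()
    parser.feed(markup)
    parser.close()
    return parser.events


if __name__ == "__main__":
    for event in tag_events(SAMPLE):
        print(event)
```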

>> Adding in the omitted <head>, </head>, <body>, </body>, and </html>
>> would make no difference and there's no particular reason to recommend
>> doing so as far as I'm aware.
>
> And yet most people do it. Why?

They agree with Tim Peters that "Explicit is better than implicit",
I suppose? ;-)

> Are you saying that it's better to omit them all?

No, I'm saying that neither option is necessarily better than the other.

> More importantly: Would you omit all the </p> closing tags you can, or
> would you include them?

It would depend on how much content was inside them, I guess.
Something like:

  <ol>
    <li>First item
    <li>Second item
    <li>Third item
  </ol>

is very easy to understand, but if each item were many lines long, it
might be less confusing to close them explicitly, not least for
indentation purposes.
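Here is a minimal sketch of how a consumer of html.parser events could infer those omitted close tags, so that the list above comes out as three sibling <li> elements. It implements only a small, hypothetical subset of HTML's optional-end-tag rules. CLOSED_BY, VOID, TreeBuilder and build_tree are illustrative names. The real HTML5 tree-construction algorithm closes <p> on many more block-level tags and has more detailed scoping rules than this approximation.

```python
from html.parser import HTMLParser

# Hypothetical, simplified subset of HTML's optional-end-tag rules: when the
# key tag starts, open elements named in the value set are closed, but only
# while they sit at the top of the stack.
CLOSED_BY = {
    "li": {"li", "p"},
    "p": {"p"},
    "ol": {"p"},
    "ul": {"p"},
}

# A few void elements (not the full list); they never get content or an end tag.
VOID = {"br", "hr", "img", "input", "link", "meta"}


class TreeBuilder(HTMLParser):
    """Build a nested dict tree, inferring omitted </li> and </p> tags."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "#root", "children": []}
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # Pop implicitly closed elements from the top, never the synthetic root.
        closes = CLOSED_BY.get(tag, set())
        while len(self.stack) > 1 and self.stack[-1]["tag"] in closes:
            self.stack.pop()
        node = {"tag": tag, "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, plus anything still
        # open inside it; stray end tags with no matching start are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.stack[-1]["children"].append(text)


def build_tree(markup):
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root


if __name__ == "__main__":
    tree = build_tree("""
      <ol>
        <li>First item
        <li>Second item
        <li>Third item
      </ol>
    """)
    ol = tree["children"][0]
    print([li["children"] for li in ol["children"]])
    # prints [['First item'], ['Second item'], ['Third item']]
```

Fed Chris's example, the same builder lets the second <P> close the first, giving two sibling paragraphs rather than one nested inside the other. Only popping from the top of the stack means an inner <li> does not close the outer item of a nested list.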

