Beautiful Soup - close tags more promptly?

Jon Ribbens jon+usenet at unequivocal.eu
Mon Oct 24 11:34:45 EDT 2022


On 2022-10-24, Chris Angelico <rosuav at gmail.com> wrote:
> On Mon, 24 Oct 2022 at 23:22, Peter J. Holzer <hjp-python at hjp.at> wrote:
>> Yes, I got that. What I wanted to say was that this is indeed a bug in
>> html.parser and not an error (or sloppiness, as you called it) in the
>> input or ambiguity in the HTML standard.
>
> I described the HTML as "sloppy" for a number of reasons, but I was of
> the understanding that it's generally recommended to have the closing
> tags. Not that it matters much.

Some elements don't need close tags, or even open tags. Unless you're
using XHTML, those optional tags can simply be left out, and for void
elements (e.g. <br>, <img>) you must not include a close tag at all.
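As a rough illustration of the void-element rule, here is a sketch that uses only the standard library's html.parser module to spot forbidden close tags such as </img>. The class and function names are made up for this example, and the void-element list is transcribed from the WHATWG HTML standard. It only looks at the token stream, so treat it as a toy check rather than a validator.

```python
from html.parser import HTMLParser

# Void elements as listed in the WHATWG HTML standard.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


class StrayVoidCloseFinder(HTMLParser):
    """Hypothetical helper: collect end tags that void elements must not have."""

    def __init__(self):
        super().__init__()
        self.stray = []

    def handle_endtag(self, tag):
        # html.parser lowercases tag names before calling this hook.
        if tag in VOID_ELEMENTS:
            self.stray.append(tag)

    def handle_startendtag(self, tag, attrs):
        # The default implementation calls handle_starttag() and then
        # handle_endtag(); override it so XHTML-style <br/> is not
        # reported as a stray close tag.
        self.handle_starttag(tag, attrs)


def find_stray_void_closes(markup):
    """Return the void-element names that appear with an explicit close tag."""
    finder = StrayVoidCloseFinder()
    finder.feed(markup)
    finder.close()
    return finder.stray


if __name__ == "__main__":
    print(find_stray_void_closes("<p>one<br>two<br/>three<img src=a.png></img></p>"))
```

Here <br> and <br/> are both accepted, while the explicit </img> is reported.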

A minimal HTML file might look like this:

    <!DOCTYPE html>
    <html lang=en><meta charset=utf-8><title>Minimal HTML file</title>
    <main><h1>Minimal HTML file</h1>This is a minimal HTML file.</main>

which would be parsed into this:

    <!DOCTYPE html>
    <html lang="en">
      <head>
        <meta charset="utf-8">
        <title>Minimal HTML file</title>
      </head>
      <body>
        <main>
          <h1>Minimal HTML file</h1>
          This is a minimal HTML file.
        </main>
      </body>
    </html>
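For comparison, here is a small sketch (the TagLogger class is hypothetical, written just for this illustration) that records the tag events Python's html.parser reports for the same minimal file. html.parser is a tokenizer rather than an implementation of the HTML5 tree-construction algorithm, so it only reports tags that literally appear in the input: there are no head or body events and no end tag for html, and code that builds a tree from these events has to infer the implied structure itself.

```python
from html.parser import HTMLParser

MINIMAL = (
    "<!DOCTYPE html>\n"
    "<html lang=en><meta charset=utf-8><title>Minimal HTML file</title>\n"
    "<main><h1>Minimal HTML file</h1>This is a minimal HTML file.</main>\n"
)


class TagLogger(HTMLParser):
    """Record start/end tag events exactly as html.parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


logger = TagLogger()
logger.feed(MINIMAL)
logger.close()

if __name__ == "__main__":
    for kind, tag in logger.events:
        print(kind, tag)
```

If you want the spec-conformant tree shown above, Beautiful Soup can be told to use the third-party html5lib parser instead, e.g. BeautifulSoup(markup, "html5lib"), which implements the HTML5 parsing algorithm.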

Adding the omitted <head>, </head>, <body>, </body>, and </html> tags
explicitly would make no difference to the parsed result, and as far as
I'm aware there's no particular reason to recommend doing so.

