Beautiful Soup - close tags more promptly?

Peter J. Holzer hjp-python at hjp.at
Mon Oct 24 18:33:11 EDT 2022


On 2022-10-25 06:56:58 +1100, Chris Angelico wrote:
> On Tue, 25 Oct 2022 at 04:22, Peter J. Holzer <hjp-python at hjp.at> wrote:
> > There may be several reasons:
> >
> > * Historically, some browsers differed in which end tags were actually
> >   optional. Since (AFAIK) no mainstream browser ever implemented a real
> >   SGML parser (they were always "tag soup" parsers with lots of ad-hoc
> >   rules) this sometimes even changed within the same browser depending
> >   on context (e.g. a simple table might work but nested tables wouldn't).
> >   So people started to use end-tags defensively.
> > * XHTML was popular for some time, and it has no optional tags.
> >   So people got into the habit of always using end tags and writing
> >   empty tags as <XXX />.
> > * Aesthetics: Always writing the end tags is more consistent and may
> >   look more balanced.
> > * Cargo-cult: People saw other people do that and copied the habit
> >   without thinking about it.
> >
> >
> > > Are you saying that it's better to omit them all?
> >
> > If you want to conserve keystrokes :-)
> >
> > I think it doesn't matter. Both are valid.
> >
> > > More importantly: Would you omit all the </p> closing tags you can, or
> > > would you include them?
> >
> > I usually write them.
> 
> Interesting. So which of the above reasons is yours?

Mostly the third one at this point, I think. The first one has gone away
for me with HTML5. The second one still lingers at the back of
my brain, but I've gotten rid of the habit of writing <img .../>, so I'm
recovering ;-). But I still like my code to be nice and tidy, and
whether my sense of tidiness was influenced by XML or not, if the end
tags are missing it looks off, somehow.

(That said, I do sometimes leave them off to reduce visual clutter.)


> One thing I find quite interesting, though, is the way that browsers
> *differ* in the face of bad nesting of tags. Recently I was struggling
> to figure out a problem with an HTML form, and eventually found that
> there was a spurious <form> tag way up higher in the page. Forms don't
> nest, so that's invalid, but different browsers had slightly different
> ways of showing it.

Yeah, mismatched form tags can have weird effects. I don't remember the
details, but I scratched my head over that one more than once.
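
For what it's worth, a stray nested <form> like that can be found
mechanically. Here is a minimal sketch, assuming only the standard
library's html.parser. NestedFormFinder and find_nested_forms are
hypothetical names of mine, and the code only counts explicit <form> and
</form> tags; it does not model how any particular browser repairs the
markup.

```python
from html.parser import HTMLParser


class NestedFormFinder(HTMLParser):
    """Record where a <form> start tag appears inside an open <form>."""

    def __init__(self):
        super().__init__()
        self.depth = 0     # number of <form> elements currently open
        self.nested = []   # (line, column) reported by getpos() for each offender

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            if self.depth > 0:
                # Forms don't nest, so this start tag is suspicious.
                self.nested.append(self.getpos())
            self.depth += 1

    def handle_endtag(self, tag):
        # Ignore stray </form> tags that have no matching start tag.
        if tag == "form" and self.depth > 0:
            self.depth -= 1


def find_nested_forms(html):
    """Return the positions of all <form> tags opened inside another form."""
    finder = NestedFormFinder()
    finder.feed(html)
    finder.close()
    return finder.nested
```

Calling find_nested_forms() on the page source returns an empty list when
no form is opened inside another one, and otherwise the line/column
positions to look at first.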

        hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp at hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"