What to use for finding as many syntax errors as possible.

Chris Angelico rosuav at gmail.com
Tue Oct 11 07:39:07 EDT 2022


On Tue, 11 Oct 2022 at 18:12, <avi.e.gross at gmail.com> wrote:
>
> Thanks for a rather detailed explanation of some of what we have been
> discussing, Chris. The overall outline is about what I assumed was there but
> some of the details were, to put it politely, fuzzy.
>
> I see resemblances to something like how a web page is loaded and operated.
> I mean very different but at some level not so much.
>
> I mean a typical web page is read in as HTML with various keyword regions
> expected such as <BODY> ... </BODY> or <DIV ...> ... </DIV> with things
> often cleanly nested in others. The browser makes nodes galore in some kind
> of tree format with an assortment of objects whose attributes or methods
> represent aspects of what it sees. The resulting treelike structure has
> names like DOM.

Yes. The basic idea of "tokenize, parse, compile" can be used for
pretty much any language - even English, although its grammar is a bit
more convoluted than most programming languages, with many weird
backward compatibility features! I'll parse your last sentence above:

LETTERS The
SPACE
LETTERS resulting
SPACE
... you get the idea
LETTERS like
SPACE
LETTERS DOM
FULLSTOP # or call this token PERIOD if you're American

Now, we can group those tokens into meaningful sets.

Sentence(type=Statement,
    subject=Noun(name="structure", addenda=[
        Article(type=The),
        Adjective(name="treelike"),
    ]),
    verb=Verb(type=Being, name="has", addenda=[]),
    object=Noun(name="name", plural=True, addenda=[
        Adjective(phrase=Phrase(verb=Verb(name="like"), object=Noun(name="DOM"),
    ]),
)

Grammar nerds will probably dispute some of the awful shorthanding I
did here, but I didn't want to devise thousands of AST nodes just for
this :)

> To a certain approximation, this tree starts a certain way but is regularly
> being manipulated (or perhaps a copy is) as it regularly is looked at to see
> how to display it on the screen at the moment based on the current tree
> contents and another set of rules in Cascading Style Sheets.

Yep; the DOM tree is initialized from the HTML (usually - it's
possible to start a fresh tree with no HTML) and then can be
manipulated afterwards.

> These are not at all the same thing but share a certain set of ideas and
> methods and can be very powerful as things interact.

Oh absolutely. That's why there are languages designed to help you
define other languages.

> In effect the errors in the web situation have such analogies too as in what
> happens if a region of HTML is not well-formed or uses a keyword not
> recognized.

Aaaaand they're horribly horribly messy, due to a few decades of
sloppy HTML programmers and the desire to still display the page even
if things are messed up :) But, again, there's a huge difference
between syntactic errors (like omitting a matching angle bracket) and
semantic errors (a keyword not known, like using <spam> when you
should have used <span>). In the latter case, you can still build a
DOM tree, but you have an unknown element; in the former case, you
have to guess at what the author meant, just to get anything going at
all.

> There was a guy around a few years ago who suggested he would create a
> system where you could create a series of some kind of configuration files
> for ANY language and his system would them compile or run programs for each
> and every such language? Was that on this forum? What ever happened to him?

That was indeed on this forum, and I have no idea what happened to
him. Maybe he realised that all he'd invented was the Unix shebang?

ChrisA


More information about the Python-list mailing list