What to use for finding as many syntax errors as possible.

avi.e.gross at gmail.com
Mon Oct 10 23:11:33 EDT 2022


Cameron, or OP if you prefer,

I think by now you have seen the suggestion that languages make trade-offs:
highly structured ones can be easier to "recover" from errors in, and to
continue parsing, than languages with far more complex possibilities that
look rather unstructured.

What is the error in code like this?

a, b, c, d = 1, 2,

Or is it an error at all?

Many languages have no concept of doing anything like the above; some
tolerate a trailing comma; some set anything not supplied to some form of
NULL or leave it uninitialized; and some ...
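For the record, in Python the trailing comma above is legal syntax; the
complaint only arrives at run time, as an unpacking error rather than a
syntax error. A quick check (assuming CPython 3):

```python
# The example above is not a syntax error in Python: the trailing comma
# just makes (1, 2) a 2-tuple, which then fails to fill four targets.
compile("a, b, c, d = 1, 2,", "<example>", "exec")  # parses fine

try:
    a, b, c, d = 1, 2,
except ValueError as e:
    print(e)  # e.g. "not enough values to unpack (expected 4, got 2)"
```

So a tool looking only for syntax errors would sail right past this line.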

If you look at human languages, some are fairly simple and some are very
highly organized. But in a way that can make sense. Languages with
grammatical gender will often ask you to change the spelling, and often the
pronunciation, not only based on whether a noun is masculine/feminine or
even neuter, but also insist you change the form of verbs, adjectives and
so on, which in effect gives multiple signals that all have to line up to
make a valid and understandable sentence. Heck, in conversation people can
often leave out parts of a sentence, such as whether you are talking about
"I" or "you" or "she" or "we", because the rest of the words in the
sentence redundantly force only one choice to be possible.

So some such annoying grammars (in my opinion) are error
detection/correction codes in disguise. In the days before microphones and
speakers, it was common not to hear people well, such as on a stage a
hundred feet away with other ambient noise. Missing a word or two might
still let you get the point, because other parts of the sentence carried
that redundancy. Many languages have similar strictures letting you know
multiple times whether something is singular or plural. And I think another
reason was what I call stranger detection: people who learn some vocabulary
might still not speak correctly and so be identifiable as strangers, as in
spies.

Do we need this in the modern age? Who knows! But it makes me prefer some
languages over others, albeit other reasons may ...

With the internet today, we are used to expecting error correction to come
for free. Do you really need one of every 8 bits to be a parity bit, which
only catches maybe half of the errors, when the internals of your computer
are relatively error free and even the outside is protected by things like
the various protocols used in making and examining packets, demanding that
some be resent if a checksum does not match? Tons of checking is built in,
so at your level you rarely think about it. If you get a message, it is
usually either 99.9999% accurate, or it is not shown to you at all. I am
not talking about SPAM but about errors of transmission.

So my analogy's point is that if you want a very highly structured language
that can recover somewhat from errors, Python may not be it.

And over the years, as features are added or modified, the structure tends
to get more complex. And Python is not alone: many surviving languages
continue to evolve and borrow from each other, and any program you run
today that could partially recover and produce pages of possible errors may
blow up when new features are introduced.

And with Unicode, the number of possible "errors" in what can be placed in
code for languages like Julia, which allow such characters in most places, ...


-----Original Message-----
From: Python-list <python-list-bounces+avi.e.gross=gmail.com at python.org> On
Behalf Of Cameron Simpson
Sent: Monday, October 10, 2022 6:17 PM
To: python-list at python.org
Subject: Re: What to use for finding as many syntax errors as possible.

On 11Oct2022 08:02, Chris Angelico <rosuav at gmail.com> wrote:
>There's a huge difference between non-fatal errors and syntactic 
>errors. The OP wants the parser to magically skip over a fundamental 
>syntactic error and still parse everything else correctly. That's never 
>going to work perfectly, and the OP is surprised at this.

The OP is not surprised by this, and explicitly expressed awareness that
resuming a parse had potential for "misparsing" further code.

I remain of the opinion that one could resume a parse at the next unindented
line and get reasonable results a lot of the time.

In fact, I expect that one could resume tokenising at almost any line which
didn't seem to be inside a string and often get reasonable results.

I grew up with C and Pascal compilers which would _happily_ produce many
complaints, usually accurate, about all manner of syntactic errors. They
didn't stop at the first syntax error.

All you need in principle is a parser which goes "report syntax error here,
continue assuming <some state>". For Python that might mean "pretend a
missing final colon" or "close open brackets" etc, depending on the context.
If you make conservative implied corrections you can get a reasonable
continued parse, enough to find further syntax errors.
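A rough sketch of that idea (purely illustrative; `find_syntax_errors` is a
made-up name, and this uses plain `compile()` rather than a real resumable
parser): split the source at unindented lines and compile each top-level
chunk separately, so one broken block does not hide errors in the next.

```python
def find_syntax_errors(source: str):
    """Compile each top-level chunk separately and collect syntax errors.

    This is the crude "resume at the next unindented line" strategy: it
    will misjudge unindented continuation lines, but often does fine.
    """
    errors = []
    chunk, start = [], 1

    def flush():
        nonlocal chunk, start
        if chunk:
            try:
                compile("\n".join(chunk), "<chunk>", "exec")
            except SyntaxError as e:
                # Translate the chunk-relative line back to a file line.
                errors.append((start + (e.lineno or 1) - 1, e.msg))
            start += len(chunk)
            chunk = []

    for line in source.splitlines():
        if line and not line[0].isspace():
            flush()  # a new unindented line starts a fresh chunk
        chunk.append(line)
    flush()
    return errors
```

Run on a file with two independent syntax errors, this reports both, where
a single `compile()` of the whole file would stop at the first.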

I remember the Pascal compiler in particular had a really good "you missed a
semicolon _back there_" mode which was almost always correct, a nice boon
when correcting mistakes.

Cheers,
Cameron Simpson <cs at cskk.id.au>
--
https://mail.python.org/mailman/listinfo/python-list


