What to use for finding as many syntax errors as possible.

Peter J. Holzer hjp-python at hjp.at
Wed Oct 12 20:45:22 EDT 2022


On 2022-10-13 11:23:40 +1100, Chris Angelico wrote:
> On Thu, 13 Oct 2022 at 11:19, Peter J. Holzer <hjp-python at hjp.at> wrote:
> > On 2022-10-11 09:47:52 +1100, Chris Angelico wrote:
> > > On Tue, 11 Oct 2022 at 09:18, Cameron Simpson <cs at cskk.id.au> wrote:
> > > >
> > > Consider:
> > >
> > > if condition # no colon
> > >     code
> > > else:
> > >     code
> > >
> > > To actually "restart" parsing, you have to make a guess of some sort.
> >
> > Right. At least one of the papers on parsing I read over the last few
> > years (yeah, I really should try to find them again) argued that the
> > vast majority of syntax errors is either a missing token, a superfluous
> > token or a combination of the two. So one strategy with good results
> > is to heuristically try to insert or delete single tokens and check
> > which results in the longest distance to the next error.
> >
> > Checking multiple possible fixes has its cost, especially since you have
> > to do that at every error. So you can argue that it is better for
> > productivity if you discover one error in 0.1 seconds than 10 errors in
> > 5 seconds.
> 
> Maybe; but what if you report 10 errors in 5 seconds, but 8 of them
> are spurious? You've reported two useful errors in a sea of noise.
> Even if it's the other way around (8 where you nailed it and correctly
> reported the error, 2 that are nonsense), is it actually helpful?

Humans are pattern-matching animals. It is quite possible that seeing a
bunch of related errors makes the fix more obvious than seeing them in
isolation.

No, I haven't done any studies on this. Yes, it is possible that all
those compiler writers who spent lots of work on error recovery over the
last 50 years (or longer) are delusional.
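
For illustration, here is a rough sketch of the insert/delete heuristic
quoted above, applied to Python source. It is not taken from any of the
papers mentioned and it is not how CPython works. It assumes that
compile() is good enough as an oracle for "where is the first error",
that a crude regex can stand in for a real tokenizer, and that a small
hand-picked set of insertable tokens is sufficient. All names in it
(first_error, candidate_repairs, find_errors, INSERTABLE) are made up
for this example.

```python
import re

# Tokens the sketch may try to insert: a small, hand-picked (assumed) set.
INSERTABLE = (":", ",", "(", ")", "=")
# Crude stand-in for a tokenizer: runs of word characters or single symbols.
TOKEN = re.compile(r"\w+|[^\w\s]")
# Sentinel position that compares greater than every real error position.
NO_ERROR = (float("inf"), 0)


def first_error(source):
    """Return (line, column) of the first reported syntax error, or NO_ERROR."""
    try:
        compile(source, "<sketch>", "exec")
    except SyntaxError as exc:
        return (exc.lineno or 1, exc.offset or 0)
    return NO_ERROR


def candidate_repairs(line):
    """Yield (description, new_line) for each single-token deletion or insertion."""
    spans = [m.span() for m in TOKEN.finditer(line)]
    for start, end in spans:
        yield "delete %r" % line[start:end], line[:start] + line[end:]
    boundaries = {len(line)}
    for start, end in spans:
        boundaries.update((start, end))
    for pos in sorted(boundaries):
        for tok in INSERTABLE:
            yield "insert %r" % tok, line[:pos] + tok + line[pos:]


def find_errors(source, max_errors=10):
    """Report several errors by repairing each one heuristically and reparsing."""
    lines = source.splitlines()
    reports = []

    def error_with(idx, new_line):
        # Position of the first error if line idx were replaced by new_line.
        trial = lines[:idx] + [new_line] + lines[idx + 1:]
        return first_error("\n".join(trial) + "\n")

    for _ in range(max_errors):
        err = first_error("\n".join(lines) + "\n")
        if err == NO_ERROR:
            break
        idx = min(err[0], len(lines)) - 1
        # Keep the repair that pushes the next error as far away as possible.
        what, new_line = max(candidate_repairs(lines[idx]),
                             key=lambda cand: error_with(idx, cand[1]))
        if error_with(idx, new_line) <= err:
            reports.append((err[0], "no single-token repair found"))
            break
        reports.append((err[0], what))
        lines[idx] = new_line
    return reports, "\n".join(lines) + "\n"


# The missing-colon example from this thread, plus one superfluous token.
broken = "if condition\n    code\nelse:\n    code\ntotal = = 1 + 2\n"
reports, repaired = find_errors(broken)
for line_number, guess in reports:
    print("line %d: %s" % (line_number, guess))
```

Every error costs one reparse per candidate repair, which is exactly
the cost mentioned above. The guessed repair is only a means to keep
parsing; a wrong guess is where spurious follow-up errors come from.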


> > > > I grew up with C and Pascal compilers which would _happily_ produce many
> > > > complaints, usually accurate, and all manner of syntactic errors. They
> > > > didn't stop at the first syntax error.
> > >
> > > Yes, because they work with a much simpler grammar.
> >
> > I very much doubt that. Python doesn't have a particularly complicated
> > grammar, and C certainly doesn't have a particularly simple one.
> >
> > The argument that it's impossible in Python (unlike any other language)
> > because Python is oh so special doesn't hold water.
> >
> 
> Never said it's because Python is special; there are a LOT of
> languages that are at least as complicated.

And almost all of their compilers do try to recover from errors.

> But I do think that Pascal, especially, has a significantly simpler
> grammar than Python does.

Incidentally, Turbo Pascal was the one other example of a compiler which
*didn't* try to recover.

        hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp at hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"