What to use for finding as many syntax errors as possible.

Chris Angelico rosuav at gmail.com
Wed Oct 12 20:23:40 EDT 2022


On Thu, 13 Oct 2022 at 11:19, Peter J. Holzer <hjp-python at hjp.at> wrote:
>
> On 2022-10-11 09:47:52 +1100, Chris Angelico wrote:
> > On Tue, 11 Oct 2022 at 09:18, Cameron Simpson <cs at cskk.id.au> wrote:
> > >
> > Consider:
> >
> > if condition # no colon
> >     code
> > else:
> >     code
> >
> > To actually "restart" parsing, you have to make a guess of some sort.
>
> Right. At least one of the papers on parsing I read over the last few
> years (yeah, I really should try to find them again) argued that the
> vast majority of syntax errors are either a missing token, a superfluous
> token or a combination of the two. So one strategy with good results
> is to heuristically try to insert or delete single tokens and check
> which results in the longest distance to the next error.
>
> Checking multiple possible fixes has its cost, especially since you have
> to do that at every error. So you can argue that it is better for
> productivity if you discover one error in 0.1 seconds than 10 errors in
> 5 seconds.

Maybe; but what if you report 10 errors in 5 seconds, but 8 of them
are spurious? You've reported two useful errors in a sea of noise.
Even if it's the other way around (8 where you nailed it and correctly
reported the error, 2 that are nonsense), is it actually helpful? Bear
in mind that, if you can discover one syntax error in 0.1 seconds, you
can do that check *the moment the user types a key* in the editor
(which is more-or-less what happens with most syntax highlighting
editors - some have a small delay to avoid being too noisy with error
reporting, but same difference). Why report false errors when you can
report errors one by one and know that they're true?
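
To make the comparison concrete, here's a rough sketch of both
approaches. It is only an illustration, not how CPython or any real
editor does it. All the names (first_syntax_error,
single_token_candidates, best_single_token_fix, CANDIDATE_TOKENS) are
made up for this post, and I'm assuming whitespace-separated words are a
good enough stand-in for real tokens. The only real API used is the
builtin compile() and the SyntaxError it raises.

```python
# Sketch only: every name here is hypothetical, and whitespace-separated
# words stand in for real tokens.

CANDIDATE_TOKENS = (":", ")", "]", "}", ",")


def first_syntax_error(source, filename="<editor>"):
    """Return (lineno, message) for the first syntax error, or None."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as exc:
        return exc.lineno or 1, exc.msg
    return None


def single_token_candidates(source, lineno):
    """Yield (description, patched_source) pairs for one-token edits."""
    lines = source.splitlines()
    if not lines:
        return
    idx = min(lineno, len(lines)) - 1
    line = lines[idx]

    def rebuild(new_line):
        return "\n".join(lines[:idx] + [new_line] + lines[idx + 1:]) + "\n"

    # Guess 1: a token is missing at the end of the offending line.
    for tok in CANDIDATE_TOKENS:
        yield f"insert {tok!r}", rebuild(line + tok)

    # Guess 2: one word on the offending line is superfluous.
    indent = line[:len(line) - len(line.lstrip())]
    words = line.split()
    for i, word in enumerate(words):
        rest = words[:i] + words[i + 1:]
        yield f"delete {word!r}", rebuild(indent + " ".join(rest))


def best_single_token_fix(source):
    """Pick the one-token edit that pushes the next error furthest away.

    Returns (description, patched_source, next_error_line), where
    next_error_line is None if the patched source compiles. Returns
    None if the source has no syntax error to begin with.
    """
    error = first_syntax_error(source)
    if error is None:
        return None
    best = None
    best_score = -1.0
    for description, patched in single_token_candidates(source, error[0]):
        next_error = first_syntax_error(patched)
        score = float("inf") if next_error is None else next_error[0]
        if score > best_score:
            best_score = score
            best = (description, patched,
                    None if next_error is None else next_error[0])
    return best
```

first_syntax_error() is the cheap check you can run on every keystroke.
best_single_token_fix() is the guessing version: it costs one extra
compile per candidate at every error, and whatever it picks is still a
guess. It also appends the inserted token to the end of the line, so a
trailing comment like the one in my example above would swallow it; a
real implementation would have to work on the token stream.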

> > > I grew up with C and Pascal compilers which would _happily_ produce many
> > > complaints, usually accurate, and all manner of syntactic errors. They
> > > didn't stop at the first syntax error.
> >
> > Yes, because they work with a much simpler grammar.
>
> I very much doubt that. Python doesn't have a particularly complicated
> grammar, and C certainly doesn't have a particularly simple one.
>
> The argument that it's impossible in Python (unlike any other language),
> because Python is oh so special doesn't hold water.
>

Never said it's because Python is special; there are a LOT of
languages that are at least as complicated. Try giving multiple useful
errors when there's a syntactic problem in SQL, for instance. But I do
think that Pascal, especially, has a significantly simpler grammar
than Python does.

ChrisA

