Meta: EBNF notation (was Re: [Doc-SIG] Structuring: a summary; and an attempt at EBNF..)

Edward D. Loper edloper@gradient.cis.upenn.edu
Thu, 19 Apr 2001 04:15:30 EDT


> Why don't you simply use INDENT and DEDENT tokens, which may
> represent any arbitrary number of spaces as long as they match up?
> Don't forget: This is Python and anyone seriously interested in
> Python should be already familiar with this concept from the Python
> Grammar file and will probably understand this at the first glance.

Because these assume that there is no single indent that corresponds
to multiple dedents.  Which is true in Python, but not necessarily in
the markup language we're talking about.  In particular, consider::

  - This is a list item.

      - This is a sublist item.

    This is another paragraph in the main list item.

According to Python's rules for generating INDENT and DEDENT tokens,
the dedent before "this is another..." would be illegal because it
doesn't line up with anything.  But according to my EBNF (assuming
that I got it right), it comes out correctly::

    IND IND - this is a list item
    IND IND - this is a sublist item.
    DED DED - This is another paragraph in the main list item.
    DED DED
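
For comparison, here is a minimal sketch of the indentation-stack rule
that Python's tokenizer applies, reduced to a list of starting columns.
The function name and the column-list input are made up for
illustration; this is not the EBNF above, only the Python-style rule it
is being contrasted with. The three lines of the list example start at
columns 0, 4 and 2 (measured relative to the literal block).

```python
def python_style_tokens(columns):
    """Emit INDENT/DEDENT/LINE tokens using a Python-like indent stack.

    `columns` is the starting column of each non-blank line
    (a hypothetical, simplified input for this sketch).
    """
    stack = [0]          # enclosing indentation levels, innermost last
    tokens = []
    for col in columns:
        if col > stack[-1]:
            # Deeper than the current level: exactly one INDENT.
            stack.append(col)
            tokens.append("INDENT")
        else:
            # Shallower: pop levels until we reach one that is <= col.
            while col < stack[-1]:
                stack.pop()
                tokens.append("DEDENT")
            # The landing column must be an enclosing level.
            if col != stack[-1]:
                raise IndentationError(
                    f"column {col} matches no enclosing level"
                )
        tokens.append("LINE")
    # Close every level still open at end of input.
    tokens.extend("DEDENT" for _ in stack[1:])
    return tokens


# A dedent back to an enclosing level is fine.
print(python_style_tokens([0, 4, 0]))

# The list example: the third line lands between levels 0 and 4.
try:
    python_style_tokens([0, 4, 2])
except IndentationError as err:
    print("rejected:", err)
```

Under this rule the single indent from column 0 to column 4 can only be
undone in one step, which is why the paragraph at column 2 is rejected.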

Also, I should apologize for being very fast and loose with notation.
I'll clean that up before I make anything formal (e.g., before putting
anything in a PEP).  There are indeed several variations on EBNF.  The
basic one I was using uses the Kleene star (x*) to mean 0 or more
repetitions of x, and the Kleene cross (x+) to mean 1 or more
repetitions of x; I think I may have also used x? to mean 0 or 1
x's.  Basically, the productions I wrote should read roughly as regexps
(with the VERBOSE flag).
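
As an illustration of that reading, here is a small sketch using
Python's `re` module with `re.VERBOSE`. The production itself (a
one-line bulleted item) is a made-up example, not one of the
productions from the earlier message; it only shows how x*, x+ and x?
carry over.

```python
import re

# Hypothetical production, written the way a VERBOSE regexp reads:
#   item ::= SPACE* "-" SPACE+ NONNEWLINE+ NL?
LIST_ITEM = re.compile(
    r"""
    [ ]*        # x*: zero or more leading spaces
    -           # the bullet character
    [ ]+        # x+: one or more spaces after the bullet
    [^\n]+      # one or more characters that are not a newline
    \n?         # x?: an optional trailing newline
    """,
    re.VERBOSE,
)

for text in ["  - item\n", "- item", "-item", "item"]:
    print(repr(text), bool(LIST_ITEM.fullmatch(text)))
```

In VERBOSE mode, whitespace in the pattern is ignored outside character
classes, so a literal space has to be written as `[ ]`.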

I agree that x[n] isn't the best choice of notation, especially given
that I think I may have used things like "[^ NL S]" to mean "any
character that's not a newline or a space".  Perhaps x<n>?

One thing to note here is that the language I'm using is strictly more
powerful than EBNF.  The reason, as I said before, is that I have
crossing dependencies.  It would be possible to express the same
*string* language without crossing dependencies, but only if we allow
the first paragraph of a list item to be split across two different
nonterminals.

Incidentally, I also used "(?! ..)", which is likewise strictly more
powerful than EBNF (it's not context-free; you can generate a^n b^n
c^n with it)...  But I used it just as a matter of convenience --
everything I wrote with it could be rewritten without it.
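
To make the "could be rewritten without it" point concrete, here is a
small sketch with Python's `re` module. Both patterns are hypothetical
(a single line of text that does not start with a "- " bullet) and are
not taken from the earlier productions; the second one spells out the
same set of strings without the negative lookahead.

```python
import re

# With negative lookahead: any non-empty line not starting with "- ".
WITH_LOOKAHEAD = re.compile(r"(?!- )[^\n]+")

# Without lookahead: enumerate the allowed beginnings explicitly.
#   either the first character is not "-",
#   or it is "-" followed by nothing or by a non-space character.
WITHOUT_LOOKAHEAD = re.compile(r"[^-\n][^\n]*|-(?:[^ \n][^\n]*)?")

samples = ["- item", "-item", "-", "- ", "--", "plain text", " - x", ""]
for s in samples:
    a = bool(WITH_LOOKAHEAD.fullmatch(s))
    b = bool(WITHOUT_LOOKAHEAD.fullmatch(s))
    print(repr(s), a, b)
```

The lookahead version is shorter and reads closer to the intent; the
expanded version shows that, for a simple case like this one, the
lookahead is only a convenience.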

-Edward