Parsing

Paul Prescod paul at prescod.net
Mon May 3 20:54:27 EDT 1999


I am using Aycock's package to handle some parsing, but I am having trouble
because the language I am parsing is highly context-sensitive. I don't
have any trouble dealing with the context sensitivity in the so-called
"context free grammar" part of the package (the parser), but in the scanner
it is killing me.

Let's pretend I am parsing a tagged (but non-SGML) language where there is
an element "URL". Within "URL" elements, the characters < and > are
illegal: they must be escaped as \< and \>.

Elsewhere they are legal unescaped. Here is the grammar I would *like* to
write (roughly):

Element ::= <URL> urlcontent </URL>
urlcontent ::= ([^<>\/:]* ("\<"|"\>"|":"|"/"|"\\"))*
Element ::= <NOT-A-URL> anychar* </NOT-A-URL>

Of course this is a made-up syntax, because I don't think you can put
regular expressions in Aycock's BNF. I've used tools that do allow this, so
I'm not sure how to handle it without them. This is also a made-up
(simplified) example, so demonstrating how I can do it all in the scanner
is probably not helpful.

I could handle it if I could switch scanners mid-stream (for URL elements)
but Aycock's scanner finishes up before the parser even gets under way!
Should I scan and then parse (at a high level) and then rescan and reparse
the URLs? Is there a package that allows me to mix the lexical and
syntactic levels more?
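
To make "switching scanners mid-stream" concrete, here is a rough sketch of
the behaviour I have in mind. It is plain Python using only the re module,
not Aycock's package, and everything in it (the two rule tables, the token
names, the scan() function) is made up for illustration. The scanner keeps a
current rule table and swaps it when it sees <URL> or </URL>, so an
unescaped < or > is rejected only inside a URL element.

```python
import re

# Hypothetical rule tables, one per scanner mode. Earlier rules win.
DEFAULT_RULES = [
    ("OPEN_URL", re.compile(r"<URL>")),
    ("OPEN_OTHER", re.compile(r"<NOT-A-URL>")),
    ("CLOSE_OTHER", re.compile(r"</NOT-A-URL>")),
    # Outside URL elements a stray "<" is ordinary text.
    ("TEXT", re.compile(r"[^<]+|<")),
]

URL_RULES = [
    ("CLOSE_URL", re.compile(r"</URL>")),
    # Only \<, \> and \\ are accepted as escapes (an assumption).
    ("ESCAPE", re.compile(r"\\[<>\\]")),
    # No rule matches a bare "<", ">" or "\", so those are errors here.
    ("URLTEXT", re.compile(r"[^<>\\]+")),
]


def scan(text):
    """Yield (kind, value) pairs, swapping rule tables at <URL> and </URL>."""
    rules = DEFAULT_RULES
    pos = 0
    while pos < len(text):
        for kind, pattern in rules:
            m = pattern.match(text, pos)
            if m:
                break
        else:
            raise SyntaxError(
                "illegal character %r at offset %d" % (text[pos], pos))
        yield kind, m.group()
        pos = m.end()
        # The mode switch: the next token is scanned with a different table.
        if kind == "OPEN_URL":
            rules = URL_RULES
        elif kind == "CLOSE_URL":
            rules = DEFAULT_RULES
```

With this, list(scan("<NOT-A-URL>a<b></NOT-A-URL>")) goes through, while
scan("<URL>a<b</URL>") raises SyntaxError at the bare <. The catch is that
here the scanner itself decides when to change modes; what I would really
like is for the parser to drive that decision, which a scan-everything-first
design does not allow.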

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

Diplomatic term: "We had a frank exchange of views."
Translation: Negotiations stopped just short of shouting and
             table-banging. (Brill's Content, Apr. 1999)



