Parsing

Gordon McMillan gmcm at hypernet.com
Tue May 4 09:15:47 EDT 1999


> I am using Aycock's package to handle some parsing but I am having
> trouble because the language I am parsing is highly context
> sensitive. I don't have any trouble dealing with the
> context-sensitivity in the so-called "context free grammar" part of
> the package (the parser) but in the scanner it is killing me.
> 
> Let's pretend I am parsing a tagged (but non-SGML) language where
> there is an element "URL". Within "URL" elements, the characters <
> and > are illegal: they must be escaped as \< and \>.
> 
> Elsewhere they are not. Here is the grammar I would *like* to write
> (roughly):
> 
> Element ::= <URL> urlcontent </URL>
> urlcontent ::= ([^<>\/:]* ("\<"|"\>"|":"|"/"|"\\"))*
> Element ::= <NOT-A-URL> anychar* </NOT-A-URL>
/snip/
> 
> I could handle it if I could switch scanners mid-stream (for URL
> elements) but Aycock's scanner finishes up before the parser even
> gets under way! 

You can use an "outer" and an "inner" scanner. The outer one 
recognizes the vanilla stuff, plus the existence of (e.g.) a 
urlcontent. The inner one picks apart the urlcontent, returns its 
token list, and the outer one just concatenates those results onto 
its own. So you come into the parser with a single merged stream 
of tokens.
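A minimal sketch of the idea, using plain regexes rather than 
SPARK's actual scanner classes (the token names and the URL/escape 
rules here are made up to match the grammar quoted above):

```python
import re

def inner_scan(text):
    """Tokenize the body of a <URL> element, where < and > may only
    appear escaped as \\< and \\>."""
    tokens = []
    # An escaped character, a bare : or /, or a run of ordinary chars.
    for m in re.finditer(r'\\[<>\\]|[:/]|[^<>\\/:]+', text):
        tokens.append(('URLCHUNK', m.group()))
    return tokens

def outer_scan(text):
    """Tokenize everything else; on hitting a <URL> element, hand its
    body to the inner scanner and splice the result into our list."""
    tokens = []
    pos = 0
    url_re = re.compile(r'<URL>(.*?)</URL>', re.DOTALL)
    while pos < len(text):
        m = url_re.search(text, pos)
        if not m:
            tokens.append(('TEXT', text[pos:]))
            break
        if m.start() > pos:
            tokens.append(('TEXT', text[pos:m.start()]))
        tokens.append(('URLOPEN', '<URL>'))
        tokens.extend(inner_scan(m.group(1)))   # merge inner tokens
        tokens.append(('URLCLOSE', '</URL>'))
        pos = m.end()
    return tokens
```

The parser then sees one flat token stream, e.g. 
outer_scan('a <URL>http://x\\<y</URL> b') yields a TEXT token, the 
URL open/close tokens, and the inner URLCHUNK tokens in between.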


- Gordon
