Parsing
Gordon McMillan
gmcm at hypernet.com
Tue May 4 09:15:47 EDT 1999
> I am using Aycock's package to handle some parsing but I am having
> trouble because the language I am parsing is highly context
> sensitive. I don't have any trouble dealing with the
> context-sensitivity in the so-called "context free grammar" part of
> the package (the parser) but in the scanner it is killing me.
>
> Let's pretend I am parsing a tagged (but non-SGML) language where
> there is an element "URL". Within "URL" elements, the characters <
> and > are illegal: they must be escaped as \< and \>.
>
> Elsewhere they are not. Here is the grammar I would *like* to write
> (roughly):
>
> Element ::= <URL> urlcontent </URL>
> urlcontent ::= ([^<>\/:]* ("\<"|"\>"|":"|"/"|"\\"))*
> Element ::= <NOT-A-URL> anychar* </NOT-A-URL>
/snip/
>
> I could handle it if I could switch scanners mid-stream (for URL
> elements) but Aycock's scanner finishes up before the parser even
> gets under way!
You can use an "outer" and an "inner" scanner. The outer one
recognizes the vanilla stuff, plus the existence of (for example) a
urlcontent span. The inner one picks the urlcontent apart and returns
its token list; the outer scanner concatenates those results with its
own, so you come into the parser with a single merged stream of
tokens.
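A minimal sketch of that outer/inner idea, using plain `re` rather
than Aycock's actual scanner classes (the element name `URL` and the
token tags are just illustrative): the outer scanner tokenizes
ordinary text, but on seeing a <URL> element it hands the body to an
inner scanner that understands the \< and \> escapes, then splices the
inner tokens into its own list.

```python
import re

# Outer scanner's view: a <URL>...</URL> element embedded in plain text.
URL_RE = re.compile(r'<URL>(.*?)</URL>', re.DOTALL)

def inner_scan(body):
    """Inner scanner: unescape \\< and \\> inside urlcontent."""
    tokens = []
    for piece in re.findall(r'\\[<>]|[^\\]+', body):
        if piece in (r'\<', r'\>'):
            tokens.append(('URLCHAR', piece[1]))   # escaped < or >
        else:
            tokens.append(('URLTEXT', piece))
    return tokens

def outer_scan(text):
    """Outer scanner: vanilla text, delegating URL bodies to inner_scan."""
    tokens = []
    pos = 0
    for m in URL_RE.finditer(text):
        if m.start() > pos:
            tokens.append(('TEXT', text[pos:m.start()]))
        tokens.append(('URL_OPEN', '<URL>'))
        tokens.extend(inner_scan(m.group(1)))      # splice inner tokens in
        tokens.append(('URL_CLOSE', '</URL>'))
        pos = m.end()
    if pos < len(text):
        tokens.append(('TEXT', text[pos:]))
    return tokens
```

The parser then sees one flat token list, e.g.
outer_scan('hi <URL>a\\<b</URL> bye') yields TEXT, URL_OPEN, the
unescaped urlcontent tokens, URL_CLOSE, TEXT.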
- Gordon