Named regexp variables, an extension proposal.

Sun May 14 14:24:08 EDT 2006

"Paddy" <paddy3118 at netscape.net> wrote in message
news:1147598275.868430.203300 at g10g2000cwb.googlegroups.com...
> It's difficult to switch to parsers for me even though examples like
> pyparsing seem readable, I do want to skip what I am not interested in
> rather than having to write a parser for everything. But conversely,
> when something skipped does bite me - I want to be able to easily add
> it in.
>
> Are there any examples of this kind of working with parsers?
>

pyparsing offers several flavors of skipping over uninteresting text.  The
most obvious is scanString.  scanString is a generator function that scans
through the input text looking for pattern matches (multiple patterns can be
OR'ed together) - when a match is found, the matching tokens, start, and end
locations are yielded.  Here's a short example that ships with pyparsing:

from pyparsing import (Word, alphas, alphanums, Literal, restOfLine,
                       OneOrMore, Empty)

# simulate some C++ code
testData = """
#define MAX_LOCS=100
#define USERNAME = "floyd"
#define PASSWORD = "swordfish"

a = MAX_LOCS;
CORBA::initORB("xyzzy", USERNAME, PASSWORD );

"""

#################
print "Example of an extractor"
print "----------------------"

# simple grammar to match #define's
ident = Word(alphas, alphanums+"_")
macroDef = (Literal("#define") + ident.setResultsName("name") + "=" +
            restOfLine.setResultsName("value"))
for t,s,e in macroDef.scanString( testData ):
    print t.name,":", t.value

# or a quick way to make a dictionary of the names and values
macros = dict([(t.name,t.value) for t,s,e in macroDef.scanString(testData)])
print "macros =", macros
print

--------------------
prints:
Example of an extractor
----------------------
MAX_LOCS : 100
USERNAME :  "floyd"
PASSWORD :  "swordfish"
macros = {'USERNAME': '"floyd"', 'PASSWORD': '"swordfish"', 'MAX_LOCS': '100'}

Note that scanString worked only with the expressions we defined, and
ignored pretty much everything else.
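
The extractor above throws away the start and end locations, and matches
only one pattern.  As a rough sketch of the other two points (OR'ing
patterns together, and using the locations), something like the following
should work.  The names includeDef, source and found are made up for this
illustration and are not part of the shipped example; lineno is a helper
function exported by pyparsing.

```python
from pyparsing import Word, alphas, alphanums, Literal, restOfLine, lineno

# hypothetical input, smaller than testData above
source = "#define MAX_LOCS=100\n#include <stdio.h>\na = MAX_LOCS;\n"

ident = Word(alphas, alphanums + "_")
macroDef = (Literal("#define") + ident.setResultsName("name") + "=" +
            restOfLine.setResultsName("value"))
# a second pattern, added only once it "bites" (illustrative, not from the post)
includeDef = Literal("#include") + restOfLine.setResultsName("target")

found = []
# '|' ORs the two patterns together; everything else is still skipped
for tokens, start, end in (macroDef | includeDef).scanString(source):
    # start/end index into the original string, so the raw matched text
    # and its line number can be recovered
    found.append((lineno(start, source), tokens[0], source[start:end]))
```

Each entry in found holds the line number, the leading keyword, and the
exact text that matched, so a later pattern can be bolted on without
touching the code that handles the earlier ones.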

scanString has a companion method, transformString.  transformString calls
scanString internally, applies any parse actions or suppressions to the
matched tokens, substitutes the results back in for the original text, and
returns the transformed string.  Here are two transformer examples: one
converts C++-namespaced references to C-compatible global symbols (something
we had to do in the early days of CORBA); the other uses the macros
dictionary we just created to do simple macro substitution:

#################
print "Examples of a transformer"
print "----------------------"

# convert C++ namespaces to mangled C-compatible names
scopedIdent = ident + OneOrMore( Literal("::").suppress() + ident )
scopedIdent.setParseAction(lambda s,l,t: "_".join(t))

print "(replace namespace-scoped names with C-compatible names)"
print scopedIdent.transformString( testData )

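# or a crude pre-processor (use parse actions to replace matching text)
def substituteMacro(s,l,t):
    # returning nothing (None) leaves non-macro identifiers unchanged
    if t[0] in macros:
        return macros[t[0]]
ident.setParseAction( substituteMacro )
# skip over the #define lines themselves so they are left as written
ident.ignore(macroDef)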

print "(simulate #define pre-processor)"
print ident.transformString( testData )

--------------------------
prints:
Examples of a transformer
----------------------
(replace namespace-scoped names with C-compatible names)

#define MAX_LOCS=100
#define USERNAME = "floyd"
#define PASSWORD = "swordfish"

a = MAX_LOCS;
CORBA_initORB("xyzzy", USERNAME, PASSWORD );

(simulate #define pre-processor)

#define MAX_LOCS=100
#define USERNAME = "floyd"
#define PASSWORD = "swordfish"

a = 100;
CORBA::initORB("xyzzy", "floyd", "swordfish" );
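
Both transformers above rely on parse actions.  Suppression works the same
way: a suppressed match contributes no tokens, so transformString drops the
matched text from the result.  A minimal sketch, assuming we simply want the
#define lines removed (source and stripped are names invented for this
illustration):

```python
from pyparsing import Word, alphas, alphanums, Literal, restOfLine

# hypothetical input for this sketch
source = "#define MAX_LOCS=100\na = MAX_LOCS;\n"

ident = Word(alphas, alphanums + "_")
macroDef = Literal("#define") + ident + "=" + restOfLine

# suppress() wraps the whole expression; the matched text is replaced
# by nothing, and all unmatched text is copied through as-is
stripped = macroDef.suppress().transformString(source)
```

Only the matched span is removed, so the newline that followed the #define
is still present in stripped.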

I'd say it took me about 8 weeks to develop a complete Verilog parser using
pyparsing, so I can sympathize that you wouldn't want to write a complete
parser for it.  But the individual elements are pretty straightforward, and
can map to pyparsing expressions without much difficulty.

Lastly, pyparsing is not as fast as RE's.  But early performance problems
can often be reduced through some judicious grammar tuning.  And for many
parsing applications, pyparsing is plenty fast enough.
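
To give one possible flavor of such tuning (this is an assumption about what
tuning might look like, not the changes made to the Verilog parser, and no
timings are claimed): a low-level token built from several pyparsing pieces
can often be collapsed into a single Regex element, which hands the whole
match to the re module in one step.  The names below are invented for the
sketch.

```python
from pyparsing import Combine, Regex, Word, nums

# a real number built from three sub-expressions glued together...
real_composed = Combine(Word(nums) + "." + Word(nums))
# ...and the same token expressed as one Regex element
real_regex = Regex(r"\d+\.\d+")

sample = "3.14 2.718 10.0"
composed_hits = [t[0] for t, s, e in real_composed.scanString(sample)]
regex_hits = [t[0] for t, s, e in real_regex.scanString(sample)]
```

Both expressions yield the same tokens, so the swap can be made in one place
without disturbing the rest of the grammar.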

Regards,
-- Paul