Named regexp variables, an extension proposal.

Sun May 14 05:17:55 EDT 2006

Paul McGuire wrote:
> "Paddy" <paddy3118 at netscape.net> wrote in message
> news:1147513160.977268.253690 at j33g2000cwa.googlegroups.com...
> > Proposal: Named RE variables
> > ======================

Hi Paul, please also refer to my reply to John.

>
> By contrast, the event declaration expression in the pyparsing Verilog
> parser is:
>
> identLead = alphas+"$_"
> identBody = alphanums+"$_"
> #~ identifier = Combine( Optional(".") +
> #~                       delimitedList( Word(identLead, identBody), ".",
> #~                                      combine=True ) ).setName("baseIdent")
> # replace pyparsing composition with Regex - improves performance ~10% for
> # this construct
> identifier = Regex(
>     r"\.?["+identLead+"]["+identBody+"]*(\.["+identLead+"]["+identBody+"]*)*"
>     ).setName("baseIdent")
>
> eventDecl = Group( "event" + delimitedList( identifier ) + semi )
>
I have had years of success writing REs that extract what I am
interested in rather than reacting to what I'm not interested in, making
slight mods down the line as examples crop up that break the program. I
do rely on the examples I get to test my extractors, but I find
examples a lot easier to come by than the funds/time for a language
parser. Since I tend to stay in a job for a number of years, I know
that the method works, and gives quick results that rapidly become
dependable as I am there to catch any flak ;-).
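
To make that concrete, here is a minimal sketch of the extract-only
style. The Verilog fragment, the simplified identifier pattern and the
name extract_events are all made up for illustration; none of it is
checked against the grammar spec.

```python
import re

# Hypothetical Verilog fragment; only the "event" declarations matter here.
source = """
module top;
  wire a, b;
  event start_ev, done_ev;
  reg [7:0] count;
  event reset_ev;
endmodule
"""

# Simplified identifier (an assumption, not the real Verilog rule).
event_re = re.compile(r"""(?mx)
    ^ \s* event \s+                       # only react to event declarations
    (?P<names>
        [A-Za-z_][A-Za-z_0-9$]*           # first name
        (?: \s* , \s* [A-Za-z_][A-Za-z_0-9$]* )*   # any further names
    )
    \s* ;
""")

def extract_events(text):
    """Return every declared event name, skipping all other source lines."""
    names = []
    for match in event_re.finditer(text):
        names.extend(name.strip() for name in match.group("names").split(","))
    return names

print(extract_events(source))
```

Everything that is not an event declaration is simply never matched, so
nothing has to be written for wires, regs or module headers until one
of them starts to matter.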

It's difficult for me to switch to parsers, even though examples like
pyparsing seem readable. I do want to skip what I am not interested in
rather than having to write a parser for everything. But conversely,
when something skipped does bite me, I want to be able to easily add
it in.

Are there any examples of this kind of working with parsers?

>
> But why do you need an update to RE to compose snippets?  Especially
> snippets that you can only use in the same RE?  Just do string interp:
>
> > I could write the following RE, which I think is clearer:
> >   vlog_extract = r'''(?smx)
> >     # Verilog identifier definition
> >     (?P/IDENT/ [A-Za-z_][A-Za-z_0-9\$\.]* (?!\.) )
> >     # Verilog event definition extraction
> >     (?: event \s+ (?P=IDENT) \s* (?: , \s* (?P=IDENT))* )
> >   '''
> IDENT = r"[A-Za-z_][A-Za-z_0-9\$\.]* (?!\.)"
> vlog_extract = r'''(?smx)
>   # Verilog event definition extraction
>   (?: event \s+ %(IDENT)s \s* (?: , \s* %(IDENT)s)* )
>   ''' % locals()
>
> Yuk, this is a mess - which '%' signs are part of RE and which are for
> string interp?  Maybe just plain old string concat is better:

Yeah, I too thought that the % thing was ugly when used on an RE.

>
> IDENT = r"[A-Za-z_][A-Za-z_0-9\$\.]* (?!\.)"
> vlog_extract = r'''(?smx)
>   # Verilog event definition extraction
>   (?: event \s+ ''' + IDENT + r''' \s* (?: , \s* ''' + IDENT + ''')* )'''

... And the string concats broke up the visual flow of my multi-line
RE.
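
For comparison, a rough sketch of a workaround that keeps the RE in one
piece: a small pre-processing step that textually replaces (?P=NAME)
with a supplied snippet before the pattern is compiled. The helper
expand_named is hypothetical and not part of the re module, and it only
emulates the reuse half of the proposal, not the (?P/NAME/ ...)
definition syntax.

```python
import re

def expand_named(pattern, **snippets):
    """Replace each (?P=NAME) whose NAME is a supplied snippet with (?:snippet).

    Hypothetical helper: plain text substitution done before re.compile().
    """
    def substitute(match):
        name = match.group(1)
        if name in snippets:
            return "(?:" + snippets[name] + ")"
        return match.group(0)  # leave genuine backreferences untouched
    return re.sub(r"\(\?P=(\w+)\)", substitute, pattern)

vlog_extract = expand_named(r'''(?smx)
    # Verilog event definition extraction
    (?: event \s+ (?P=IDENT) \s* (?: , \s* (?P=IDENT))* )
''', IDENT=r"[A-Za-z_][A-Za-z_0-9\$\.]* (?!\.)")

match = re.search(vlog_extract, "event start_ev, done_ev;")
print(match.group(0))
```

The snippet still lives outside the RE, which is the part the proposal
was meant to avoid, but the multi-line pattern itself reads unbroken.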

>
> By the way, your IDENT is not totally accurate - it does not permit a
> leading ".", and it does permit leading digits in identifier elements after
> the first ".".  So ".goForIt" would not be matched as a valid identifier
> when it should, and "go.4it" would be matched as valid when it shouldn't (at
> least as far as I read the Verilog grammar).

Thanks for the info on IDENT. I am not working with the grammar spec in
front of me, and I know I will have to revisit my RE. You've saved me
some time!
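
As a first pass at that revisit, here is a sketch of an IDENT that
follows the two points above. The character classes are copied from
identLead and identBody in Paul's pyparsing snippet rather than from
the grammar spec, and the names LEAD, BODY and is_identifier are just
for illustration.

```python
import re

# Character classes mirror identLead / identBody from the pyparsing code
# (letters plus "$_", and letters/digits plus "$_"); not checked against
# the Verilog grammar itself.
LEAD = r"[A-Za-z$_]"
BODY = r"[A-Za-z0-9$_]"

# Optional leading ".", then dot-separated elements that each start
# with a LEAD character.
IDENT = r"\.?" + LEAD + BODY + "*" + r"(?:\." + LEAD + BODY + "*)*"
ident_re = re.compile(IDENT)

def is_identifier(text):
    """True when the whole string is a single hierarchical identifier."""
    return ident_re.fullmatch(text) is not None

for candidate in (".goForIt", "go.4it", "top.sub.sig", "a..b"):
    print(candidate, is_identifier(candidate))
```

With this version ".goForIt" is accepted and "go.4it" is rejected,
which is the behaviour Paul describes.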
>
> (Pyparsing (http://sourceforge.net/projects/pyparsing/) is open source under
> the MIT license.  The Verilog grammar is not distributed with pyparsing, and
> is only available free for noncommercial use.)
> 
> -- Paul

- Paddy.