Named regexp variables, an extension proposal.

Paddy paddy3118 at netscape.net
Sat May 13 05:39:21 EDT 2006


Proposal: Named RE variables
============================

The problem I have is that I am writing a 'good-enough' Verilog tag
extractor as one long regular expression (using the 'x' flag for
readability), and I find myself both
 1) repeating sections of the RE, and
 2) wanting to wrap sections in '(?P<some_clarifier>...)' because I
     know what the section does, even though I don't actually need
     the group.

If I could write:
 (?P/verilog_name/ [A-Za-z_][A-Za-z_0-9\$\.]* | \\\S+ )

...I would like the RE parser to extract the section of the RE after the
second '/' and store it under the name that appears between the first
two '/'. At this point the RE should NOT try to match anything between
the outer '(' ')' pair; it should only store it.

Then the following code appearing later in the RE:
  (?P=verilog_name)

...should retrieve the named RE snippet and insert it into the RE in
place of the '(?P=...)' group, before the RE is interpreted 'as normal'.
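Since the proposal amounts to textual substitution before compilation,
the idea can be prototyped outside the re module as a preprocessor. A
minimal sketch, assuming Python 3; expand_re_vars and its helpers are
hypothetical names, and the parenthesis matching is deliberately
simplified:

```python
import re

# Hypothetical syntax from the proposal:
#   (?P/name/ body )   define an RE variable (matches nothing itself)
#   (?P=name)          insert the stored body, wrapped as (?:body)
_DEF = re.compile(r'\(\?P/(\w+)/')
_REF = re.compile(r'\(\?P=(\w+)\)')


def _find_close(pattern, start):
    """Index of the ')' closing a group whose body starts at `start`.

    Skips backslash escapes and simple [...] classes; it does not
    understand verbose-mode comments or a leading ']' inside a class.
    """
    depth, i, in_class = 1, start, False
    while i < len(pattern):
        c = pattern[i]
        if c == '\\':
            i += 2                      # skip the escaped character
            continue
        if in_class:
            in_class = c != ']'         # a ']' ends the class
        elif c == '[':
            in_class = True
        elif c == '(':
            depth += 1
        elif c == ')':
            depth -= 1
            if depth == 0:
                return i
        i += 1
    raise ValueError('unbalanced parentheses after position %d' % start)


def expand_re_vars(pattern):
    """Strip variable definitions and expand references to them."""
    defs = {}

    def substitute(ref):
        name = ref.group(1)
        if name in defs:
            return '(?:' + defs[name] + ')'
        return ref.group(0)             # ordinary named back-reference

    pieces, pos = [], 0
    while True:
        m = _DEF.search(pattern, pos)
        if m is None:
            pieces.append(pattern[pos:])
            break
        end = _find_close(pattern, m.end())
        pieces.append(pattern[pos:m.start()])
        # Expand earlier variables used inside this definition's body.
        defs[m.group(1)] = _REF.sub(substitute, pattern[m.end():end])
        pos = end + 1
    return _REF.sub(substitute, ''.join(pieces))
```

Unknown names are passed through untouched, so an ordinary (?P=group)
back-reference still reaches re; the cost is that an RE variable must
not share a name with a real named group.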

Instead of writing the following to search for event declarations:
  vlog_extract = r'''(?smx)
    # Verilog event definition extraction
    (?: event \s+ [A-Za-z_][A-Za-z_0-9\$\.]* \s* (?: , \s* [A-Za-z_][A-Za-z_0-9\$\.]*)* )
  '''
I could write the following RE, which I think is clearer:
  vlog_extract = r'''(?smx)
    # Verilog identifier definition
    (?P/IDENT/ [A-Za-z_][A-Za-z_0-9\$\.]* (?!\.) )
    # Verilog event definition extraction
    (?: event \s+ (?P=IDENT) \s* (?: , \s* (?P=IDENT))* )
  '''
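For comparison, much of the same factoring can already be approximated
by assembling the pattern in Python before compiling it. A sketch, where
IDENT is an ordinary Python string (an illustrative name, not RE syntax)
spliced in with str.format:

```python
import re

# Workaround sketch: the shared sub-pattern lives in a Python variable
# (IDENT is an illustrative name) and is spliced in with str.format.
IDENT = r'(?: [A-Za-z_][A-Za-z_0-9\$\.]* (?!\.) )'

vlog_extract = re.compile(r'''(?smx)
    # Verilog event definition extraction
    (?: event \s+ {ident} \s* (?: , \s* {ident} )* )
'''.format(ident=IDENT))

m = vlog_extract.search('event clk_rise, clk_fall;')
print(m.group(0))   # event clk_rise, clk_fall
```

The drawback is that the RE is no longer self-contained, and any literal
braces in it, such as {m,n} quantifiers, would have to be doubled as
{{m,n}} to survive str.format.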

Extension: named RE variables with arguments
============================================
In this extension, group references in the body of a variable
definition refer to the literal contents of the groups that follow the
variable name inside a variable reference, and are filled in at the
point where the variable is referenced.

So an RE variable definition like:
  defs = r'(?smx) (?P/GO/ go \s for \s \1 )'

Used like:
  rgexp = defs + r"""
    (?P=GO (it) )
    \s+
    (?P=GO (broke) )
  """
Would match the string:
  "go for it  go for broke"
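Under one reading of the rule above (each argument's text is substituted
for \1 before the RE is compiled), the example would expand to the
following ordinary pattern. This is a hand-written sketch of the
intended result, not output from any existing tool:

```python
import re

# Hand-expanded sketch of what the proposed GO variable would become:
# the body "go \s for \s \1" with \1 replaced by each argument's text.
expanded = r'''(?smx)
    (?: go \s for \s it )       # from (?P=GO (it) )
    \s+
    (?: go \s for \s broke )    # from (?P=GO (broke) )
'''

m = re.fullmatch(expanded, 'go for it  go for broke')
print(m is not None)   # True
```

Note that \1 here splices in new RE text; an ordinary \1 back-reference
would instead have to repeat text already captured by group 1.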

As would:
  defs2 = r'(?smx) (?P/GO/ go \s for \s (?P=subject) )'
  rgexp = defs2 + r"""
    (?P=GO (?P<subject> it) )
    \s+
    (?P=GO (?P<subject> broke) )
  """
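The argument mechanism could likewise be prototyped as a text
preprocessor. A rough sketch, assuming Python 3.8+ and using
hypothetical helper names (expand, bind, split_args, find_close): it
substitutes each argument's literal text for \N or (?P=argname)
placeholders in the stored body, leaves references to unknown names
alone so real back-references survive, and does not handle verbose-mode
comments or variables used inside other variables' bodies.

```python
import re

# Hypothetical syntax: (?P/name/ body ) defines a variable whose body may
# use \1, \2, ... or (?P=argname); (?P=name (a) (?P<k> b)) references it.
DEF = re.compile(r'\(\?P/(\w+)/')
REF = re.compile(r'\(\?P=(\w+)')
NAMED_ARG = re.compile(r'\?P<(\w+)>(.*)\Z', re.S)
PLACEHOLDER = re.compile(r'\\(\d+)|\(\?P=(\w+)\)')


def find_close(text, start):
    """Index of the ')' closing a group whose body starts at `start`."""
    depth, i, in_class = 1, start, False
    while i < len(text):
        c = text[i]
        if c == '\\':
            i += 2                          # skip escaped character
            continue
        if in_class:
            in_class = c != ']'
        elif c == '[':
            in_class = True
        elif c in '()':
            depth += 1 if c == '(' else -1
            if depth == 0:
                return i
        i += 1
    raise ValueError('unbalanced parentheses')


def split_args(text):
    """Literal contents of each top-level (...) group in `text`."""
    args, i = [], 0
    while i < len(text):
        if text[i] == '\\':
            i += 2
        elif text[i] == '(':
            end = find_close(text, i + 1)
            args.append(text[i + 1:end])
            i = end + 1
        else:
            i += 1
    return args


def bind(body, argtext):
    """Substitute argument text for the placeholders in `body`."""
    positional, named = [], {}
    for content in split_args(argtext):
        m = NAMED_ARG.match(content)
        if m:
            named[m.group(1)] = m.group(2)  # (?P<k> text) argument
        else:
            positional.append(content)      # plain (text) argument

    def fill(p):
        if p.group(1):                      # \N -> N-th positional arg
            return positional[int(p.group(1)) - 1]
        return named.get(p.group(2), p.group(0))
    return PLACEHOLDER.sub(fill, body)


def expand(pattern):
    defs, out, pos = {}, [], 0
    while (m := DEF.search(pattern, pos)):  # 1. collect definitions
        end = find_close(pattern, m.end())
        out.append(pattern[pos:m.start()])
        defs[m.group(1)] = pattern[m.end():end]
        pos = end + 1
    text = ''.join(out) + pattern[pos:]

    out, pos = [], 0
    while (m := REF.search(text, pos)):     # 2. expand references
        end = find_close(text, m.end())
        out.append(text[pos:m.start()])
        if m.group(1) in defs:
            out.append('(?:' + bind(defs[m.group(1)], text[m.end():end]) + ')')
        else:
            out.append(text[m.start():end + 1])  # real back-reference
        pos = end + 1
    return ''.join(out) + text[pos:]
```

For a reference written as (?P=GO (it) ) against the body
"go \s for \s \1", bind replaces \1 with "it", giving
(?: go \s for \s it ).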

The above would allow me to factor out sections of REs and define
named, reusable RE snippets.


Please comment :-)

- Paddy.



