Named regexp variables, an extension proposal.

John Machin sjmachin at lexicon.net
Sat May 13 08:30:53 EDT 2006


On 13/05/2006 7:39 PM, Paddy wrote:
[snip]
> Extension; named RE variables, with arguments
> ===================================
> In this, all group definitions in the body of the variable definition
> reference the literal contents of groups appearing after the variable
> name, (but within the variable reference),  when the variable is
> referenced
> 
> So an RE variable definition like:
>   defs = r'(?smx) (?P/GO/ go \s for \s \1 )'
> 
> Used like:
>   rgexp = defs + r"""
>     (?P=GO (it) )
>     \s+
>     (?P=\GO (broke) )
>   """
> Would match the string:
>   "go for it  go for broke"
> 
> As would:
>   defs2 = r'(?smx) (?P/GO/ go \s for \s (?P=subject) )'
>   rgexp = defs2 + r"""
>     (?P=GO (?P<subject> it) )
>     \s+
>     (?P=\GO (?P<subject> broke) )
>   """
> 
> The above would allow me to factor out sections of REs and define
> named, re-usable RE snippets.
> 
> 
> Please comment :-)


1. Regex syntax is over-rich already.
2. You may be better off with a parser for this application instead of 
using regexes.
3. "\\" is overloaded to the point of collapse already. Using it as an 
argument marker could make the universe implode.
4. You could always use Python to roll your own macro expansion gadget, 
like this:

C:\junk>type paddy_rx.py
import re
flags = r'(?smx)'
GO = r'go \s for \s &1 &2'
WS = r'\s+'

ARGMARK = "&"

# Can the comments about the style of
# this code; I've just translated it from
# a now-dead language with max 6 chars in variable names :-)
def macsub(tmplt, *infils):
    wstr = tmplt
    ostr = ""
    while wstr:
       lpos = wstr.find(ARGMARK)
       if lpos < 0:
          return ostr + wstr
       ostr = ostr + wstr[:lpos]
       nch = wstr[lpos+1:lpos+2]
       if "1" <= nch <= "9":
          x = ord(nch)-ord("1")
          if x < len(infils):
             ostr = ostr + infils[x]
       elif nch == ARGMARK: # double & (or whatever)
          ostr = ostr + ARGMARK
       else:
          ostr = ostr + ARGMARK + nch
       wstr = wstr[lpos+2:]
    return ostr

regexp = " ".join([
     flags,
     macsub(GO, r'it,\s', 'Paddy'),  # raw string, so \s reaches the regex engine untouched
     WS,
     macsub(GO, 'broke'),
     ])
print(regexp)
text = "go for it, Paddy  go for broke"
m = re.match(regexp, text)
# %-formatting prints "30 30" whether print is a statement or a function
print("%d %d" % (len(text), m.end()))

C:\junk>paddy_rx.py
(?smx) go \s for \s it,\s Paddy \s+ go \s for \s broke
30 30
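
For what it's worth, the same gadget can probably be written more compactly with re.sub and a callback function. What follows is only a sketch, not the code that produced the output above. The names macsub2 and _ARG_RE are made up for illustration, and it assumes the same conventions as macsub: &1 to &9 are positional arguments, && is a literal &, and a missing argument expands to nothing.

```python
import re

ARGMARK = "&"

# The marker followed by a digit 1-9 (an argument) or by itself (an escape).
_ARG_RE = re.compile(
    re.escape(ARGMARK) + "([1-9]|" + re.escape(ARGMARK) + ")"
)


def macsub2(tmplt, *infils):
    """Expand &1..&9 in tmplt from infils; && gives a literal &."""
    def expand(m):
        tok = m.group(1)
        if tok == ARGMARK:
            return ARGMARK
        idx = int(tok) - 1
        # A missing argument expands to nothing, mirroring macsub.
        return infils[idx] if idx < len(infils) else ""
    # A function as the replacement means backslashes in the
    # arguments (e.g. \s) are inserted literally.
    return _ARG_RE.sub(expand, tmplt)


GO = r"go \s for \s &1 &2"
regexp = " ".join([
    r"(?smx)",
    macsub2(GO, r"it,\s", "Paddy"),
    r"\s+",
    macsub2(GO, "broke"),
])
text = "go for it, Paddy  go for broke"
m = re.match(regexp, text)
```

Passing a function to re.sub rather than a replacement string matters here: a replacement string would have its own backslash escapes processed, which would mangle arguments such as \s.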



Cheers,
John


