Named regexp variables, an extension proposal.
John Machin
sjmachin at lexicon.net
Sat May 13 08:30:53 EDT 2006
On 13/05/2006 7:39 PM, Paddy wrote:
[snip]
> Extension; named RE variables, with arguments
> ===================================
> In this, all group definitions in the body of the variable definition
> reference the literal contents of groups appearing after the variable
> name, (but within the variable reference), when the variable is
> referenced
>
> So an RE variable definition like:
> defs = r'(?smx) (?P/GO/ go \s for \s \1 )'
>
> Used like:
> rgexp = defs + r"""
> (?P=GO (it) )
> \s+
> (?P=\GO (broke) )
> """
> Would match the string:
> "go for it go for broke"
>
> As would:
> defs2 = r'(?smx) (?P/GO/ go \s for \s (?P=subject) )'
> rgexp = defs2 + r"""
> (?P=GO (?P<subject> it) )
> \s+
> (?P=\GO (?P<subject> broke) )
> """
>
> The above would allow me to factor out sections of REs and define
> named, re-usable RE snippets.
>
>
> Please comment :-)
1. Regex syntax is over-rich already.
2. You may be better off with a parser for this application instead of
using regexes.
3. "\\" is overloaded to the point of collapse already. Using it as an
argument marker could make the universe implode.
4. You could always use Python to roll your own macro expansion gadget,
like this:
C:\junk>type paddy_rx.py
import re
flags = r'(?smx)'
GO = r'go \s for \s &1 &2'
WS = r'\s+'
ARGMARK = "&"
# Can the comments about the style of
# this code; I've just translated it from
# a now-dead language with max 6 chars in variable names :-)
def macsub(tmplt, *infils):
    wstr = tmplt
    ostr = ""
    while wstr:
        lpos = wstr.find(ARGMARK)
        if lpos < 0:
            return ostr + wstr
        ostr = ostr + wstr[:lpos]
        nch = wstr[lpos+1:lpos+2]
        if "1" <= nch <= "9":
            x = ord(nch)-ord("1")
            if x < len(infils):
                ostr = ostr + infils[x]
        elif nch == ARGMARK: # double & (or whatever)
            ostr = ostr + ARGMARK
        else:
            ostr = ostr + ARGMARK + nch
        wstr = wstr[lpos+2:]
    return ostr
regexp = " ".join([
    flags,
    macsub(GO, 'it,\s', 'Paddy'),
    WS,
    macsub(GO, 'broke'),
    ])
print regexp
text = "go for it, Paddy go for broke"
m = re.match(regexp, text)
print len(text), m.end()
C:\junk>paddy_rx.py
(?smx) go \s for \s it,\s Paddy \s+ go \s for \s broke
30 30
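The same macro expansion can also be sketched more compactly by letting re.sub do the scanning, with a callback that picks the replacement. This is only a sketch under a few assumptions: it targets Python 3, the names expand and repl are made up for illustration, and it keeps the conventions of macsub above (&1 to &9 are positional arguments, && is a literal &, a missing argument expands to nothing, and any other use of & is left alone).

```python
import re

ARGMARK = "&"

def expand(template, *args):
    """Replace &1..&9 with positional args; && becomes a literal &.

    Hypothetical helper mirroring macsub above; not part of the re module.
    """
    def repl(m):
        token = m.group(1)
        if token == ARGMARK:          # doubled marker: emit one literal &
            return ARGMARK
        idx = int(token) - 1          # &1 is the first argument
        # A missing argument expands to the empty string, as in macsub.
        return args[idx] if idx < len(args) else ""
    # Only & followed by 1-9 or another & is treated as a macro reference;
    # any other & is left in place untouched.
    return re.sub(r"&([1-9&])", repl, template)

GO = r"go \s for \s &1 &2"

pattern = " ".join([
    r"(?smx)",
    expand(GO, r"it,\s", "Paddy"),
    r"\s+",
    expand(GO, "broke"),
])

text = "go for it, Paddy go for broke"
m = re.match(pattern, text)
print(pattern)
print(m is not None and m.end() == len(text))
```

Because the replacement comes from a function, re.sub inserts the returned string as-is, so backslashes in an argument such as \s reach the final pattern unchanged.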
Cheers,
John