Named regexp variables, an extension proposal.

Paddy paddy3118 at netscape.net
Sun May 14 04:49:30 EDT 2006


John Machin wrote:
> On 13/05/2006 7:39 PM, Paddy wrote:
> [snip]
> > Extension; named RE variables, with arguments
> > ===================================
> > In this, all group definitions in the body of the variable definition
> > reference the literal contents of groups appearing after the variable
> > name, (but within the variable reference),  when the variable is
> > referenced
> >
> > So an RE variable definition like:
> >   defs = r'(?smx) (?P/GO/ go \s for \s \1 )'
> >
> > Used like:
> >   rgexp = defs + r"""
> >     (?P=GO (it) )
> >     \s+
> >     (?P=\GO (broke) )
> >   """
> > Would match the string:
> >   "go for it  go for broke"
> >
> > As would:
> >   defs2 = r'(?smx) (?P/GO/ go \s for \s (?P=subject) )'
> >   rgexp = defs2 + r"""
> >     (?P=GO (?P<subject> it) )
> >     \s+
> >     (?P=\GO (?P<subject> broke) )
> >   """
> >
> > The above would allow me to factor out sections of REs and define
> > named, reusable RE snippets.
> >
> >
> > Please comment :-)
>
>
> 1. Regex syntax is over-rich already.

First, thanks for the reply John.

Yep, regex syntax is rich, but one of the reasons I went ahead with my
post was that it might add a new way to organize regexps into more
manageable chunks, rather like functions do.

> 2. You may be better off with a parser for this application instead of
> using regexes.
Unfortunately my experience counts against me going for parser
solutions rather than regexps. Although, being a Python user, I always
think again before using a regexp and remember to ask whether there might
be a clearer string-method solution to the task, I am not comfortable with
parsers/parser generators.

The reason I dismissed parsers this time is that I have only
ever seen parsers for complete languages. I don't want to write a
complete parser for Verilog, I want to take an easier 'good enough'
route that I have used with success, from my AWK days. (Don't laugh, my
exposure to AWK after years of C was just as profound as the feelings
of release from Java that more recent speakers have blogged about after
exposure to new dynamic languages - all hail AWK, not completely put
out to stud :-)
I intend to write a large regexp that picks out the things that I want
from a Verilog file, skipping the bits I am uninterested in. With a
regular expression I don't have to write something to match, say, always
blocks. True, if someone wrote signal definitions (which I am
interested in) in a task, then I would pick those up as well as
module-level signal definitions, but that would be 'good enough' for my
app.
All the parser examples I see don't 'skip things'.

- Hell, despite writing my own interpreted, recursive-descent language
many (many...) years ago in C, too much early lex & yacc'ing about left
me with a grudge!
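
The 'good enough' scan described above can be sketched in Python roughly
as follows. This is only an illustration under stated assumptions: the
pattern SIGNAL_RE, the function find_signals and the sample Verilog text
are all hypothetical, and the pattern handles only simple wire/reg
declarations. Anything the pattern does not describe is skipped, and a
declaration inside a task is picked up along with the module-level ones.

```python
import re

# Hypothetical pattern: recognises only simple 'wire' and 'reg' declarations.
SIGNAL_RE = re.compile(r"""
    ^\s*
    (?P<kind> wire | reg )                          # signal kind
    \s+
    (?: \[ \d+ : \d+ \] \s* )?                      # optional [msb:lsb] range
    (?P<names> [A-Za-z_]\w* (?: \s*,\s* [A-Za-z_]\w* )* )
    \s* ;
""", re.MULTILINE | re.VERBOSE)


def find_signals(source):
    """Return (kind, name) pairs for every declaration the pattern matches."""
    found = []
    for match in SIGNAL_RE.finditer(source):
        for name in re.split(r"\s*,\s*", match.group("names")):
            found.append((match.group("kind"), name))
    return found


# Made-up sample input; the always block and task are never matched as such.
sample = """
module top(input clk);
  wire [7:0] data, addr;
  reg ready;
  always @(posedge clk) begin
    ready <= 1;
  end
  task t;
    reg tmp;
  endtask
endmodule
"""

signals = find_signals(sample)
```

Here signals also contains ('reg', 'tmp') from inside the task, which is
the imprecision accepted above as 'good enough'.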

> 3. "\\" is overloaded to the point of collapse already. Using it as an
> argument marker could make the universe implode.

Did I truly write '=\GO'? Twice!
Sorry, the example should have used '=GO' to refer to RE variables. I
made, then copied, the error.
Note: I also tried to cut down on extra syntax by re-using the syntax
for referring to named groups (or I would have if my proofreading were
better).

> 4. You could always use Python to roll your own macro expansion gadget,
> like this:

Thanks for going to the trouble of writing the expander. I too had
thought of that, but that would lead to 'my little RE syntax' that
would be harder to maintain, and others might reinvent the solution but
with their own mini macro syntax.
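
For illustration only, a roll-your-own expander of this kind might look
something like the sketch below. It is not John's code, which is not
quoted in this message. The table DEFS, the pattern REF_RE and the
function expand are hypothetical names, and the sketch assumes the
corrected '(?P=GO (arg) )' reference form with exactly one argument and
no nested parentheses.

```python
import re

# Hypothetical snippet table: name -> body; \1 stands for the single argument.
DEFS = {"GO": r"go \s for \s \1"}

# Matches a reference such as (?P=GO (it) ): one argument, no nested parens.
REF_RE = re.compile(r"\(\?P=(\w+)\s*\(([^()]*)\)\s*\)")


def expand(pattern, defs=DEFS):
    """Replace each (?P=NAME (arg) ) reference with the named snippet body."""
    def substitute(match):
        name, arg = match.group(1), match.group(2)
        body = defs[name].replace(r"\1", arg.strip())
        return "(?:" + body + ")"   # non-capturing, so group numbers stay put
    return REF_RE.sub(substitute, pattern)


rgexp = expand(r"""(?smx)
    (?P=GO (it) )
    \s+
    (?P=GO (broke) )
""")

matched = re.match(rgexp, "go for it  go for broke") is not None
```

Even at this size it already has its own private rules (one argument, no
nesting), which is the 'my little RE syntax' worry in miniature.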

> 
> Cheers,
> John

- Paddy.



