regular expressions questions

Derek Thomson derek at ooc.com.au
Sun Mar 26 19:13:29 EST 2000


"Andrew M. Kuchling" wrote:
> 
> "Darrell" <darrell at dorb.com> writes:
> > [Derek Thomson]
> > > [ And now, at last, Perl 5.6 allows regexes to refer to other regexes. So
> > > now we can use regexes to parse balanced expressions, such as matching
> > > parentheses.
> 
> Actually, this was an experimental feature in Perl5.005.  Search
> the perlre documentation for the (?{ code }) construct.
> 
> > > No doubt this will appear in Python's re module before long. ]
> 
> This is highly unlikely, because last time around we couldn't come up
> with a way to do this that wasn't impressively ugly. See the thread
> starting with this July 1998 article:
> <URL:http://x28.deja.com/getdoc.xp?AN=372884721>
> 
> The general tenor of opinion seemed to be that, if you want to parse
> expressions, use a real parser generator, and we certainly have enough
> of those for Python.

Really? I couldn't find anything that was even half as good as Perl's
Parse::RecDescent, after doing some searching from python.org. Links?

> Adding a feature to regexes for this produces
> difficult-to-read code -- you need a pretty intimate knowledge of
> exactly how the computer tries matches and then backtracks -- and
> makes the matching engine more complicated.  So this feature is almost
> certainly not going to be added.  (Elegant patches to add a new
> feature, of course, can often reverse theoretical objections.)

There is a large class of parsing applications for which writing an entire
grammar is overkill (or demands more CS background than the user has; cp4e?),
but in which being able to match balanced subexpressions is necessary and
useful. We'll just have to stick to Perl for those, I guess.
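
To make "matching balanced subexpressions" concrete, here is roughly what one
ends up writing by hand in Python when the regex engine can't do it. This is
only a sketch: match_balanced is a name made up for illustration (it is not in
the re module or anywhere else in the standard library), and it assumes
single-character delimiters and ignores quoting and escapes.

```python
def match_balanced(text, start=0, open_ch="(", close_ch=")"):
    """Return the index just past the balanced group that opens at
    text[start], or -1 if there is no complete group there.

    Hypothetical helper for illustration only.
    """
    if start >= len(text) or text[start] != open_ch:
        return -1
    depth = 0
    for i in range(start, len(text)):
        ch = text[i]
        if ch == open_ch:
            depth += 1          # one level deeper
        elif ch == close_ch:
            depth -= 1          # one level back out
            if depth == 0:
                return i + 1    # the outermost group just closed
    return -1                   # ran out of input before the group closed


expr = "f(a, (b + c) * d) + g(e)"
end = match_balanced(expr, expr.index("("))
print(expr[expr.index("("):end])
```

It works, but it is exactly the sort of boilerplate that a single pattern
would replace.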

I'm sure a decent syntax could be devised with a little thought. After all, the
named subexpression idea is very good, and a great improvement over Perl's
numbering system, i.e. $1, $2, etc.
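
For anyone who hasn't used them, named groups in Python's re module look
something like this. The pattern and the input string are made up for
illustration; (?P<name>...) and group() are the real re syntax and method.

```python
import re

# Each (?P<name>...) group can be fetched by name instead of by position.
pattern = re.compile(r"(?P<key>\w+)\s*=\s*(?P<value>\w+)")

m = pattern.match("colour = blue")
if m:
    # Lookup by name reads better than counting parentheses ...
    print(m.group("key"), m.group("value"))
    # ... but the positional numbers still work for the same groups.
    print(m.group(1), m.group(2))
```

The names survive any reordering of the pattern, which the $1, $2 style
does not.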

Regards,
Derek


