Agree "faster" is likely, but don't understand the portable/distributable
point: a module coded _in_ standard Python is quite portable! The
advantage to doing it that way is it gives people a chance to experiment
with the interface, before freezing it into the standard distribution.
> ... I can live with integer group names.
> [various improvements: groupnames attribute; optional varargs to
> 'compile' to initialize groupnames; modify 'groups' method to allow
> strings (names) in addition to integer indices]
> This ends up looking like your solution, but the relationship
> between 'regs', groups(), and 'groupnames' is explicit. This
> is useful because it increases the number of "fruitful
Well, there's nothing to stop you from writing a portable module, in std
Python, that does exactly all that today. If you did that & distributed
the module, and people liked it a lot, Guido might get interested in
hacking a C version <grin>.
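
For a feel of how little code that takes, here is a minimal sketch. It makes several assumptions: it uses the present-day `re` module as a stand-in for `regex` (so groups are written with plain parens instead of \( \)), the class name NamedRegex and its method names are made up for illustration, and the character class is written `[A-Za-z_.-]` so the hyphen is literal.

```python
import re  # assumption: modern stand-in for the old 'regex' module


class NamedRegex:
    """Hypothetical wrapper mapping caller-chosen names onto group indices."""

    def __init__(self, pattern, *groupnames):
        self._compiled = re.compile(pattern)
        # The i-th name labels group i+1; group 0 is the whole match.
        self.groupnames = {name: i + 1 for i, name in enumerate(groupnames)}
        self._match = None

    def search(self, text):
        """Search text, remember the match, and report success."""
        self._match = self._compiled.search(text)
        return self._match is not None

    def groups(self, *keys):
        """Return the requested groups; keys may be names or integer indices."""
        if self._match is None:
            raise ValueError("no successful match to take groups from")
        indices = [self.groupnames[k] if isinstance(k, str) else k for k in keys]
        return tuple(self._match.group(i) for i in indices)


decode = NamedRegex(r"[^0-9]*([0-9]+)[ \t]+([A-Za-z_.-]+)", "number", "label")
if decode.search("item 42   spam.eggs"):
    n, l = decode.groups("number", "label")
```

The name->index relationship lives in one explicit dict, so names and integers can be mixed freely in a single groups() call.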
> Improvement 2:
> Add syntax to regular expressions so that groups can be named
> in place, yielding the group dictionary. (This is a *big*
> advantage over perl.)
> For example:
> re = '[^0-9]*\(<number>[0-9]+\)[ \t]+\(<label>[A-Za-z_-.]+\)'
> decode = regex.compile( re)
> n, l = decode.groups( 'number', 'label')
> I like this idea, because then I can build complicated regular
> expressions in substrings, and then catenate them together
> into the final regular expression before compiling. It also
> completely eliminates group-counting, and it provides a visual
> indication of which groups are just for grouping, and which
> are for substring extraction.
Agree that _is_ nice. But again, it's something you _can_ do today, in
your own Python module (in your compile method, you "just" need to
analyze the pattern string before invoking regex.compile, stripping out
the '<name>' portions and saving away the derived name->index dict;
there's really no need to touch the current regex implementation, except
in that it's likely you'll wind up using more sets of parens than
regexmodule is currently compiled to handle).
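
The stripping step might look roughly like the sketch below. It is pure string processing and assumes the '<name>' marker always comes directly after an opening \( ; the function name strip_group_names is hypothetical, and it does not try to cope with a \( appearing inside a bracket expression.

```python
def strip_group_names(pattern):
    """Remove '<name>' markers that follow '\\(' in an Emacs-style pattern.

    Returns (plain_pattern, names), where names maps each group name to
    its 1-based group index. Hypothetical helper, not a library function.
    """
    out = []
    names = {}
    group_count = 0
    i = 0
    while i < len(pattern):
        if pattern.startswith("\\(", i):
            # Every '\(' opens a new group, named or not.
            group_count += 1
            out.append("\\(")
            i += 2
            if pattern.startswith("<", i):
                end = pattern.find(">", i)
                if end == -1:
                    raise ValueError("unterminated group name")
                names[pattern[i + 1:end]] = group_count
                i = end + 1  # skip over '<name>'
        elif pattern[i] == "\\" and i + 1 < len(pattern):
            # Copy any other escaped pair untouched (e.g. '\)' or '\\').
            out.append(pattern[i:i + 2])
            i += 2
        else:
            out.append(pattern[i])
            i += 1
    return "".join(out), names


plain, names = strip_group_names(
    r'[^0-9]*\(<number>[0-9]+\)[ \t]+\(<label>[A-Za-z_-.]+\)')
```

The plain pattern is what would be handed to the existing compile function, and the names dict is what a groups-by-name method would consult.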
> But what python really needs are LALR(1) parser objects, don't you
Hard to say! I confess I came to UNIX(tm) late in life, & never did
grasp the fascination with regexps. I find them awfully cryptic &
clumsy as soon as they go beyond the trivial. E.g., here's one from
python-mode.el, to match a Python line that opens a code block:
I can't even read that anymore! Least not without a lot of tedious
A std parsing approach might be better after all (how many more desperate
net msgs will we read asking how to capture the concept of nesting
brackets via regexps <0.9 grin>?). But not sure: the only truly
_pleasant_ pattern-matching language I've used is SNOBOL4, & even it was
clumsy for dealing with left recursion.
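
As a point of comparison for the nesting-brackets case, a few lines of ordinary code with a stack handle it directly. This is only an illustrative sketch; the function name brackets_balanced and its default bracket pairs are assumptions, not anything proposed in this thread.

```python
def brackets_balanced(text, pairs=(("(", ")"), ("[", "]"), ("{", "}"))):
    """Return True if every bracket in text is closed in properly nested order."""
    closers = {close: open_ for open_, close in pairs}
    openers = set(closers.values())
    stack = []
    for ch in text:
        if ch in openers:
            stack.append(ch)
        elif ch in closers:
            # A closer must match the most recently opened bracket.
            if not stack or stack.pop() != closers[ch]:
                return False
    # Anything left on the stack was opened but never closed.
    return not stack
```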
Suspect we agree that regexps aren't the right way to go for complex
pattern-matching tasks. On the other hand, I do think they're fine for
simple tasks, so maybe keeping them clumsy to use is doing most users a
favor <0.9 grin>.
> I'm afraid the trouble with this one [tracy's '<name>' extension] is
> that the syntax of Python regular expressions is defined by the GNU
> Emacs regular expression package.
Ya, but it's not an essential extension -- the <name> constructs are
syntactic sugar that could be stripped out before the Emacs package is
invoked. Not saying you _should_, just saying that it's not hard to do.
> ... Is it really that hard to count occurrences of \(?
Well, it _is_ error-prone: I remember when quoted strings were introduced
into Fortran, and hearing "is it really that hard to count the number of
characters in a Hollerith?" (hint: the answer is "no" <grin>).
I think what it _does_ do is impose an unnatural implementation layer
between the way we think of the problem & the way we need to code the
solution. On the other hand, in those cases where regexps get so fancy
that the need for counting parens goes above 3 (truly my _comfortable_
mental limit!), regexps probably aren't the right tool for the job anyway.
> Would you folks settle for a recursive descent parser generator (like
> the one used to build the Python parser)? That one I know how to
I'd like to see someone suggest a specific interface, & code it up in
Python, so we could get a feel for how it works in practice.
In the meantime, I believe everyone agrees that a regex method supporting
varargs "integer group names" would be a valuable extension -- right?
mostly-just-thinking-out-loud-ly y'rs - tim
Tim Peters firstname.lastname@example.org
not speaking for Kendall Square Research Corp