Two RE proposals

Fri Jul 26 21:02:22 EDT 2002

    >> How about
    >> 
    >> word = r"\w*"
    >> punct = r"[,.;?]"
    >> wordpunct = re.compile(r"%(word)s%(punct)s" % locals())
    >> 
    >> which you can do today?  (I'd also argue that a word would be "\w+".)

    David> I considered something like this, but it's too verbose, not to
    David> mention confusing - what's inherently wrong with my idea? 

Nothing I suppose, except someone has to write the code to implement it,
while the proposal I put forth exists today.  As for verbosity, "!<word>"
saves precisely one character over "%(word)s".  I'll grant you the "%
locals()" adds a few more characters, but it's a constant factor.

I don't understand how a basic facility of the language that has been around
for God knows how long could be more confusing than writing regular
expressions. <wink>

    David> (I am not familiar with the idiom of "locals()".)

From the online help:

    Help on built-in function locals:

    locals(...)
        locals() -> dictionary

        Return the dictionary containing the current scope's local variables.
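
To make the idiom concrete, here is a minimal sketch of how "% locals()"
fills in the "%(name)s" placeholders.  The function name build_wordpunct is
made up for illustration, and I've used "\w+" for a word, as argued above:

```python
import re

def build_wordpunct():
    # Hypothetical helper: its local variables are what locals() returns.
    word = r"\w+"        # one or more word characters
    punct = r"[,.;?]"    # a single punctuation character
    # locals() is a dictionary mapping "word" and "punct" to the strings
    # above, so each %(name)s placeholder is replaced by that variable.
    return re.compile(r"%(word)s%(punct)s" % locals())

wordpunct = build_wordpunct()
print(wordpunct.pattern)                          # \w+[,.;?]
print(wordpunct.search("Hello, world").group(0))  # Hello,
```

One caveat: because "%" drives the substitution, a literal percent sign
elsewhere in the pattern has to be written as "%%".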

    >> The * doesn't (and shouldn't) operate over grouping parens.  You're
    >> asking it to supply you with a variable number of groups, which it
    >> can't do.

    David> You're right - it doesn't operate over grouping parens, but why
    David> _shouldn't_ it? IIRC, _some_ regex packages could do this...

How about using non-grouping parens:

    >>> pat = re.compile(r"((?:a|b)*)")
    >>> pat.match("ababaaaabccdabab")
    <_sre.SRE_Match object at 0x40348ea0>
    >>> _.group(1)
    'ababaaaab'
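
For comparison, here is a short sketch of what happens with and without the
non-grouping parens, plus one possible way (my assumption about what you're
after, not the only approach) to recover the individual repetitions by
scanning the captured run a second time:

```python
import re

text = "ababaaaabccdabab"

# A repeated capturing group keeps only its last repetition.
last_only = re.match(r"(a|b)*", text)
print(last_only.group(1))     # b

# Non-grouping parens inside one capturing group capture the whole run.
whole_run = re.match(r"((?:a|b)*)", text)
print(whole_run.group(1))     # ababaaaab

# To get each repetition separately, scan the captured run again.
pieces = re.findall(r"a|b", whole_run.group(1))
print(len(pieces))            # 9
```

The number of groups in a pattern stays fixed; the variable-length part is
handled by the second pass with findall().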

Skip