regexps: testing and creating MatchObjects in one fell swoop

Sat Sep 9 07:09:41 EDT 2000

dan wrote:
> This isn't a huge pain in the simple case, but it quickly becomes
> annoying when I want to do the equivalent of
>
>   if (/(\d+)\s+(\d+)/) {
>     ($num1, $num2) = ($1, $2);
>   } elsif (/(\w+)\s+(\w+)/) {
>     ($word1, $word2) = ($1, $2);
>   } # etc.
>
> as the two-part test in Python doesn't lend itself easily to a long
> if/elif/elif chain.

I'm tempted to mention the "replace nested conditionals
with guard clauses" refactoring rule, but I'll leave that
for another day...

> I've "solved" the problem locally by using the following helper
> function:
>
>   # research (regexp, string) is the same as re.search (regexp, string),
>   # but saves off the match results into 'rematch', so we can test for
>   # a regexp in an if statement and use the results immediately.
>   rematch = None
>   def research (regexp, string):
>       global rematch
>       rematch = re.search (regexp, string)
>       return (rematch is not None)
>
> So that I can write:
>
>   if research (r"(\d+)\s+(\d+)", line):
>       (num1, num2) = rematch.groups()
>   elif research (r"(\w+)\s+(\w+)", line):
>       (word1, word2) = rematch.groups()
>   # etc.
>
> I suppose I can even inject research into the re module, and inject a
> similar method into regular expression objects, etc., to make it nicer.
>
> Is there a cleaner, or more approved, way to accomplish this task?
> If not, does it make any sense to have a re.last_match object that
> automatically contains the last match, allowing, for example:
>
>   if re.search (r"(\d+)\s+(\d+)", line):
>       (num1, num2) = re.last_match.groups()
>
> Or is that too side-effecty and non-Pythonic?

Won't fly -- what if two threads are using the same regular
expression?  (or in your rematch example, what if two threads
are using regular expressions...)
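
One way to keep the if/elif convenience without shared state is to
store the last match on an object that each caller creates for itself.
The sketch below assumes a small helper class is acceptable; the names
Matcher and classify are made up for illustration and are not part of
the re module.

```python
import re

class Matcher:
    """Hypothetical helper: remembers the last match on the instance,
    not in a global, so every caller (or thread) owns its own state."""

    def __init__(self, string):
        self.string = string
        self.match = None

    def search(self, pattern):
        # Store the match object (or None) and report success as a bool.
        self.match = re.search(pattern, self.string)
        return self.match is not None

def classify(line):
    m = Matcher(line)  # local to this call, nothing is shared
    if m.search(r"(\d+)\s+(\d+)"):
        return ("numbers",) + m.match.groups()
    elif m.search(r"(\w+)\s+(\w+)"):
        return ("words",) + m.match.groups()
    return None
```

Since the Matcher instance lives inside classify, two threads calling
classify at the same time never see each other's match objects.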

:::

There's actually a slightly experimental feature in SRE that
can be useful here: combine your expressions into one big
expression, and use the new "lastgroup" attribute to figure
out which one matched:

    >>> import re
    >>> p = re.compile(r"(?P<digits>\d+)|(?P<text>\w+)")
    >>> m = p.search("123 456")
    >>> print m.lastgroup, m.groups()
    digits ('123', None)

(however, keeping track of subgroups can be a major PITA
with this approach...)
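
A rough sketch of how the subgroup bookkeeping might be tamed: give the
inner groups names as well, so that nobody has to count positions in
the combined pattern. The group names (numbers, num1, num2, words,
word1, word2) and the function classify_combined are assumptions made
up for this example. It relies on lastgroup reporting the outer group
of the alternative that matched, since that group is the last one to
close.

```python
import re

# Each alternative is wrapped in a named outer group; the pieces the
# caller wants are named inner groups.
combined = re.compile(
    r"(?P<numbers>(?P<num1>\d+)\s+(?P<num2>\d+))"
    r"|(?P<words>(?P<word1>\w+)\s+(?P<word2>\w+))"
)

def classify_combined(line):
    m = combined.search(line)
    if m is None:
        return None
    kind = m.lastgroup  # name of the outer group that matched
    if kind == "numbers":
        return (kind,) + m.group("num1", "num2")
    return (kind,) + m.group("word1", "word2")
```

The price is that every inner group needs a unique name across the
whole combined expression, which gets unwieldy as the list of
alternatives grows.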

</F>

