regexps: testing and creating MatchObjects in one fell swoop
effbot at pythonware.com
Sat Sep 9 07:09:41 EDT 2000
dan wrote:
> This isn't a huge pain in the simple case, but it quickly becomes
> annoying when I want to do the equivalent of
>
> if (/(\d+)\s+(\d+)/) {
>     ($num1, $num2) = ($1, $2);
> } elsif (/(\w+)\s+(\w+)/) {
>     ($word1, $word2) = ($1, $2);
> } # etc.
>
> as the two-part test in Python doesn't lend itself easily to a long
> if/elif/elif chain.
I'm tempted to mention the "replace nested conditionals
with guard clauses" refactoring rule, but I'll leave that
for another day...
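For readers who want the refactoring spelled out, here is a minimal sketch of one way the guard-clause idea could apply to the chain above, assuming the tests can live in their own function: each pattern is tried in turn and the function returns as soon as one matches, so every match object is bound and tested on consecutive lines and no shared state is needed. The function name `parse_line` and its return values are hypothetical.

```python
import re

def parse_line(line):
    # Hypothetical guard-clause version: each pattern gets an early return,
    # so the match object never has to outlive its own test.
    m = re.search(r"(\d+)\s+(\d+)", line)
    if m:
        return "nums", m.groups()
    m = re.search(r"(\w+)\s+(\w+)", line)
    if m:
        return "words", m.groups()
    # Nothing matched.
    return None
```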
> I've "solved" the problem locally by using the following helper
> function:
>
> # research(regexp, string) is the same as re.search(regexp, string),
> # but saves off the match results into 'rematch', so we can test for
> # a regexp in an if statement and use the results immediately.
> import re
>
> rematch = None
>
> def research(regexp, string):
>     global rematch
>     rematch = re.search(regexp, string)
>     return rematch is not None
>
> So that I can write:
>
> if research(r"(\d+)\s+(\d+)", line):
>     (num1, num2) = rematch.groups()
> elif research(r"(\w+)\s+(\w+)", line):
>     (word1, word2) = rematch.groups()
> # etc.
>
> I suppose I can even inject research into the re module, and inject a
> similar method into regular expression objects, etc., to make it nicer.
>
> Is there a cleaner, or more approved, way to accomplish this task?
> If not, does it make any sense to have a re.last_match object that
> automatically contains the last match, allowing, for example:
>
> if re.search(r"(\d+)\s+(\d+)", line):
>     (num1, num2) = re.last_match.groups()
>
> Or is that too side-effecty and non-Pythonic?
Won't fly: what if two threads are using the same regular
expression? (Or, in your rematch example, what if two threads
are using any regular expressions at all?)
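One way to keep dan's helper without the shared global is to store the last match on an object the caller owns. The sketch below is an assumption, not an SRE or `re` feature: `Matcher` and `parse_with_matcher` are hypothetical names. Because each call creates its own `Matcher`, two threads running `parse_with_matcher` never see each other's results; sharing a single instance across threads would bring the race back.

```python
import re

class Matcher:
    """Hypothetical helper: remembers the last match on the instance,
    not in a module-level global."""

    def __init__(self):
        self.match = None

    def search(self, pattern, string):
        # re.search accepts either a pattern string or a compiled pattern.
        self.match = re.search(pattern, string)
        return self.match is not None

    def groups(self):
        return self.match.groups()

def parse_with_matcher(line):
    m = Matcher()  # local instance: no state shared across threads or calls
    if m.search(r"(\d+)\s+(\d+)", line):
        num1, num2 = m.groups()
        return "nums", num1, num2
    elif m.search(r"(\w+)\s+(\w+)", line):
        word1, word2 = m.groups()
        return "words", word1, word2
    return None
```

On Python 3.8 and later, an assignment expression gives the same one-step test with no helper at all: `if (m := re.search(r"(\d+)\s+(\d+)", line)):` binds `m` as an ordinary local and tests it in the same condition.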
:::
There's actually a slightly experimental feature in SRE that
can be useful here: combine your expressions into one big
expression, and use the new "lastgroup" attribute to figure
out which one matched:
>>> import re
>>> p = re.compile(r"(?P<digits>\d+)|(?P<text>\w+)")
>>> m = p.search("123 456")
>>> print(m.lastgroup, m.groups())
digits ('123', None)
(however, keeping track of subgroups can be a major PITA
with this approach...)
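To make the subgroup bookkeeping concrete, here is a sketch that merges dan's two patterns into one, assuming Python 3 and the standard `re` module. Each alternative is wrapped in an outer named group (`nums`, `words`) and its inner groups get distinct names, because group names must be unique across the whole pattern. When groups are nested, `lastgroup` reports the outer group, since it is the last capturing group to close, so it identifies which alternative matched. The names `COMBINED` and `classify` are hypothetical.

```python
import re

# Hypothetical combined pattern: one outer named group per alternative,
# with uniquely named subgroups inside each one.
COMBINED = re.compile(
    r"(?P<nums>(?P<num1>\d+)\s+(?P<num2>\d+))"
    r"|(?P<words>(?P<word1>\w+)\s+(?P<word2>\w+))"
)

def classify(line):
    m = COMBINED.search(line)
    if m is None:
        return None
    # lastgroup names the outer group of the alternative that matched,
    # because the outer group is the last capturing group to close.
    if m.lastgroup == "nums":
        return "nums", m.group("num1"), m.group("num2")
    if m.lastgroup == "words":
        return "words", m.group("word1"), m.group("word2")
    return None
```

The cost is the one noted above: every new alternative adds more group names to keep unique, and group numbers shift whenever a pattern is added in front of another.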
</F>
More information about the Python-list
mailing list