Efficiently test for positive re.match then use the result?

Wed Mar 3 14:53:50 EST 2004

On  3 Mar 2004, elmlish <- elmlish at netscape.net wrote:

> Mostly what I've seen people do is to first test for the match, and then 
> try matching again to get the results.  This would seem to be pretty 
> inefficient to me.

Where did you see that?  Often you see code like:

import re
m = re.match('foo', 'foobar')
if m:
    do_something(m)  # placeholder for whatever uses the match

> I've tried making the match, then sending it to a variable, then testing 
> if the variable is good and then finally using it, but this still seems 
> overkill.

It isn't.  Assignment in Python is a statement and has no return value,
unlike in C, so you can't write code like:

if m = re.match('foo', 'foobar'): do_something_with_m
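
For readers on Python 3.8 or later, which postdates this message,
assignment expressions (the := operator) come close to that form.  A
minimal sketch, assuming such an interpreter; the variable names are
only illustrative:

```python
import re

# Requires Python 3.8+: ":=" binds m and yields the match (or None).
if (m := re.match('foo', 'foobar')):
    matched_text = m.group(0)
else:
    matched_text = None

alist = ['boo', 'hoo', 'choo']
# One match call per line: the condition binds m, the result reuses it.
matches = [m for line in alist if (m := re.match('choo', line))]
```

The plain "=" form remains a syntax error inside an if condition.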

Matching doesn't set any global variables either (you could write your
own matching function that sets one, but that's seldom a good idea).

> I'm also trying to use this in list comprehensions, mostly because they 
> are kind of fun.  What I've got right now looks something like this.

> >>> alist = ['boo','hoo','choo']
> >>> [re.match('choo',line) for line in alist if re.match('choo',line)]
> [<_sre.SRE_Match object at 0x11e218>]

> This is a small test, but I will actually be looking for various
> matches in a large special-purpose text file.

> Does anyone have input on how something like this _should_ be done?
> thanks,

I don't know how it _should_ be done but I can tell you how it _could_ be
done.

Use a class like the following:

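class Matcher(object):
    """Callable wrapper that remembers the result of the last match."""
    __slots__ = ('reg', 'match')

    def __init__(self, reg):
        self.reg = reg        # a match function, e.g. re.compile(...).match
        self.match = None     # result of the most recent call

    def __call__(self, val):
        # Store the match object (or None) and report whether it succeeded.
        self.match = self.reg(val)
        return self.match is not None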

Now you could use it like:

>>> import re
>>> alist = ['boo','hoo','choo']
>>> reg = Matcher(re.compile('choo').match)
>>> [reg.match for c in alist if reg(c)]
[<_sre.SRE_Match object at 0xb3de58>]
>>> 

That's not overkill.
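
The same class also helps outside list comprehensions, for instance when
several patterns are tried in turn.  A rough sketch (the patterns, the
sample lines, and the classify helper are invented for illustration; the
class is repeated so the snippet runs on its own):

```python
import re

class Matcher(object):
    """Call it to test a string, then read .match for the result."""
    def __init__(self, reg):
        self.reg = reg
        self.match = None

    def __call__(self, val):
        self.match = self.reg(val)
        return self.match is not None

# Hypothetical patterns, made up for this sketch.
is_key_value = Matcher(re.compile(r'(\w+)=(\w+)').match)
is_comment = Matcher(re.compile(r'#\s*(.*)').match)

def classify(line):
    # Each pattern is matched only once; the groups come from the stored match.
    if is_key_value(line):
        return ('pair', is_key_value.match.group(1), is_key_value.match.group(2))
    elif is_comment(line):
        return ('comment', is_comment.match.group(1))
    return ('other', line)

results = [classify(line) for line in ['name=value', '# a note', 'plain text']]
```

Each branch tests and then uses the same match object, so no pattern is
run twice on a line.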

But if you wanted it even lighter you could use a closure (but don't
tell anyone :-) )

def matcher(reg):
    res = [None]          # one-element list the inner function can update
    def fun(s):
        m = reg(s)
        if m:
            res[0] = m    # remember the last successful match
        return m
    return res, fun

You use it like this:

>>> res, reg = matcher(re.compile('choo').match)
>>> [res[0] for c in alist if reg(c)]
[<_sre.SRE_Match object at 0xa14e90>]
>>> 

Or you simply write:

>>> reg = re.compile('choo').match
>>> filter(None, [reg(line) for line in alist])
[<_sre.SRE_Match object at 0xb3df38>]

Even for a big list filter(None, ...) is fast.
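
Since the real input is a large text file, a generator that yields
matches one at a time is another option; it avoids building the
intermediate list.  A minimal sketch, where iter_matches is a made-up
helper name and lines can be any iterable, such as an open file:

```python
import re

def iter_matches(pattern, lines):
    """Yield a match object for each matching line, matching once per line."""
    match = re.compile(pattern).match
    for line in lines:
        m = match(line)
        if m:
            yield m

alist = ['boo', 'hoo', 'choo']
found = [m.group(0) for m in iter_matches('choo', alist)]
```

Whether this beats filter(None, ...) in practice would need measuring on
the actual file.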

   KP

-- 
You know you've been sitting in front of your Lisp machine too long
when you go out to the junk food machine and start wondering how to
make it give you the CADR of Item H so you can get that yummie
chocolate cupcake that's stuck behind the disgusting vanilla one.