re.match() performance

Peter Otten __peter__ at web.de
Thu Dec 18 09:19:49 EST 2008


Emanuele D'Arrigo wrote:

> I've written the code below to test the differences in performance
> between compiled and non-compiled regular expression matching but I
> don't quite understand the results. It appears that the compiled
> pattern only takes 2% less time to process the match. Is there some
> caching going on in the uncompiled section that prevents me from
> noticing its otherwise lower speed?

Yes, the re module keeps the patterns it compiles in an internal cache:

>>> import re
>>> re._cache
{}
>>> re.match("yadda", "")
>>> re._cache
{(<class 'str'>, 'yadda', 0): <_sre.SRE_Pattern object at 0x2ac6e66e9e70>}
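
So after the first call, re.match() only pays for a dictionary lookup, not
for a recompile. A rough way to see this in a benchmark is sketched below.
PATTERN, TEXT and the function names are made up for illustration; the
absolute timings will vary by machine and Python version.

```python
import re
import timeit

PATTERN = r"\d+-\d+"        # hypothetical pattern, chosen for illustration
TEXT = "123-456 and more"   # hypothetical input string

compiled = re.compile(PATTERN)

def match_uncompiled():
    # Goes through re.match(), which looks the pattern up in the module cache.
    return re.match(PATTERN, TEXT)

def match_compiled():
    # Calls the pattern object's method directly, skipping that lookup.
    return compiled.match(TEXT)

def benchmark(number=100_000):
    # Time both variants over the same number of calls.
    t_uncompiled = timeit.timeit(match_uncompiled, number=number)
    t_compiled = timeit.timeit(match_compiled, number=number)
    return t_uncompiled, t_compiled

if __name__ == "__main__":
    t_u, t_c = benchmark()
    print(f"re.match(): {t_u:.3f}s   compiled.match(): {t_c:.3f}s")
```

Both functions do the same matching work; any difference between the two
timings is the overhead of the extra function call and the cache lookup.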

Hint: questions like this are best answered by the source code, and Python
is open source. You don't even have to open an editor:

>>> import inspect
>>> print(inspect.getsource(re.match))
def match(pattern, string, flags=0):
    """Try to apply the pattern at the start of the string, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).match(string)

>>> print(inspect.getsource(re._compile))
def _compile(*key):
    # internal: compile pattern
    cachekey = (type(key[0]),) + key
    p = _cache.get(cachekey)
    if p is not None:
        return p
    pattern, flags = key
    if isinstance(pattern, _pattern_type):
        if flags:
            raise ValueError(
                "Cannot process flags argument with a compiled pattern")
        return pattern
    if not sre_compile.isstring(pattern):
        raise TypeError("first argument must be string or compiled pattern")
    p = sre_compile.compile(pattern, flags)
    if len(_cache) >= _MAXCACHE:
        _cache.clear()
    _cache[cachekey] = p
    return p


Peter