Why is regex so slow?

André Malo ndparker at gmail.com
Tue Jun 18 15:59:32 EDT 2013


* Johannes Bauer wrote:

> The pre-check version is about 42% faster in my case (0.75 sec vs. 1.3
> sec). Curious. This is Python 3.2.3 on Linux x86_64.

A lot of time is spent with dict lookups (timings at my box, Python 3.2.3)
in your inner loop (1500000 times...)

#!/usr/bin/python3
import re
pattern = re.compile(r'ENQUEUEING: /listen/(.*)')
count = 0
for line in open('error.log'):
    #if 'ENQ' not in line:
    #    continue
    m = pattern.search(line)
    if m:
        count += 1
print(count)

runs ~ 1.39 s

replacing some dict lookups with index lookups:

#!/usr/bin/python3
def main():
    import re
    pattern = re.compile(r'ENQUEUEING: /listen/(.*)')
    count = 0
    for line in open('error.log'):
        #if 'ENQ' not in line:
        #    continue
        m = pattern.search(line)
        if m:
            count += 1
    print(count)
main()

runs ~ 1.15s

and further:

#!/usr/bin/python3
def main():
    import re
    pattern = re.compile(r'ENQUEUEING: /listen/(.*)')
    search = pattern.search
    count = 0
    for line in open('error.log'):
        #if 'ENQ' not in line:
        #    continue
        m = search(line)
        if m:
            count += 1
    print(count)
main()

runs ~ 1.08 s

and for reference:

#!/usr/bin/python3
def main():
    import re
    pattern = re.compile(r'ENQUEUEING: /listen/(.*)')
    search = pattern.search
    count = 0
    for line in open('error.log'):
        if 'ENQ' not in line:
            continue
        m = search(line)
        if m:
            count += 1
    print(count)
main()

runs ~ 0.71 s

The final difference is probably just the difference between a hardcoded
string search and a generic NFA.

nd



More information about the Python-list mailing list