Why is regex so slow?
André Malo
ndparker at gmail.com
Tue Jun 18 15:59:32 EDT 2013
* Johannes Bauer wrote:
> The pre-check version is about 42% faster in my case (0.75 sec vs. 1.3
> sec). Curious. This is Python 3.2.3 on Linux x86_64.
A lot of time is spent with dict lookups (timings at my box, Python 3.2.3)
in your inner loop (1500000 times...)
#!/usr/bin/python3
import re
pattern = re.compile(r'ENQUEUEING: /listen/(.*)')
count = 0
for line in open('error.log'):
#if 'ENQ' not in line:
# continue
m = pattern.search(line)
if m:
count += 1
print(count)
runs ~ 1.39 s
replacing some dict lookups with index lookups:
#!/usr/bin/python3
def main():
import re
pattern = re.compile(r'ENQUEUEING: /listen/(.*)')
count = 0
for line in open('error.log'):
#if 'ENQ' not in line:
# continue
m = pattern.search(line)
if m:
count += 1
print(count)
main()
runs ~ 1.15s
and further:
#!/usr/bin/python3
def main():
import re
pattern = re.compile(r'ENQUEUEING: /listen/(.*)')
search = pattern.search
count = 0
for line in open('error.log'):
#if 'ENQ' not in line:
# continue
m = search(line)
if m:
count += 1
print(count)
main()
runs ~ 1.08 s
and for reference:
#!/usr/bin/python3
def main():
import re
pattern = re.compile(r'ENQUEUEING: /listen/(.*)')
search = pattern.search
count = 0
for line in open('error.log'):
if 'ENQ' not in line:
continue
m = search(line)
if m:
count += 1
print(count)
main()
runs ~ 0.71 s
The final difference is probably just the difference between a hardcoded
string search and a generic NFA.
nd
More information about the Python-list
mailing list