Why is regex so slow?
Johannes Bauer
dfnsonfsduifb at gmx.de
Tue Jun 18 14:10:16 EDT 2013
On 18.06.2013 19:20, Chris Angelico wrote:
> Yeah, I'd try that against 3.3 before opening a performance bug. Also,
> it's entirely possible that performance is majorly different in 3.x
> anyway, on account of strings being Unicode. Definitely merits another
> look imho.
Hmmm, at least Python 3.2 seems to have the same issue. I generated test
data with:
#!/usr/bin/python3
import random

random.seed(0)
with open("error.log", "w") as f:
    for i in range(1500000):
        q = random.randint(0, 99)
        if q == 0:
            print("ENQUEUEING: /listen/ fhsduifhsd uifhuisd hfuisd hfuihds iufhsd", file=f)
        else:
            print("fiosdjfoi sdmfio sdmfio msdiof msdoif msdoimf oisd mfoisdm f", file=f)
Resulting file has a size of 91530018 and md5 of
2d20c3447a0b51a37d28126b8348f6c5 (just to make sure we're on the same
page because I'm not sure the PRNG is stable across Python versions).
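For anyone who wants to compare their generated file against those two numbers, here is a minimal sketch of the check. The helper name size_and_md5 and the chunk size are my own choices for illustration, not part of the scripts in this thread; it only relies on hashlib and os from the standard library.

```python
import hashlib
import os


def size_and_md5(path, chunk_size=1 << 20):
    """Return (size in bytes, hex MD5 digest) of the file at path."""
    digest = hashlib.md5()
    # Read in binary mode and in chunks so a large log is never held in memory at once.
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


if __name__ == "__main__":
    # Only report if the generated file is actually present.
    if os.path.exists("error.log"):
        print(size_and_md5("error.log"))
```

If the printed pair differs from the size and md5 quoted above, the PRNG output (or the line endings) differ on that platform and the timings are not directly comparable.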
Testing with:
#!/usr/bin/python3
import re

pattern = re.compile(r'ENQUEUEING: /listen/(.*)')
count = 0
for line in open('error.log'):
    # if 'ENQ' not in line:
    #     continue
    m = pattern.search(line)
    if m:
        count += 1
print(count)
With the pre-check enabled (the two commented-out lines), the run takes
about 42% less time in my case (0.75 sec vs. 1.3 sec). Curious. This is
Python 3.2.3 on Linux x86_64.
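A rough way to put both variants side by side in one run is sketched below. The function names and the small in-memory sample are assumptions made up for illustration; the pattern and the 'ENQ' substring test are the ones from the script above. It times a list held in memory rather than the 91 MB file, so absolute numbers will differ from the ones I quoted.

```python
import re
import time

PATTERN = re.compile(r'ENQUEUEING: /listen/(.*)')


def count_regex_only(lines):
    """Count matching lines using only the compiled regex."""
    return sum(1 for line in lines if PATTERN.search(line))


def count_with_precheck(lines):
    """Count matching lines, skipping the regex when 'ENQ' is absent."""
    return sum(1 for line in lines if 'ENQ' in line and PATTERN.search(line))


def timed(func, lines):
    """Return (result, elapsed seconds) for a single call of func(lines)."""
    start = time.perf_counter()
    result = func(lines)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical sample: roughly one matching line in a hundred.
    miss = "fiosdjfoi sdmfio sdmfio msdiof msdoif msdoimf oisd mfoisdm f\n"
    hit = "ENQUEUEING: /listen/ fhsduifhsd uifhuisd hfuisd hfuihds iufhsd\n"
    sample = ([miss] * 99 + [hit]) * 2000
    for func in (count_regex_only, count_with_precheck):
        count, seconds = timed(func, sample)
        print(func.__name__, count, "%.3f s" % seconds)
```

Both functions must report the same count; only the elapsed time should differ. Note that time.perf_counter needs Python 3.3 or later.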
Regards,
Johannes
--
>> Where EXACTLY had you predicted the quake again?
> At least not publicly!
Ah, the latest and to date most ingenious stunt of our great
cosmologists: the secret prediction.
- Karl Kaos on Rüdiger Thomas in dsa <hidbv3$om2$1 at speranza.aioe.org>