Why is regex so slow?

Rick Johnson rantingrickjohnson at gmail.com
Tue Jun 18 15:21:07 EDT 2013


On Tuesday, June 18, 2013 11:45:29 AM UTC-5, Roy Smith wrote:
> I've got a 170 MB file I want to search for lines that look like:
> [2010-10-20 16:47:50.339229 -04:00] INFO (6): songza.amie.history - ENQUEUEING: /listen/the-station-one
> This code runs in 1.3 seconds:
> ------------------------------
> import re
> pattern = re.compile(r'ENQUEUEING: /listen/(.*)')
> count = 0
> for line in open('error.log'):
>     m = pattern.search(line)
>     if m:
>         count += 1
> print count

Is the power of regexps really required to solve such a simple problem? I believe string methods should suffice.

py> line = "[2010-10-20 16:47:50.339229 -04:00] INFO (6): songza.amie.history - ENQUEUEING: /listen/the-station-one"
py> idx = line.find('ENQ')
py> if idx != -1:  # find() returns -1 when absent; 0 is a valid match position
	match = line[idx:]
py> match
'ENQUEUEING: /listen/the-station-one'
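
Applied to the counting loop from the original post, the same idea might look like the sketch below. This is only an illustration, not a benchmarked replacement: the helper name count_enqueued and the second sample line are made up for the example, and the substring test only tells you whether the marker is present, it does not capture the station name the way the regex group does.

```python
def count_enqueued(lines, needle='ENQUEUEING: /listen/'):
    """Count the lines that contain `needle`, using a plain substring test."""
    count = 0
    for line in lines:
        # The `in` operator does a plain substring search; no regex engine involved.
        if needle in line:
            count += 1
    return count


# Small in-memory sample so the sketch runs without the 170 MB log.
# The first line is the one from the original post; the second is a
# hypothetical non-matching line.
sample = [
    "[2010-10-20 16:47:50.339229 -04:00] INFO (6): songza.amie.history - ENQUEUEING: /listen/the-station-one\n",
    "[2010-10-20 16:47:51.000000 -04:00] INFO (6): songza.amie.history - something else\n",
]

print(count_enqueued(sample))
```

For the real file, an open file object iterates line by line, so it can be passed in directly: count_enqueued(open('error.log')).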


