Regex Speed

Alejandro Dubrovsky dubrovsky at physics.uq.edu.au
Tue Feb 20 19:36:08 EST 2007


garrickp at gmail.com wrote:

> While creating a log parser for fairly large logs, we have run into an
> issue where the time to process was relatively unacceptable (upwards
> of 5 minutes for 1-2 million lines of logs). In contrast, using the
> Linux tool grep would complete the same search in a matter of seconds.
> 
> The search we used was a regex of 6 elements "or"ed together, with an
> exclusionary set of ~3 elements. Due to the size of the files, we
> decided to run these line by line, and due to the need of regex
> expressions, we could not use more traditional string find methods.

Just guessing (I haven't tested this), but switching from matching line by
line to matching big chunks at a time (whatever will fit in memory) should
help, though I don't think you can get close to the speed of grep. E.g.:

while True:
        chunk = thefile.read(100000000)
        if not chunk: break
        for x in theRE.findall(chunk):
                .....

Function calls in Python are expensive, and going line by line means at
least one regex call per line, i.e. millions of calls for logs that size.
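
One catch with reading fixed-size chunks is that a line can end up split
across two reads, so a match near the boundary could be missed or cut in
half. Below is a rough sketch of one way around that, not something I've
benchmarked. The names grep_chunks, chunk_size and line_re are made up for
illustration, and it assumes the file is opened in text mode and that the
pattern itself never matches a newline.

```python
import io
import re


def grep_chunks(fileobj, pattern, chunk_size=1 << 20):
    """Yield each line of fileobj containing a match for pattern.

    Hypothetical helper: reads chunk_size characters at a time and runs
    one regex scan per chunk instead of one per line.
    """
    # Wrap the pattern so each match covers the whole line it occurs on.
    # Assumption: pattern never matches across a newline.
    line_re = re.compile(r"^.*(?:%s).*$" % pattern, re.MULTILINE)
    tail = ""  # partial last line carried over from the previous chunk
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        chunk = tail + chunk
        # Hold back everything after the last newline until the next read.
        cut = chunk.rfind("\n") + 1
        tail = chunk[cut:]
        for m in line_re.finditer(chunk[:cut]):
            yield m.group(0)
    # The file may not end with a newline, so scan whatever is left over.
    if tail:
        for m in line_re.finditer(tail):
            yield m.group(0)


if __name__ == "__main__":
    sample = io.StringIO("ok line\nERROR disk full\nwarn: low mem\nFATAL crash")
    for line in grep_chunks(sample, r"ERROR|FATAL", chunk_size=7):
        print(line)
```

The tiny chunk_size in the demo is only there to force lines to straddle
chunk boundaries; for real logs you'd want something much larger, as in
the loop above.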





