Regex Speed
Gabriel Genellina
gagsl-py at yahoo.com.ar
Tue Feb 20 22:28:08 EST 2007
On Tue, 20 Feb 2007 21:40:40 -0300, <garrickp at gmail.com> wrote:
> My apologies. I don't have specifics right now, but it's something
> along the line of this:
>
> error_list = re.compile(r"error|miss|issing|inval|nvalid|math")
>
> Yes, I know, these are not re expressions, but the requirements for
> the script specified that the error list be capable of accepting
> regular expressions, since these lists are configurable.
Can you relax that restriction? A regex is not always the best approach,
especially if you also want speed:
py> import timeit
py> line = "a sample line that will not match any condition, but long enough to be meaninful in the context of this problem, or at least I thik so. This has 174 characters, is it enough?"
py> timeit.Timer('if error_list.search(line): pass',
... 'import re;error_list=re.compile(r"error|miss|issing|inval|nvalid|math");from __main__ import line').repeat(number=10000)
[1.7704239587925394, 1.7289717746328725, 1.7057590543605246]
py> timeit.Timer('for token in tokens:\n\tif token in line: break\nelse: pass',
... 'from __main__ import line;tokens = "error|miss|issing|inval|nvalid|math".split("|")').repeat(number=10000)
[1.0268617863829661, 1.050040144755787, 1.0677314944409151]
py> timeit.Timer('if "error" in line or "miss" in line or "issing" in line or "inval" in line or "nvalid" in line or "math" in line: pass',
... 'from __main__ import line').repeat(number=10000)
[0.97102286155842066, 0.98341158348013913, 0.9651561957857222]
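To rerun the three measurements above as a standalone script, here is a
minimal sketch. The names with_regex, with_loop, with_hardcoded and
run_benchmarks are mine, not part of the session above. Wrapping each
variant in a function adds call overhead, so the absolute numbers will
differ from the ones shown; only the relative ordering is of interest.

```python
import re
import timeit

# Same sample line as in the session above, split only for readability.
LINE = ("a sample line that will not match any condition, but long enough to "
        "be meaninful in the context of this problem, or at least I thik so. "
        "This has 174 characters, is it enough?")
TOKENS = "error|miss|issing|inval|nvalid|math".split("|")
ERROR_RE = re.compile("|".join(TOKENS))


def with_regex(line=LINE):
    # Variant 1: one compiled alternation.
    return ERROR_RE.search(line) is not None


def with_loop(line=LINE):
    # Variant 2: substring test for each token in a list.
    for token in TOKENS:
        if token in line:
            return True
    return False


def with_hardcoded(line=LINE):
    # Variant 3: the tokens written out by hand.
    return ("error" in line or "miss" in line or "issing" in line
            or "inval" in line or "nvalid" in line or "math" in line)


def run_benchmarks(number=10000, repeat=3):
    # Return the best (lowest) time in seconds for each variant.
    results = {}
    for func in (with_regex, with_loop, with_hardcoded):
        timings = timeit.repeat(func, number=number, repeat=repeat)
        results[func.__name__] = min(timings)
    return results


if __name__ == "__main__":
    for name, seconds in run_benchmarks().items():
        print("%-15s %.4f" % (name, seconds))
```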
The fastest way is hard-coding the tokens: if "error" in line or "miss" in
line or...
If that is not acceptable, iterate over a list of tokens: for token in
tokens: if token in line...
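If the list really must stay configurable and accept regular expressions,
one possible compromise is to treat plain entries as substrings and compile
only the entries that contain regex metacharacters. The sketch below is an
assumption about how that could be done, not something I have timed;
build_matcher is a hypothetical helper name. It classifies an entry as
plain when re.escape() leaves it unchanged, which is conservative: an entry
containing a space, for example, falls through to the regex path, which is
still correct, just slower.

```python
import re


def build_matcher(entries):
    """Return a function telling whether a line matches any entry."""
    literals = []
    patterns = []
    for entry in entries:
        # Assumption: if re.escape() changes nothing, the entry has no
        # regex metacharacters, so "entry in line" gives the same answer
        # as a regex search for it.
        if re.escape(entry) == entry:
            literals.append(entry)
        else:
            patterns.append(entry)

    # Combine the remaining entries into a single alternation, if any.
    compiled = None
    if patterns:
        compiled = re.compile("|".join("(?:%s)" % p for p in patterns))

    def matches(line):
        # Cheap substring tests first, regex only as a fallback.
        if any(token in line for token in literals):
            return True
        return compiled is not None and compiled.search(line) is not None

    return matches


matcher = build_matcher(
    ["error", "miss", "issing", "inval", "nvalid", "math", r"code \d+"])
```

With the example list, matcher("an error occurred") and
matcher("exit code 42") are true, while matcher("exit code none") is false.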
The regex is the slowest. A more carefully crafted regex is a bit faster,
but not by enough:
py> timeit.Timer('if error_list.search(line): pass',
... 'import re;error_list=re.compile(r"error|m(?:iss(?:ing)|ath)|inval(?:id)");from __main__ import line').repeat(number=10000)
[1.3974029108719606, 1.4247005067123837, 1.4071600141470526]
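One caveat when hand-crafting a pattern like this: it is easy to change
what it matches. In the crafted pattern above the groups (?:ing) and (?:id)
are not optional, and "issing" and "nvalid" no longer match on their own,
so it is stricter than the original alternation. A small check like the
following shows the difference on a few sample lines. The sample lines, the
helper name disagreements and the pattern called equivalent are my own
illustration; I have not timed the equivalent pattern.

```python
import re

original = re.compile(r"error|miss|issing|inval|nvalid|math")
crafted = re.compile(r"error|m(?:iss(?:ing)|ath)|inval(?:id)")
# Hypothetical factored pattern that keeps all six original tokens.
equivalent = re.compile(r"error|m(?:iss|ath)|i(?:ssing|nval)|nvalid")

samples = [
    "an error occurred",
    "missing file",
    "bad math",
    "miss",
    "Missing file",
    "Invalid input",
    "all good",
]


def disagreements(a, b, lines):
    # Lines on which exactly one of the two patterns finds a match.
    return [line for line in lines
            if bool(a.search(line)) != bool(b.search(line))]


print(disagreements(original, crafted, samples))
# ['miss', 'Missing file', 'Invalid input']
print(disagreements(original, equivalent, samples))
# []
```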
--
Gabriel Genellina