search versus match in re module

John Machin sjmachin at lexicon.net
Sun Nov 26 03:15:34 EST 2006


wo_shi_big_stomach wrote:

> Thanks for the great tip about fileinput.input(), and thanks to all who
> answered my query. I've pasted the working code below.
>
[snip]
> 	# check first line only
> 	elif fileinput.isfirstline():
> 		if not re.search('^From ',line):

This "works", and in this case you are doing it on only the first line
in each file, but for future reference:

1. Read the re docs section about when to use search and when to use
match; the "^" anchor in your pattern means that search and match give
the same result here.
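
A quick way to check that claim, as a minimal sketch: the sample lines and the helper name `same_result` are made up for illustration, and the last two lines show a case (re.MULTILINE) where the two functions would stop agreeing.

```python
import re

PATTERN = '^From '

def same_result(line):
    """True if re.search and re.match agree on whether `line` matches."""
    return bool(re.search(PATTERN, line)) == bool(re.match(PATTERN, line))

# Hypothetical sample lines, not taken from the original files.
samples = [
    'From alice@example.com Sat Nov 25 2006',
    'xFrom bob',
    'Subject: From here',
    '',
]
results = [same_result(s) for s in samples]

# Without the anchor the two differ: search scans, match only tries position 0.
unanchored_search = re.search('From ', 'xFrom bob')
unanchored_match = re.match('From ', 'xFrom bob')

# Caveat: with re.MULTILINE, "^" also matches after a newline, so search
# can succeed where match fails.
multiline_search = re.search(PATTERN, 'x\nFrom y', re.MULTILINE)
multiline_match = re.match(PATTERN, 'x\nFrom y', re.MULTILINE)
```

For single lines read from a file and no flags, `results` should be all True.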

However the time they take to do it can differ quite a bit :-0

C:\junk>\python25\python -mtimeit -s"import re;text='x'*100" "re.match('^From ',text)"
100000 loops, best of 3: 4.39 usec per loop

C:\junk>\python25\python -mtimeit -s"import re;text='x'*1000" "re.match('^From ',text)"
100000 loops, best of 3: 4.41 usec per loop

C:\junk>\python25\python -mtimeit -s"import re;text='x'*10000" "re.match('^From ',text)"
100000 loops, best of 3: 4.4 usec per loop

C:\junk>\python25\python -mtimeit -s"import re;text='x'*100" "re.search('^From ',text)"
100000 loops, best of 3: 6.54 usec per loop

C:\junk>\python25\python -mtimeit -s"import re;text='x'*1000" "re.search('^From ',text)"
10000 loops, best of 3: 26 usec per loop

C:\junk>\python25\python -mtimeit -s"import re;text='x'*10000" "re.search('^From ',text)"
1000 loops, best of 3: 219 usec per loop
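
The same comparison can be run from a script instead of the command line. This is only a rough sketch using the standard timeit module (the `globals` argument needs Python 3.5 or later); the helper name `best_usec` and the loop counts are arbitrary choices, and the figures will vary with the Python version and the machine.

```python
import re
import timeit

def best_usec(stmt, text, number=1000, repeat=3):
    """Best-of-`repeat` time per call of `stmt`, in microseconds."""
    timings = timeit.repeat(stmt, globals={'re': re, 'text': text},
                            repeat=repeat, number=number)
    return min(timings) / number * 1e6

for size in (100, 1000, 10000):
    text = 'x' * size
    t_match = best_usec("re.match('^From ', text)", text)
    t_search = best_usec("re.search('^From ', text)", text)
    print('%6d chars: match %.2f usec, search %.2f usec'
          % (size, t_match, t_search))
```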

Aside: I noticed this years ago but assumed that the simple
optimisation of search was not done as a penalty on people who didn't
RTFM, and so didn't report it :-)

2. Then realise that your test is equivalent to

if not line.startswith('From '):

which is much easier to understand without the benefit of comments, and
(bonus!) is also much faster than re.match:

C:\junk>\python25\python -mtimeit -s"text='x'*100" "text.startswith('^From ')"
1000000 loops, best of 3: 0.584 usec per loop

C:\junk>\python25\python -mtimeit -s"text='x'*1000" "text.startswith('^From ')"
1000000 loops, best of 3: 0.583 usec per loop

C:\junk>\python25\python -mtimeit -s"text='x'*10000" "text.startswith('^From ')"
1000000 loops, best of 3: 0.612 usec per loop
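
To see that the string method and the regex give the same answer, here is a small sketch; the function names and sample lines are hypothetical. Note that startswith takes a literal prefix, so the "^" anchor belongs only in the regex version.

```python
import re

def starts_with_from_re(line):
    """Regex version: anchored match at the start of the line."""
    return re.match('^From ', line) is not None

def starts_with_from_str(line):
    """String-method version: literal prefix, no regex metacharacters."""
    return line.startswith('From ')

# Hypothetical sample lines for illustration.
samples = [
    'From alice@example.com Sat Nov 25 2006',
    'From',              # no trailing space, so neither version matches
    ' From indented',
    'Subject: From here',
    '',
]
agree = [starts_with_from_re(s) == starts_with_from_str(s) for s in samples]

# A literal "^" in the prefix would be looked for as an actual caret character.
caret_prefix = 'From alice'.startswith('^From ')
```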

HTH,
John