Python and regexp efficiency.. again.. :)

Markus Stenberg mstenber at cc.Helsinki.FI
Mon Dec 13 01:57:52 EST 1999


Yishai Beeri <yishai at platonix.com> writes:
> What percentage of the lines is expected to actually match?

Very few, preferably none. However, the real match definition has the
form (expr|expr|expr|not expr), so the last alternative usually
matches.
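
To make the shape of that definition concrete, here is a minimal sketch
of one way to write "(expr|expr|expr|not expr)" as a single pattern,
using a negative lookahead for the "not expr" branch. The prefix and the
sub-expressions below are hypothetical placeholders (the real ones are
not given in this thread), and anchoring at the start of the line with
match() is also an assumption.

```python
import re

# Hypothetical placeholders; the actual expressions are not in the thread.
COMMON = "kernel: "
POSITIVE = [r"panic", r"oops", r"out of memory"]
NEGATED = r"routine message"

# (expr|expr|expr|not expr): the last branch is a zero-width negative
# lookahead, which succeeds only when NEGATED does NOT match here.
pattern = re.compile(
    re.escape(COMMON)
    + "(?:" + "|".join(POSITIVE) + "|(?!" + NEGATED + "))"
)

def line_matches(line):
    """Return True if the line satisfies the combined definition."""
    return pattern.match(line) is not None
```

With these placeholders, a line matches either because one positive
branch matches after the prefix, or because the negated expression does
not occur at that position.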

> What percentage of the lines match the commonstring but none of the tails?

Nearly all lines match the initial commonstring. The sub-commonstrings
that follow (which my specialized automated regexp optimizer detects) are
rarer: there are roughly 100 different cases, and one of them matches
almost every time. The final non-common parts usually do not match,
except in the terminal case.
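
As an illustration of factoring out common strings, the sketch below
builds a character trie from a set of alternatives and emits a pattern
in which shared prefixes appear only once. It is not the optimizer
mentioned above: it handles literal strings only, and the function names
build_trie and trie_to_regex are made up for this example.

```python
import re

def build_trie(words):
    """Build a nested-dict trie; the key "" marks the end of a word."""
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return trie

def trie_to_regex(node):
    """Emit a pattern for the trie with shared prefixes factored out."""
    ends_here = "" in node
    branches = [
        re.escape(ch) + trie_to_regex(node[ch])
        for ch in sorted(k for k in node if k != "")
    ]
    if not branches:
        return ""
    if len(branches) == 1 and not ends_here:
        return branches[0]
    body = "(?:" + "|".join(branches) + ")"
    # If a word may also end at this node, the remainder is optional.
    return body + "?" if ends_here else body

factored = trie_to_regex(build_trie(["foobar", "foobaz", "fooqux"]))
# factored == "foo(?:ba(?:r|z)|qux)"
```

The point of the factoring is that the shared prefix is tested once per
line instead of once per alternative.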

> Would it be helpful to look just for the tails and get rid of erroneous
> matches by then looking for the commonstring?

Possibly, yes. Hmm.. I have to think about it. The main problem is the
last "not expr" part, since not-matching-something is much harder than
matching-something.
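
A rough sketch of the tails-first idea, with the "not expr" part handled
as a separate test in Python rather than inside the pattern. All strings
here are hypothetical, and it assumes the tail (or the negated
expression) must start directly after the commonstring.

```python
import re

# Hypothetical placeholders; the actual expressions are not in the thread.
COMMON = "daemon: "
TAILS = re.compile(r"error|failed")
NEGATED = re.compile(r"status ok")

def line_matches(line):
    """Look for tails first, then confirm the commonstring precedes them."""
    has_common = line.startswith(COMMON)
    for m in TAILS.finditer(line):
        # Discard erroneous hits that do not follow the commonstring.
        if has_common and m.start() == len(COMMON):
            return True
    # "not expr": the commonstring is present, but NEGATED does not
    # match at the position right after it.
    return has_common and NEGATED.match(line, len(COMMON)) is None
```

Keeping the negation outside the pattern makes it an explicit "is None"
check, at the cost of a second pass over the line.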

> Yishai

-Markus

-- 
The IBM Principle:		  
	Machines should work.  People should think.
The Truth About the IBM Principle:
	Machines don't often work, people don't often think.


