String search vs regexp search

Duncan Booth duncan at NOSPAMrcp.co.uk
Mon Oct 13 04:39:20 EDT 2003


pythonguy at Hotpop.com (Anand Pillai) wrote in 
news:84fc4588.0310120655.ea95af6 at posting.google.com:

> To search a word in a group of words, say a paragraph or a web page,
> would a string search or a regexp search be faster?
> 
> The string search would of course be,
> 
> if str.find(substr) != -1:
>     domything()
> 
> And the regexp search assuming no case restriction would be,
> 
> strre=re.compile(substr, re.IGNORECASE)
> 
> m=strre.search(str)
> if m:
>    domything() 
> 
> I was about to do a test, then I thought someone here might have
> some data on this already.
> 
Yes. The answer is 'it all depends'.

Things it depends on include:

Your two bits of code do different things: one is case sensitive, the 
other ignores case. Which do you need?
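
To make the difference concrete, here is a minimal sketch of the same 
test done three ways. The sample text and word are made up for 
illustration, and lower-casing both sides is only an approximation of 
re.IGNORECASE (the two can disagree for some non-ASCII characters).

```python
import re

# Hypothetical sample data, not from the original question.
text = "The Quick Brown Fox"
word = "quick"

# Case-sensitive substring test: "quick" does not match "Quick".
sensitive = text.find(word) != -1

# Case-insensitive test with plain string methods:
# lower-case both sides before searching.
insensitive_str = text.lower().find(word.lower()) != -1

# Case-insensitive test with a regular expression.
# re.escape makes sure the word is matched literally.
insensitive_re = re.search(re.escape(word), text, re.IGNORECASE) is not None

print(sensitive, insensitive_str, insensitive_re)
```

Any timing comparison is only meaningful between the two 
case-insensitive versions, or between two case-sensitive ones.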

How long is the string you are searching? How long is the substring?

Is the substring the same every time, or are you always searching for 
different strings? Can the substring contain characters with special 
meanings for regular expressions?
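
The special-character question matters because a substring passed 
straight to re.compile is treated as a pattern, not as literal text. A 
small sketch, with an invented example string, of how that goes wrong 
and how re.escape avoids it:

```python
import re

# Hypothetical sample data chosen to contain regex metacharacters.
text = "Total cost: $5.00 (approx.)"
needle = "$5.00"

# Plain string search treats the needle literally.
found_plain = text.find(needle) != -1

# Used directly as a pattern, "$" anchors at the end of the string and
# "." matches any character, so this does not find the literal text.
unescaped = re.search(needle, text)

# re.escape quotes the metacharacters so the pattern matches literally.
escaped = re.search(re.escape(needle), text)

print(found_plain, unescaped, escaped.group())
```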

The regular expression code has a startup penalty, since it has to compile 
the regular expression at least once. However, the actual searching may be 
faster than the naive str.find. If the time spent searching is 
sufficiently long compared with the time spent compiling, the regular 
expression may win out.
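
One way to see how the compile cost and the search cost trade off is to 
time them separately with the timeit module. This is only a sketch: the 
test string, the repeat count and the helper function names are 
arbitrary choices, and the numbers will vary with the Python version, 
the machine, and the data, so it makes no claim about which approach 
wins.

```python
import re
import timeit

# Hypothetical test data: a long string with the word at the very end.
text = "spam and eggs " * 2000 + "needle"
word = "needle"

# Compile once, outside the timed code, for the "same substring every
# time" case.
pattern = re.compile(re.escape(word))

def find_plain():
    return text.find(word) != -1

def search_precompiled():
    return pattern.search(text) is not None

def compile_and_search():
    # The re module caches compiled patterns, so clear the cache to
    # force a real compile on every call.
    re.purge()
    return re.compile(re.escape(word)).search(text) is not None

runs = 100
t_find = timeit.timeit(find_plain, number=runs)
t_search = timeit.timeit(search_precompiled, number=runs)
t_compile = timeit.timeit(compile_and_search, number=runs)

print("str.find:            %.6f s" % t_find)
print("precompiled search:  %.6f s" % t_search)
print("compile + search:    %.6f s" % t_compile)
```

Changing the length of the text, or moving the word nearer the start, 
shows how much the answer depends on the data.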

Bottom line: write the code so it is as clean and maintainable as possible. 
Only worry about optimising this if you have timed it and know that your 
searches are a bottleneck.

-- 
Duncan Booth                                             duncan at rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?



