String search vs regexp search
Duncan Booth
duncan at NOSPAMrcp.co.uk
Mon Oct 13 04:39:20 EDT 2003
pythonguy at Hotpop.com (Anand Pillai) wrote in
news:84fc4588.0310120655.ea95af6 at posting.google.com:
> To search a word in a group of words, say a paragraph or a web page,
> would a string search or a regexp search be faster?
>
> The string search would of course be,
>
> if str.find(substr) != -1:
>     domything()
>
> And the regexp search assuming no case restriction would be,
>
> strre=re.compile(substr, re.IGNORECASE)
>
> m=strre.search(str)
> if m:
>     domything()
>
> I was about to do a test, then I thought someone here might have
> some data on this already.
>
Yes. The answer is 'it all depends'.
Things it depends on include:
Your two bits of code do different things: one is case-sensitive, the other
ignores case. Which did you need?
How long is the string you are searching? How long is the substring?
Is the substring the same every time, or are you always searching for
different strings? Can the substring contain characters with special
meanings for regular expressions?
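To illustrate the case-sensitivity and special-character points, here is a minimal sketch in modern Python 3 (not the Python of this thread). The function names are hypothetical. It contrasts a case-sensitive plain test, a case-insensitive plain test, and a case-insensitive regexp test that passes the substring through re.escape() so it is matched literally.

```python
import re

def contains_exact(text: str, word: str) -> bool:
    # Case-sensitive plain substring test; same result as text.find(word) != -1.
    return word in text

def contains_ignore_case(text: str, word: str) -> bool:
    # Case-insensitive plain substring test: fold both sides before comparing.
    # casefold() is more thorough than lower() for some non-ASCII text.
    return word.casefold() in text.casefold()

def contains_regex_ignore_case(text: str, word: str) -> bool:
    # re.escape() stops characters such as '.', '(' or '+' in `word`
    # from being interpreted as regular-expression syntax.
    pattern = re.compile(re.escape(word), re.IGNORECASE)
    return pattern.search(text) is not None

text = "Python 3.12 (final) released"
print(contains_exact(text, "python"))                    # case differs
print(contains_ignore_case(text, "python"))
print(contains_regex_ignore_case(text, "3.12 (FINAL)"))
# Without re.escape() the parentheses form a group and '.' matches any
# character, so this pattern does not match the literal text.
print(re.search("3.12 (final)", text, re.IGNORECASE))
```

Passing an unescaped user-supplied substring straight to re.compile(), as in the quoted code, silently changes what is being searched for whenever it contains a metacharacter.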
The regular expression code has a startup penalty, since it has to compile
the regular expression at least once; however, the actual searching may be
faster than the naive str.find. If the time spent searching is long enough
compared with the time spent compiling, the regular expression may win
out.
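When the same substring is searched for many times, the compile cost only needs to be paid once. A minimal sketch, assuming the input is a list of strings; count_matching is a hypothetical helper. (The re module also keeps a small cache of recently compiled patterns, so module-level calls like re.search() do not recompile a recently used pattern every time, though they still pay for the cache lookup.)

```python
import re

def count_matching(documents: list[str], word: str) -> int:
    # Compile once, outside the loop, so the startup cost is paid a single time
    # and then amortised over every search.
    pattern = re.compile(re.escape(word), re.IGNORECASE)
    return sum(1 for doc in documents if pattern.search(doc))

docs = ["Spam and eggs", "SPAM!", "ham"]
print(count_matching(docs, "spam"))
```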
Bottom line: write the code so it is as clean and maintainable as possible.
Only worry about optimising this if you have timed it and know that your
searches are a bottleneck.
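Timing it is straightforward with the standard timeit module. The sketch below is illustrative only: the sample text, repetition count and labels are assumptions, and which approach wins will depend on the factors listed above, so no result is claimed here.

```python
import re
import timeit

def time_searches(text: str, word: str, number: int = 10_000) -> dict[str, float]:
    """Return the total seconds taken for `number` runs of each strategy."""
    folded_word = word.lower()
    # Compile outside the timed code so only the search itself is measured.
    pattern = re.compile(re.escape(word), re.IGNORECASE)
    return {
        "str.find (case-sensitive)":
            timeit.timeit(lambda: text.find(word) != -1, number=number),
        "lower() + in (ignore case)":
            timeit.timeit(lambda: folded_word in text.lower(), number=number),
        "precompiled regexp (ignore case)":
            timeit.timeit(lambda: pattern.search(text) is not None, number=number),
    }

sample = "lorem ipsum dolor sit amet " * 1000 + "needle"
for label, seconds in time_searches(sample, "NEEDLE", number=1000).items():
    print(f"{label:35s} {seconds:.4f}s")
```

Note that the lower()-based variant re-lowercases the whole text on every call; if the same text is searched repeatedly, lowercasing it once up front changes the comparison.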
--
Duncan Booth duncan at rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?
More information about the Python-list mailing list