String search vs regexp search

Tue Oct 14 04:46:06 EDT 2003

Duncan Booth <duncan at NOSPAMrcp.co.uk> wrote in
news:Xns9414603182031duncanrcpcouk at 127.0.0.1: 

>> Both regular expression searching and string.find will do searching
>> one character at a time; given that, it seems impossible to me that
>> the hand-coded-in-C "naive" string.find could be slower than the
>> machine-translated-coded-in-Python regular expression search. 
>> Compilation time only serves to further increase string.find's
>> advantage.
>> 
> I may have misremembered, but I thought there was a thread discussing
> this a little while back which claimed that the regular expression
> library looked for constant strings at the start of the regex, and if
> it found one used Boyer-Moore to do the search. If it does, then
> regular expressions searching for a constant string certainly ought to
> be much faster than a plain string.find (as the length of the searched
> string tends towards infinity).
> 
> If it doesn't, then it should.

Ok, found the code. Regular expression searches do indeed use a form of 
Boyer-Moore, but not if you are ignoring case. So by specifying 
re.IGNORECASE the OP got a double hit, not only does the code have to do 
case insensitive comparisons, but it also has to crawl along looking at 
every character in the search string instead of skipping most of them.

-- 
Duncan Booth                                             duncan at rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?