String search vs regexp search

Duncan Booth duncan at NOSPAMrcp.co.uk
Wed Oct 15 04:14:29 EDT 2003


Alex Martelli <aleax at aleax.it> wrote in
news:pCVib.282714$R32.9272822 at news2.tin.it: 

> Jeremy Fincher wrote:
> 
>> Duncan Booth <duncan at NOSPAMrcp.co.uk> wrote in message
>> news:<Xns941462CDB5003duncanrcpcouk at 127.0.0.1>...
>>> Ok, found the code. Regular expression searches do indeed use a form
>>> of Boyer-Moore, but not if you are ignoring case. So by specifying
>>> re.IGNORECASE the OP got a double hit, not only does the code have
>>> to do case insensitive comparisons, but it also has to crawl along
>>> looking at every character in the search string instead of skipping
>>> most of them. 
>> 
>> That's cool!  Where'd you find the code?
> 
> Hmmm, dist/src/Modules/_sre.c in the Python CVS tree?  Or
> Modules/_sre.c in a standard source distribution?
> 
Close, but it is actually a little bit complicated. _sre.c has the code 
that does the search, including the skipping forward using an overlap 
table, but the bit which checks for a literal prefix and builds the overlap 
table is in lib/src_compile.py. So its in the Python code you have to look 
to find that it ignores literal prefixes when ignoring case. 

-- 
Duncan Booth                                             duncan at rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?




More information about the Python-list mailing list