Testing if string contains a substring

Wed Apr 23 17:49:51 EDT 2003

Bengt Richter wrote:

> On Wed, 23 Apr 2003 14:50:21 GMT, Alex Martelli <aleax at aleax.it> wrote:
> [...]
>>However, such new functionality doesn't get back-ported to previous
>>releases of Python, such as 2.2.*.  In all Python releases from 1.6
>>included to 2.3 excluded, "needle.find(haystack) > 0" is the idiom.
> 
> If needle is a substring to be found in haystack, that seems reversed.

Ooops, yes -- "haystack.find(needle) >= 0" is what I meant.  Pretty
bad thinko there -- sorry!
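Here is a minimal sketch of why the comparison has to be `>= 0` rather than `> 0`. `str.find` returns the index of the first match, which is 0 when the needle starts the string, and it returns -1 when the needle is absent. The `contains` helper is a hypothetical name used only for illustration, and the sample string is the one from the timings below.

```python
def contains(haystack, needle):
    # str.find returns the lowest index where needle occurs, or -1 if absent
    return haystack.find(needle) >= 0

haystack = "needle in haystack with another needle somewhere"
print(haystack.find("needle"))         # 0: match at the very start
print(haystack.find("thread"))         # -1: not found
print(contains(haystack, "needle"))    # True
print(haystack.find("needle") > 0)     # False: "> 0" misses a match at index 0
```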

> Were you recently using re to find needles? E.g.,
> 
>  >>> import re
>  >>> needle = re.compile('needle')
>  >>> needle.findall(haystack)
>  ['needle', 'needle']

Yep.  If you're in no special hurry, that works, too.
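The quoted session never defines `haystack`. The following self-contained version assumes the same sample string used in the timings below. It also uses `finditer`, which returns match objects rather than bare strings, so the positions of the matches are available too.

```python
import re

# Same sample string as the timing runs
haystack = "needle in haystack with another needle somewhere"
needle = re.compile('needle')

# findall returns every non-overlapping match, left to right
print(needle.findall(haystack))  # ['needle', 'needle']

# finditer yields match objects, so the start positions can be collected
positions = [m.start() for m in needle.finditer(haystack)]
print(positions)  # [0, 32]
```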

Finding just one (leftmost) occurrence, string vs re:

[alex@lancelot src]$ ./python -O Lib/timeit.py -s'import re' \
    -s'haystack="needle in haystack with another needle somewhere"; needle="needle"' \
    -s'reneedle=re.compile(needle)' 'if haystack.find(needle) >= 0: pass'
1000000 loops, best of 3: 1.04 usec per loop

[alex@lancelot src]$ ./python -O Lib/timeit.py -s'import re' \
    -s'haystack="needle in haystack with another needle somewhere"; needle="needle"' \
    -s'reneedle=re.compile(needle)' 'if reneedle.search(haystack): pass'
1000000 loops, best of 3: 1.41 usec per loop
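The two leftmost-match tests locate the same position. The following is a small sketch that checks this. It adds one assumption of its own: it wraps the needle in `re.escape`. The timings compile the needle directly, which works only because "needle" contains no regex metacharacters. A needle such as "a.c" would otherwise match "abc".

```python
import re

haystack = "needle in haystack with another needle somewhere"
needle = "needle"
# re.escape guards against needles containing metacharacters such as '.'
reneedle = re.compile(re.escape(needle))

# str.find: index of the leftmost match, or -1 if absent
pos_str = haystack.find(needle)

# search: a match object for the leftmost match, or None if absent
match = reneedle.search(haystack)
pos_re = match.start() if match else -1
print(pos_str == pos_re)  # True

# Without escaping, '.' matches any character, so the two tests disagree
print("a.c" in "abc")                                    # False
print(bool(re.compile("a.c").search("abc")))             # True
print(bool(re.compile(re.escape("a.c")).search("abc")))  # False
```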

Finding / counting all occurrences, string vs re:

[alex@lancelot src]$ ./python -O Lib/timeit.py -s'import re' \
    -s'haystack="needle in haystack with another needle somewhere"; needle="needle"' \
    -s'reneedle=re.compile(needle)' 'if haystack.count(needle): pass'
1000000 loops, best of 3: 1.77 usec per loop

[alex@lancelot src]$ ./python -O Lib/timeit.py -s'import re' \
    -s'haystack="needle in haystack with another needle somewhere"; needle="needle"' \
    -s'reneedle=re.compile(needle)' 'if reneedle.findall(haystack): pass'
100000 loops, best of 3: 4.23 usec per loop
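`str.count` and `findall` agree on the number of matches, because both count non-overlapping occurrences. Both must also scan the whole string, which is why they cost more than `find`/`search` for a simple yes/no test. The following is a sketch using the same sample data; the "aaaa" case is a made-up example of the non-overlapping rule.

```python
import re

haystack = "needle in haystack with another needle somewhere"
needle = "needle"
reneedle = re.compile(needle)

# Both count non-overlapping occurrences, scanning the entire string
n_str = haystack.count(needle)
n_re = len(reneedle.findall(haystack))
print(n_str)  # 2
print(n_re)   # 2

# "aa" occurs at indices 0, 1 and 2 of "aaaa", but only 2 occurrences don't overlap
print("aaaa".count("aa"))             # 2
print(len(re.findall("aa", "aaaa")))  # 2
```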

But of course, in Python 2.3 the 'in' operator is both the fastest and the
handiest (and it, too, requires the needle on the left):

[alex@lancelot src]$ ./python -O Lib/timeit.py -s'import re' \
    -s'haystack="needle in haystack with another needle somewhere"; needle="needle"' \
    -s'reneedle=re.compile(needle)' 'if needle in haystack: pass'
1000000 loops, best of 3: 0.272 usec per loop
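The measurements can also be repeated from a script instead of the shell. The standard `timeit` module's `Timer` class accepts the same statement and setup strings. This is a sketch only. `best_usec`, `SETUP`, `STATEMENTS` and the loop count are illustrative choices, and the timings it prints depend on the machine and Python version, so they will not match the 2003 figures.

```python
import timeit

# Same setup as the command-line runs, one statement per line
SETUP = ('import re\n'
         'haystack = "needle in haystack with another needle somewhere"\n'
         'needle = "needle"\n'
         'reneedle = re.compile(needle)')

STATEMENTS = [
    'if haystack.find(needle) >= 0: pass',
    'if reneedle.search(haystack): pass',
    'if haystack.count(needle): pass',
    'if reneedle.findall(haystack): pass',
    'if needle in haystack: pass',
]

def best_usec(stmt, number=100000):
    # Best of 3 runs, converted to microseconds per loop like timeit.py reports
    timer = timeit.Timer(stmt, SETUP)
    return min(timer.repeat(3, number)) / number * 1e6

if __name__ == '__main__':
    for stmt in STATEMENTS:
        print('%-40s %.3f usec' % (stmt, best_usec(stmt)))
```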

REs have the advantage of easily allowing case-insensitive search -- that's
my most common reason for using them.  Second most common is \b, the "word
boundary" marker.
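Both of these RE features can be shown in a few lines. This sketch uses hypothetical sample strings, chosen only to show each feature: `re.IGNORECASE` for case-insensitive search, and `\b` to match "needle" only as a whole word.

```python
import re

text = "Find the Needle in the haystack"
print("needle" in text)                                # False: str tests are case-sensitive
print(bool(re.search("needle", text, re.IGNORECASE)))  # True

words = "needlework, needles and one needle"
# \b matches only at a word boundary, so "needlework" and "needles" are skipped
print(re.findall(r"\bneedle\b", words))                # ['needle']
print(len(re.findall("needle", words)))                # 3
```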

Alex