Testing if string contains a substring
Alex Martelli
aleax at aleax.it
Wed Apr 23 17:49:51 EDT 2003
Bengt Richter wrote:
> On Wed, 23 Apr 2003 14:50:21 GMT, Alex Martelli <aleax at aleax.it> wrote:
> [...]
>>However, such new functionality doesn't get back-ported to previous
>>releases of Python, such as 2.2.*. In all Python releases from 1.6
>>included to 2.3 excluded, "needle.find(haystack) > 0" is the idiom.
>
> If needle is a substring to be found in haystack, that seems reversed.
Ooops, yes -- "haystack.find(needle) >= 0" is what I meant. Pretty
bad thinko there -- sorry!
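To make the corrected idiom concrete, here is a minimal sketch. The helper name `contains` is only for illustration and is not part of any library. It shows both halves of the slip: the operands were reversed, and `> 0` would miss a match at index 0.

```python
def contains(haystack, needle):
    """Substring test via str.find, which returns -1 when needle is absent."""
    return haystack.find(needle) >= 0

haystack = "needle in haystack with another needle somewhere"
needle = "needle"

print(haystack.find(needle))       # 0: the match sits at the very start
print(haystack.find(needle) > 0)   # False: a '> 0' test misses that match
print(contains(haystack, needle))  # True
print(needle.find(haystack))       # -1: reversed operands find nothing
```

The `>= 0` comparison matters precisely because index 0 is a legitimate match position.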
> Were you recently using re to find needles? E.g.,
>
> >>> import re
> >>> needle = re.compile('needle')
> >>> needle.findall(haystack)
> ['needle', 'needle']
Yep. If you're in no special hurry, that works, too.
Finding just one (leftmost) occurrence, string vs re:
[alex at lancelot src]$ ./python -O Lib/timeit.py -s'import re' \
    -s'haystack="needle in haystack with another needle somewhere"; needle="needle"' \
    -s'reneedle=re.compile(needle)' \
    'if haystack.find(needle) >= 0: pass'
1000000 loops, best of 3: 1.04 usec per loop
[alex at lancelot src]$ ./python -O Lib/timeit.py -s'import re' \
    -s'haystack="needle in haystack with another needle somewhere"; needle="needle"' \
    -s'reneedle=re.compile(needle)' \
    'if reneedle.search(haystack): pass'
1000000 loops, best of 3: 1.41 usec per loop
Finding / counting all occurrences, string vs re:
[alex at lancelot src]$ ./python -O Lib/timeit.py -s'import re' \
    -s'haystack="needle in haystack with another needle somewhere"; needle="needle"' \
    -s'reneedle=re.compile(needle)' \
    'if haystack.count(needle): pass'
1000000 loops, best of 3: 1.77 usec per loop
[alex at lancelot src]$ ./python -O Lib/timeit.py -s'import re' \
    -s'haystack="needle in haystack with another needle somewhere"; needle="needle"' \
    -s'reneedle=re.compile(needle)' \
    'if reneedle.findall(haystack): pass'
100000 loops, best of 3: 4.23 usec per loop
but of course, in Python 2.3, the 'in' operator is fastest as well
as handiest (and it, also, requires the needle on the left):
[alex at lancelot src]$ ./python -O Lib/timeit.py -s'import re' \
    -s'haystack="needle in haystack with another needle somewhere"; needle="needle"' \
    -s'reneedle=re.compile(needle)' \
    'if needle in haystack: pass'
1000000 loops, best of 3: 0.272 usec per loop
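For anyone who would rather rerun the comparison from inside Python than from the shell, here is a rough sketch using the `timeit` module. The `bench` function and its defaults are assumptions made for illustration. Absolute numbers will differ from the ones above depending on interpreter version and hardware, so no expected timings are shown.

```python
import timeit

# Same setup as the command-line runs above.
setup = (
    "import re\n"
    'haystack = "needle in haystack with another needle somewhere"\n'
    'needle = "needle"\n'
    "reneedle = re.compile(needle)\n"
)

statements = [
    "haystack.find(needle) >= 0",
    "reneedle.search(haystack)",
    "haystack.count(needle)",
    "reneedle.findall(haystack)",
    "needle in haystack",
]

def bench(number=100000, repeat=3):
    """Return {statement: best microseconds per loop} over `repeat` runs."""
    results = {}
    for stmt in statements:
        best = min(timeit.repeat(stmt, setup=setup, number=number, repeat=repeat))
        results[stmt] = best / number * 1e6
    return results

for stmt, usec in bench().items():
    print("%-30s %.3f usec per loop" % (stmt, usec))
```

Taking the minimum of several repeats mirrors the "best of 3" figure that timeit.py reports.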
REs have the advantage of easily allowing case-insensitive search -- that's
my most common reason for using them. The second most common is \b, the
"word boundary" marker.
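A small sketch of both uses. The sample text and the variable names are made up for illustration.

```python
import re

text = "Needle in a haystack; needlework is not a needle."

# Case-insensitive search: matches regardless of capitalization.
ci = re.compile("needle", re.IGNORECASE)
print(ci.findall(text))     # ['Needle', 'needle', 'needle']

# \b anchors the match at word boundaries, so "needlework" is skipped.
word = re.compile(r"\bneedle\b", re.IGNORECASE)
print(word.findall(text))   # ['Needle', 'needle']

# For a plain yes/no answer, lowercasing both sides avoids re entirely.
print("needle" in text.lower())   # True
```

The raw-string prefix keeps `\b` from being read as a backspace character in the pattern.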
Alex