[Python-ideas] Deprecate str.find

Steven D'Aprano steve at pearwood.info
Fri Jul 15 21:32:41 CEST 2011


Mike Graham wrote:
> str.find (and bytes.find) is worse than the alternatives in every way.
> It should be explicitly deprecated in favour of str.__contains__ and
> str.index.

I disagree.


> str.find when used to check for substring is inferior to the in
> operator. "if sub in s:" is shorter, easier-to-read, and more
> efficient than "if s.find(sub) != -1:" and is not prone to the error
> "if s.find(sub):" I have occasionally seen.

That some people (allegedly) misuse str.find is not a reason to 
remove it. People misuse all sorts of things.

I don't believe that it is valid to compare str.find to str.__contains__ 
since they do different things for different purposes. Using str.find 
instead of "in" is not misuse if you actually need an index. Better to 
do a single walk of the source string:

p = s.find(sub)
if p >= 0:
     # do something
else: ...

than wastefully do two:

if sub in s:
     p = s.index(sub)
     # do something
else: ...


Whatever efficiency you might gain in the "substring not found" case, 
you lose in the "found" case.

You should only use "sub in s" when you don't care about *where* the 
substring is, only whether or not it is there. Strings are not dicts, 
and searching is not necessarily fast. If I'm searching the string 
twice, I'm doing it wrong.
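
To make the "you actually need an index" case concrete, here is a minimal sketch that splits a string at the first occurrence of a separator using a single search. The helper name split_once is hypothetical and used only for illustration; it is not a str method.

```python
def split_once(s, sep):
    """Split s at the first occurrence of sep, searching s only once.

    Hypothetical helper for illustration. Returns (head, tail), or
    (s, None) if sep does not occur in s.
    """
    p = s.find(sep)              # one walk of the source string
    if p >= 0:
        return s[:p], s[p + len(sep):]
    return s, None


print(split_once("key=value", "="))      # ('key', 'value')
print(split_once("no separator", "="))   # ('no separator', None)
```

Writing the same helper with "sep in s" followed by s.index(sep) would search the string twice whenever the separator is present.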

Since str.__contains__ is not a valid replacement for str.find, the only 
question is, should str.find be deprecated in favour of str.index?

I say no. str.find is just too useful and neat, compared to catching an 
exception, to throw out. And it can be considerably faster.

For long strings, the time taken for an unsuccessful search may be 
dominated by the time to traverse the string, and consequently the two 
alternatives are pretty close to the same speed:


 >>> from timeit import Timer
 >>> s = "abcdef"*1000000
 >>> sub = "xyz"
 >>> setup = "from __main__ import s, sub"
 >>> t1 = Timer("s.find(sub) != -1", setup)
 >>> t2 = Timer("""try:
...     s.index(sub)
... except ValueError:
...     pass
... """, setup)
 >>>
 >>> t1.timeit(number=10000)
109.69042301177979
 >>> t2.timeit(number=10000)
116.63023090362549

Catching the exception is only 6% slower than testing for -1. Not much 
difference, and we probably shouldn't care one way or the other.

However, for short strings, the time taken may be dominated by the cost 
of catching the exception, and so str.find may be significantly faster:

 >>> s = "abc"*10
 >>> sub = "x"
 >>> t1.timeit()
0.5977561473846436
 >>> t2.timeit()
1.698801040649414

s.index here is nearly three times slower than s.find.

(And of course, if the substring is present, index and find should be 
pretty much identical in speed.)
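
For anyone who wants to check the "substring present" case themselves, something like the following script should do. It is only a sketch: the string, the substring and the repeat count are arbitrary choices, and the numbers will vary with machine and Python version, so no timings are shown.

```python
from timeit import Timer

# Short string in which the substring *is* present, so index() never raises
# and the except clause is never entered.
setup = "s = 'abc' * 10; sub = 'c'"

t_find = Timer("s.find(sub) != -1", setup)
t_index = Timer("""try:
    s.index(sub)
except ValueError:
    pass
""", setup)

find_time = t_find.timeit(number=100000)
index_time = t_index.timeit(number=100000)
print("find: ", find_time)
print("index:", index_time)
```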



> str.index is better for finding indices in that it supports an
> idiomatic exception-based API rather than a return-code API. 

Something is not better merely because it is idiomatic. It's the other 
way around: what's better becomes idiomatic, because people re-use code 
examples that work well.

I expect that in practice str.find is used rather more frequently than 
str.index, which suggests that at least when it comes to string 
searching, find is the idiomatic API.


> Every
> usage of str.find should look like "index = s.find(sub); if index ==
> -1: (exception code)", 

"Every" usage? I don't think so. Another common and valid usage is this 
pattern:


index = s.find(sub)
if index >= 0:
     # do something


Written with exception handling it becomes significantly longer, 
trickier and less obvious for beginners:


try:
     index = s.index(sub)
except ValueError:
     pass
else:
     # do something


Note especially that this takes the least interesting case, the "do 
nothing if not found", and promotes it ahead of the interesting case "do 
something if found". Now that's an anti-pattern! (Albeit a mild one.)


And of course the try...except example is subject to its own conceptual 
failures. Both of these are subtly, or not-so-subtly, wrong:

try:
     index = s.index(sub)
     # do something
except ValueError:
     pass



try:
     index = s.index(sub)
except ValueError:
     pass
# do something
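
Here is a rough sketch of how each of those two forms goes wrong. The function names and the deliberately buggy bad_action are made up for the demonstration; "do something" is passed in as a callable.

```python
def first_form(s, sub, action):
    # Wrong: the try block also covers the action, so a ValueError raised
    # by the action is silently mistaken for "substring not found".
    try:
        index = s.index(sub)
        action(index)
    except ValueError:
        pass


def second_form(s, sub, action):
    # Wrong: the action runs whether or not the search succeeded. If the
    # search failed, index was never assigned.
    try:
        index = s.index(sub)
    except ValueError:
        pass
    action(index)


def bad_action(index):
    raise ValueError("unrelated bug in the action")


# The substring is found, the action fails, and the failure vanishes.
first_form("spam", "a", bad_action)

# The substring is not found, but the action is attempted anyway.
try:
    second_form("spam", "x", print)
except UnboundLocalError as err:
    print("second form failed:", err)
```

In real code the second form is often worse than this, because index may still hold a stale value from an earlier search, in which case nothing is raised at all.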


> which is an antipattern in Python.

Why do you think it is an anti-pattern?

I don't consider it an anti-pattern. I often wish that lists also had a 
find method that returned a sentinel instead of raising an exception. 
(Although I'd probably use None, as the re module does, rather than -1.)
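
A minimal sketch of the sort of thing I mean, written as a plain function. The name list_find is hypothetical; lists have no such method.

```python
import re


def list_find(items, value):
    """Hypothetical helper: index of value in items, or None if absent."""
    try:
        return items.index(value)
    except ValueError:
        return None


print(list_find(["a", "b", "c"], "b"))   # 1
print(list_find(["a", "b", "c"], "z"))   # None

# The re module follows the same convention: no match gives None.
print(re.search("x", "bar"))             # None
```

One caveat with a None sentinel: a valid index of 0 is falsy, so the result has to be tested with "is not None" rather than by truthiness.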



>  This problem
> is compounded by the fact that the returned value is actually a valid
> value; consider s = 'bar'--s[s.find('x')] is somewhat surprisingly
> 'r'.

Yes, that's a good argument against the use of -1 for "not found". None 
would have been better.
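
The difference is easy to see at the prompt. A short sketch, using the 'bar' example from the quoted text:

```python
s = 'bar'
p = s.find('x')
print(p)        # -1
print(s[p])     # r   (-1 is a valid index: the last character)

# A None sentinel could not be used as an index by accident:
try:
    s[None]
except TypeError as err:
    print("TypeError:", err)
```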


> Additionally, the existence of str.find violates the
> there's-one-way-to-do-it principle.

You may be confusing Python with some other language, because there is 
no such principle in Python. Perhaps you are thinking of this line from 
the Zen of Python:

There should be one-- and preferably only one --obvious way to do it.

which is a statement requiring the existence of an obvious way, not a 
prohibition against there being multiple non-obvious ways.

In any case, it's far from clear to me that str.index is that obvious 
way. But then again, I'm not Dutch *wink*



-- 
Steven


