[Python-ideas] Deprecate str.find

Ethan Furman ethan at stoneleaf.us
Sun Jul 17 11:30:09 CEST 2011


Raymond Hettinger wrote:
> On Jul 17, 2011, at 12:15 AM, Nick Coghlan wrote:
>> Indeed, the problem as I see it is that our general idiom for
>> functions and methods that raise 'Not Found' exceptions is to accept
>> an optional parameter that specifies a value to return in the Not
>> Found case.
> 
> There's a difference between methods that return looked-up values
> (where a default might make sense) versus a method that returns
> an index (where it usually makes no sense at all).

We are not talking about a default value to return -- the default will 
still be the behavior of raising a ValueError if the substring is not 
found.  Consider the proposed signature:

_sentinel = object()
class str():
     def index(self, substring, start=None, end=None, missing=_sentinel):
         # looks for substring
         ...
         # substring not found -- now what?
         if missing is _sentinel:
             raise ValueError('...')
         else:
             return missing

The addition is that *if* the caller specifies an object for missing, 
return that value, *otherwise* raise ValueError just like we do now.
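
In the meantime the same behavior can be approximated with a small 
wrapper.  A minimal sketch, assuming a hypothetical helper named 
index_or and a private _MISSING sentinel (neither exists in the stdlib); 
it should work for any sequence that has an .index() method, list 
included:

```python
_MISSING = object()  # hypothetical private sentinel, never handed to callers

def index_or(seq, value, start=0, end=None, missing=_MISSING):
    """Return seq.index(value, start, end), or `missing` if one was supplied."""
    if end is None:
        end = len(seq)
    try:
        return seq.index(value, start, end)
    except ValueError:
        if missing is _MISSING:
            raise  # no fallback given: behave exactly like .index()
        return missing

print(index_or('this is a test', 'is'))                  # 2
print(index_or('this is a test', 'xyz', missing=None))   # None
print(index_or([10, 20, 30], 99, missing=-1))            # -1
```

Leaving missing unspecified re-raises the original ValueError, so 
callers that rely on the exception see no change.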


>> For historical reasons, we currently break that idiom for index()
>> methods: instead of supplying an extra parameter to str.index, one
>> instead switches to a completely different method (.find()) with no
>> control over the sentinel value returned (it's always -1). For other
>> sequences (e.g. list), there's no find equivalent, so you *have* to
>> write the exception handling out explicitly.
>>
>> My proposal is to update the signature of index() (for all sequences,
>> including the ABC) to follow the standard 'Not Found' idiom by
>> accepting a 'missing' parameter that is returned for those cases where
>> ValueError would otherwise be raised.
>>
>> Code that uses str.find would continue to work, but the recommended
>> alternative would be obj.index(x, missing=None) (or appropriate
>> default value).

Hmmm -- okay, perhaps we are... let me say, then, that I agree having a 
default return is not the way to go; this would break everything that 
expects .index() to raise if the substring is not found -- in other 
words, everything that uses .index().  My take on the idea is to make 
the new 'missing' argument optional: if it is not specified, current 
behavior is unchanged; if it is specified, that value is returned 
instead.


> If someone takes this out of python-ideas land and into a serious PEP,
> they should be prepared to answer a number of tough questions:
> 
> * Is this actually necessary?  Is there something you currently can't code?
> If not, then it adds API complexity without adding any new capabilities.
> There is a high threshold for expanding the string API -- this would affect
> everyone learning python, every book written, every lint tool, every class
> seeking to be string-like, etc.

Trying to be string-like at the moment is such a PITA I really don't see 
this tiny extra bit as a serious burden.  Consider this nice simple code:

class MyStr(str):
     def find(self, substr, start=None, end=None):
         # whatever extra I want to do before passing off to str
         # now pass off to str
         return str.find(self, substr, start, end)

Too bad it doesn't work:

--> test = MyStr('this is a test')
--> test.find('is')
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "<stdin>", line 5, in find
TypeError: slice indices must be integers or None or have an __index__ method

(Yes, this was fixed in 2.6, and as soon as I'm willing to drop support 
for earlier versions I can remove the following boilerplate:

         start = start or 0
         end = end or len(self)

and yes, I wouldn't be able to use the new .index(..., 
missing=_whatever) for a while, but that doesn't mean we should stop 
improving the language.)
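
One wrinkle with the `or` boilerplate: an explicit end=0 is falsy, so it 
gets treated the same as None.  A sketch of a variant that tests for None 
explicitly, assuming the same MyStr subclass as above; on 2.6 and later 
the normalization should be unnecessary but harmless:

```python
class MyStr(str):
    def find(self, substr, start=None, end=None):
        # normalize explicitly so that end=0 is not mistaken for "no end given"
        if start is None:
            start = 0
        if end is None:
            end = len(self)
        return str.find(self, substr, start, end)

test = MyStr('this is a test')
print(test.find('is'))        # 2
print(test.find('is', 3))     # 5
print(test.find('t', 0, 0))   # -1: the search range is empty
```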


> * Take a look at what other languages do.  Practically every general
> purpose language has an API for doing substring searches.  Since
> we're not blazing new territory here, there needs to be a good precedent
> for this change (no shooting from the hip when the problem has already
> been well solved many times over).

Why not?  'Well solved' does not mean there is no room for improvement. 
And going through the whole PEP process does not feel like 'shooting 
from the hip'.


> * Consider the effects of adding a second-way-to-do-it.  Will it add to the
> learning curve, cause debates about the best way in a given situation,
> add more PEP 8 entries and pylint checks? Is it worth introducing
> version incompatibilities (i.e. runs on 3.3 but not earlier), etc.

You mean like 'runs on 2.6+ but not earlier'?


> * What should the default value be?

There should be no default value, in my opinion.

> Is there any non-numerical result
> that ever makes sense; otherwise, you're just making an alias for the -1
> currently returned by str.find().  If the default is some value that evaluates
> to False, will that create a common error where an if-test fails to disambiguate
> the default value from a substring found at position zero.

That is the most effective argument by far, IMO, both for not having a 
default value and for being very careful about what the caller chooses 
to pass as the missing argument.  I think a bomb would be appropriate 
here:

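class OopsError(Exception):
     'raised when the Bomb is used for anything but an identity check'

class Bomb():
     'singleton object:  blows up on any usage'
     def __bool__(self):
         raise OopsError('yell at the programmer!')
     # etc. -- likewise for the other special methods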

Then in usage the only safe operation is a check for object identity; 
anything else reminds somebody they forgot to do something.
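
To make that concrete, here is a rough sketch of how such a sentinel 
might be used.  OopsError, NOT_FOUND and the index_or helper are all 
hypothetical names for illustration (Python 3 spelling; Python 2 would 
need __nonzero__ instead of __bool__):

```python
class OopsError(Exception):
    """Hypothetical exception raised when the sentinel is misused."""

class Bomb(object):
    """Sentinel that is only safe to compare by identity."""
    def __bool__(self):
        raise OopsError('yell at the programmer!')

NOT_FOUND = Bomb()

def index_or(seq, value, missing):
    # stand-in for the proposed seq.index(value, missing=...)
    try:
        return seq.index(value)
    except ValueError:
        return missing

# The trap with str.find: a match at position zero is falsy.
if not 'spam'.find('s'):
    print('find() returned 0, which an if-test reads as "not found"')

# The identity check handles position zero correctly.
pos = index_or('spam', 's', NOT_FOUND)
if pos is not NOT_FOUND:
    print('found at', pos)

# A careless truth test on the sentinel blows up instead of passing silently.
pos = index_or('spam', 'x', NOT_FOUND)
try:
    if pos:
        print('never reached')
except OopsError:
    print('forgot the identity check')
```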

> Hopefully, this short and incomplete list will provide a good basis
> for thinking about whether the proposal is a good idea.  Defending
> a PEP is no fun at all, so put in all your deep thinking up front.

Many good points -- thank you for taking the time.

~Ethan~


