[Python-ideas] Deprecate str.find

Raymond Hettinger raymond.hettinger at gmail.com
Sun Jul 17 10:09:35 CEST 2011


On Jul 17, 2011, at 12:15 AM, Nick Coghlan wrote:
> 
> Indeed, the problem as I see it is that our general idiom for
> functions and methods that raise 'Not Found' exceptions is to accept
> an optional parameter that specifies a value to return in the Not
> Found case.

There's a difference between methods that return looked-up values
(where a default might make sense) versus a method that returns
an index (where it usually makes no sense at all).
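
A minimal sketch of that distinction, as I read it (the names and sample
data are made up for illustration):

```python
# Looked-up value: the default is a usable stand-in of the same kind.
colors = {"apple": "red"}
banana_color = colors.get("banana", "unknown")  # a str either way

# Index: whatever comes back for "not found" is only a flag, and it
# must be tested before the result is used as a position.
text = "spam and eggs"
pos = text.find("ham")   # -1, because "ham" is absent
tail = text[pos:]        # no error, but it slices from the last character
```

The lookup default can be used directly, while the index "default" silently
produces a wrong slice unless the caller checks for it first.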

> 
> For historical reasons, we currently break that idiom for index()
> methods: instead of supplying an extra parameter to str.index, one
> instead switches to a completely different method (.find()) with no
> control over the sentinel value returned (it's always -1). For other
> sequences (e.g. list), there's no find equivalent, so you *have* to
> write the exception handling out explicitly.
> 
> My proposal is to update the signature of index() (for all sequences,
> including the ABC) to follow the standard 'Not Found' idiom by
> accepting a 'missing' parameter that is returned for those cases where
> ValueError would otherwise be raised.
> 
> Code that uses str.find would continue to work, but the recommended
> alternative would be obj.index(x, missing=None) (or appropriate
> default value). I would advise against any actual deprecation of
> str.find (cf. the deliberate lack of optparse deprecation).
> 
> It's unfortunate that backwards compatibility means we can't use the
> more descriptive name, but that's life.
> 
> However, I already have too much on my plate to push this forward for
> Python 3.3. I'm able to offer advice if someone would like to try
> their hand at writing a PEP, though.

If someone takes this out of python-ideas land and into a serious PEP,
they should be prepared to answer a number of tough questions:

* Is this actually necessary?  Is there something you currently can't code?
If not, then it adds API complexity without adding any new capabilities.
There is a high threshold for expanding the string API -- this would affect
everyone learning Python, every book written, every lint tool, every class
seeking to be string-like, etc.  So, it would need to be a substantive improvement
to be accepted.

* Take a look at what other languages do.  Practically every general
purpose language has an API for doing substring searches.  Since
we're not blazing new territory here, there needs to be a good precedent
for this change (no shooting from the hip when the problem has already
been well solved many times over).

* Use Google's code search to identify examples of real world code
that would be better with the new API.   If the only use case is creating
a new slicing one-liner, that likely is too rare and arcane to warrant
a change.

* Consider the effects of adding a second-way-to-do-it.  Will it add to the
learning curve, cause debates about the best way in a given situation,
add more PEP 8 entries and pylint checks? Is it worth introducing
version incompatibilities (i.e. code that runs on 3.3 but not earlier)?

* What should the default value be? Is there any non-numerical result
that ever makes sense?  If not, you're just making an alias for the -1
currently returned by str.find().  If the default is some value that evaluates
to False, will that create a common error where an if-test fails to disambiguate
the default value from a substring found at position zero?  If the new API
is ambiguous or confusing in *any* way, then it will be a step backwards
and make Python worse rather than better.

* See if you can find examples where people have already found the
need to write a helper function such as:

    def index_default(s, sub, default):
        try:
            return s.index(sub)
        except ValueError:
            return default

If you find code like that in the wild, it may be an indication that people
want this.  If you don't, it may indicate otherwise.

* Good API design requires some thinking about function/method
signatures.  Would making this a keyword-only argument solve the
positional arguments problem?  Since str.index() already takes arguments
for the "start" and "end" index, is the full signature readable without keywords:

    mystr.index(possible_substr, 0, -1, default_value)

Also look at the signature for the return value.  Currently, it always returns
a number, but if it can return a number or anything else, then all client
code must be prepared to handle the alternatives with clean looking code
that is self-evidently correct.

* Perhaps talk to some people who write Python code for a living
to determine if they've ever needed this or whether it would end up
as cruft.  (In my case, the answer is that I've not needed or wanted
this in over a decade of heavy Python use).
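
To make the default-value and signature questions above concrete, here is
a rough sketch, not a proposal.  The helper name index_or and the sentinel
_MISSING are hypothetical, and the keyword name "missing" is borrowed from
Nick's message; nothing like this exists in the str API.

```python
_MISSING = object()  # hypothetical sentinel meaning "no default supplied"


def index_or(s, sub, start=0, end=None, *, missing=_MISSING):
    """Hypothetical helper: like s.index(), but return `missing` if given."""
    if end is None:
        end = len(s)
    try:
        return s.index(sub, start, end)
    except ValueError:
        if missing is _MISSING:
            raise  # no default supplied: behave exactly like str.index()
        return missing


s = "spam and eggs"

# Keyword-only: the default cannot be mistaken for a third positional index.
pos = index_or(s, "spam", missing=None)

# Trap: a hit at position 0 is falsy, just like the None default.
found_by_truthiness = bool(pos)      # False, although "spam" was found
found_by_identity = pos is not None  # True
```

With a falsy default, only the explicit "is not None" test tells a match at
position zero apart from no match, which is the kind of ambiguity the
default-value question is asking about.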

Hopefully, this short and incomplete list will provide a good basis
for thinking about whether the proposal is a good idea.  Defending
a PEP is no fun at all, so put in all your deep thinking up front.

Cheers,


Raymond





