Inconsistent behaviour of str.find/str.index when providing optional parameters

Giacomo Alzetta giacomo.alzetta at gmail.com
Wed Nov 21 15:21:46 EST 2012


On Wednesday, 21 November 2012 20:25:10 UTC+1, Hans Mulder wrote:
> On 21/11/12 17:59:05, Alister wrote:
> > On Wed, 21 Nov 2012 04:43:57 -0800, Giacomo Alzetta wrote:
> >
> >> I just came across this:
> >>
> >> >>> 'spam'.find('', 5)
> >> -1
> >>
> >> Now, reading find's documentation:
> >>
> >> >>> print(str.find.__doc__)
> >> S.find(sub [,start [,end]]) -> int
> >>
> >> Return the lowest index in S where substring sub is found,
> >> such that sub is contained within S[start:end].  Optional arguments
> >> start and end are interpreted as in slice notation.
> >>
> >> Return -1 on failure.
> >>
> >> Now, the empty string is a substring of every string so how can find
> >> fail?
> >> find, from the doc, should generally be equivalent to
> >> S[start:end].find(substring) + start, except if the substring is not
> >> found, but since the empty string is a substring of the empty string it
> >> should never fail.
> >>
> >> Looking at the source code for find (in stringlib/find.h):
> >>
> >> Py_LOCAL_INLINE(Py_ssize_t)
> >> stringlib_find(const STRINGLIB_CHAR* str, Py_ssize_t str_len,
> >>                const STRINGLIB_CHAR* sub, Py_ssize_t sub_len,
> >>                Py_ssize_t offset)
> >> {
> >>     Py_ssize_t pos;
> >>
> >>     if (str_len < 0)
> >>         return -1;
> >>
> >> I believe it should be:
> >>
> >>     if (str_len < 0)
> >>         return (sub_len == 0 ? 0 : -1);
> >>
> >> Is there any reason for having this unexpected behaviour or was this
> >> simply overlooked?
> >
> > why would you be searching for an empty string?
> > what result would you expect to get from such a search?
>
> In general, if
>
>     needle in haystack[ start: ]
>
> returns True, then you'd expect
>
>     haystack.find(needle, start)
>
> to return the smallest i >= start such that
>
>     haystack[i:i+len(needle)] == needle
>
> also returns True.
>
> >>> "" in "spam"[5:]
> True
> >>> "spam"[5:5+len("")] == ""
> True
> >>>
>
> So, you'd expect that "spam".find("", 5) would return 5.
>
> The only other consistent position would be that "spam"[5:]
> should raise an IndexError, because 5 is an invalid index.
>
> For that matter, I wouldn't mind if "spam".find(s, 5) were
> to raise an IndexError.  But if slicing at position 5
> produces an empty string, then .find should be able to
> find that empty string.
>
> -- HansM

Exactly! Either string[i:] with i >= len(string) should raise an IndexError, or string.find('', i) should return i.
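
To make the mismatch concrete, here is a small Python check. The helper expected_find is a made-up name: it only models find through slicing, the way I read the docstring, and is not how CPython implements it. It assumes a non-negative start.

```python
def expected_find(haystack, needle, start=0):
    """Hypothetical model: find expressed via slicing (start >= 0 only)."""
    pos = haystack[start:].find(needle)
    return -1 if pos == -1 else pos + start


print('' in 'spam'[5:])               # True: the slice is '' and contains ''
print(expected_find('spam', '', 5))   # 5: what the slicing model predicts
print('spam'.find('', 5))             # -1: what str.find actually returns
```

The two agree for every start up to len(string); they only diverge once start goes past the end of the string.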

Anyway, thinking about it, this inconsistency can be solved in a simpler way, without adding a comparison. You simply check the substring length first: if it is 0, you already know that the empty string is a substring of the given string, and you return the "offset". So the two ifs at the beginning of the function ought to be swapped.
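
A rough Python sketch of what the swapped order would do, under some assumptions: the second check in stringlib_find is taken to be a test for sub_len == 0 that returns offset (only the first check is quoted above), find_swapped is a made-up name, and negative start/end are not handled. This models the idea and is not a patch against the C code.

```python
def find_swapped(s, sub, start=0, end=None):
    """Model of the proposed check order; hypothetical, not CPython's code.

    Only non-negative start/end are handled, to keep the sketch short.
    """
    if end is None or end > len(s):
        end = len(s)          # clip end the way a slice does
    str_len = end - start     # negative when start lies past end
    offset = start

    # Proposed order: test the substring length first...
    if len(sub) == 0:
        return offset
    # ...and only then reject an inverted search window.
    if str_len < 0:
        return -1

    pos = s[start:end].find(sub)
    return pos + offset if pos >= 0 else -1


print(find_swapped('spam', '', 5))    # 5, where str.find gives -1
print(find_swapped('spam', 'am'))     # 2, same as str.find
```

Note that with this order any offset past the end is returned unchanged for an empty substring (find_swapped('spam', '', 100) gives 100), which matches the slicing view, since 'spam'[100:] is also the empty string.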
