Inconsistent behaviour of str.find/str.index when providing optional parameters
Giacomo Alzetta
giacomo.alzetta at gmail.com
Wed Nov 21 15:21:46 EST 2012
On Wednesday, 21 November 2012 20:25:10 UTC+1, Hans Mulder wrote:
> On 21/11/12 17:59:05, Alister wrote:
> > On Wed, 21 Nov 2012 04:43:57 -0800, Giacomo Alzetta wrote:
> >> I just came across this:
> >>
> >> >>> 'spam'.find('', 5)
> >> -1
> >>
> >> Now, reading find's documentation:
> >>
> >> >>> print(str.find.__doc__)
> >> S.find(sub [,start [,end]]) -> int
> >>
> >> Return the lowest index in S where substring sub is found,
> >> such that sub is contained within S[start:end]. Optional arguments
> >> start and end are interpreted as in slice notation.
> >>
> >> Return -1 on failure.
> >>
> >> Now, the empty string is a substring of every string, so how can find
> >> fail?
> >> find, from the doc, should generally be equivalent to
> >> S[start:end].find(substring) + start, except when the substring is not
> >> found; but since the empty string is a substring of the empty string, it
> >> should never fail.
> >>
> >> Looking at the source code for find (in stringlib/find.h):
> >>
> >> Py_LOCAL_INLINE(Py_ssize_t)
> >> stringlib_find(const STRINGLIB_CHAR* str, Py_ssize_t str_len,
> >>                const STRINGLIB_CHAR* sub, Py_ssize_t sub_len,
> >>                Py_ssize_t offset)
> >> {
> >>     Py_ssize_t pos;
> >>
> >>     if (str_len < 0)
> >>         return -1;
> >>
> >> I believe it should be:
> >>
> >>     if (str_len < 0)
> >>         return (sub_len == 0 ? 0 : -1);
> >>
> >> Is there any reason for this unexpected behaviour, or was this
> >> simply overlooked?
> >
> > why would you be searching for an empty string?
> > what result would you expect to get from such a search?
>
> In general, if
>
>     needle in haystack[start:]
>
> returns True, then you'd expect
>
>     haystack.find(needle, start)
>
> to return the smallest i >= start such that
>
>     haystack[i:i+len(needle)] == needle
>
> also returns True.
>
>     >>> "" in "spam"[5:]
>     True
>     >>> "spam"[5:5+len("")] == ""
>     True
>
> So, you'd expect that "spam".find("", 5) would return 5.
>
> The only other consistent position would be that "spam"[5:]
> should raise an IndexError, because 5 is an invalid index.
>
> For that matter, I wouldn't mind if "spam".find(s, 5) were
> to raise an IndexError. But if slicing at position 5
> produces an empty string, then .find should be able to
> find that empty string.
>
> -- HansM
Exactly! Either string[i:] with i > len(string) should raise an IndexError, or string.find('', i) should return i.
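To make the mismatch concrete, here is a minimal sketch in Python. The helper find_via_slice is a hypothetical name of mine, not anything in the standard library; it just spells out the slice-based reading of the docstring, and it assumes a non-negative start.

```python
def find_via_slice(s, sub, start=0):
    """Hypothetical helper: find() as the docstring suggests, via slicing.

    Assumes start >= 0; negative indices are not handled in this sketch.
    """
    pos = s[start:].find(sub)
    return pos + start if pos >= 0 else -1


# Slicing past the end quietly yields an empty string...
assert "spam"[5:] == ""
assert "" in "spam"[5:]

# ...so the slice-based reading finds the empty string at offset 5,
print(find_via_slice("spam", "", 5))   # 5
# while the built-in reports failure for the same arguments.
print("spam".find("", 5))              # -1
# One position earlier the two agree.
print("spam".find("", 4))              # 4
```

For any start up to len(string) the helper and the built-in give the same answer; they only diverge once start is past the end.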
Anyway, thinking about it, this inconsistency can be solved more simply, without adding a comparison: check the substring length first. If it is 0, you already know that it is a substring of the given string, so you return the offset. In other words, the two ifs at the beginning of the function ought to be swapped.
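A rough Python model of that swap, for illustration only. The function model_find and its empty_first flag are hypothetical names; I am assuming the caller clamps end to len(S) and passes end - start as str_len, which is how the length can become negative. Only non-negative start and end are handled.

```python
def model_find(s, sub, start=0, end=None, empty_first=True):
    """Hypothetical model of the two early checks, not the real C code.

    empty_first=False: length check first (the order quoted above).
    empty_first=True:  empty-substring check first (the proposed swap).
    """
    if end is None or end > len(s):
        end = len(s)                 # assumed clamping done by the caller
    str_len = end - start            # negative when start > len(s)

    if empty_first:
        if len(sub) == 0:
            return start             # the empty string is always found
        if str_len < 0:
            return -1
    else:
        if str_len < 0:
            return -1
        if len(sub) == 0:
            return start

    pos = s[start:end].find(sub)     # stand-in for the actual search
    return pos + start if pos >= 0 else -1


print(model_find("spam", "", 5, empty_first=False))  # -1, same as "spam".find("", 5)
print(model_find("spam", "", 5))                     # 5, consistent with "spam"[5:]
```

With the swapped order, no extra comparison is needed in the str_len < 0 branch; non-empty substrings behave the same either way.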