Inconsistent behaviour of str.find/str.index when providing optional parameters

Hans Mulder hansmu at xs4all.nl
Wed Nov 21 14:25:09 EST 2012


On 21/11/12 17:59:05, Alister wrote:
> On Wed, 21 Nov 2012 04:43:57 -0800, Giacomo Alzetta wrote:
> 
>> I just came across this:
>>
>>>>> 'spam'.find('', 5)
>> -1
>>
>>
>> Now, reading find's documentation:
>>
>>>>> print(str.find.__doc__)
>> S.find(sub [,start [,end]]) -> int
>>
>> Return the lowest index in S where substring sub is found,
>> such that sub is contained within S[start:end].  Optional arguments
>> start and end are interpreted as in slice notation.
>>
>> Return -1 on failure.
>>
>> Now, the empty string is a substring of every string so how can find
>> fail?
>> find, from the doc, should generally be equivalent to
>> S[start:end].find(substring) + start, except if the substring is not
>> found but since the empty string is a substring of the empty string it
>> should never fail.
>>
>> Looking at the source code for find (in stringlib/find.h):
>>
>> Py_LOCAL_INLINE(Py_ssize_t)
>> stringlib_find(const STRINGLIB_CHAR* str, Py_ssize_t str_len,
>>                const STRINGLIB_CHAR* sub, Py_ssize_t sub_len,
>>                Py_ssize_t offset)
>> {
>>     Py_ssize_t pos;
>>
>>     if (str_len < 0)
>>         return -1;
>>
>> I believe it should be:
>>
>>     if (str_len < 0)
>>         return (sub_len == 0 ? 0 : -1);
>>
>> Is there any reason for having this unexpected behaviour or was this
>> simply overlooked?
> 
> why would you be searching for an empty string?
> what result would you expect to get from such a search?


In general, if

    needle in haystack[ start: ]

returns True, then you'd expect

    haystack.find(needle, start)

to return the smallest i >= start such that

    haystack[i:i+len(needle)] == needle

also returns True.

>>> "" in "spam"[5:]
True
>>> "spam"[5:5+len("")] == ""
True
>>>

So, you'd expect that "spam".find("", 5) would return 5.
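
That expectation can be written down as a small reference function and
compared with what str.find actually does.  This is only a sketch:
expected_find is a hypothetical helper, not anything in CPython, and it
assumes start is non-negative.

```python
def expected_find(haystack, needle, start=0):
    """Hypothetical reference: smallest i >= start such that
    haystack[i:i+len(needle)] == needle, or -1 if there is none.

    Assumes start >= 0; negative starts are not handled here.
    """
    if needle not in haystack[start:]:
        return -1
    i = start
    # The membership test above guarantees that this loop terminates.
    while haystack[i:i + len(needle)] != needle:
        i += 1
    return i


# The two agree for every start that lies inside the string...
for start in range(len("spam") + 1):
    for needle in ("", "s", "am", "x"):
        assert "spam".find(needle, start) == expected_find("spam", needle, start)

# ...but not once start runs past the end.
print(expected_find("spam", "", 5))  # 5
print("spam".find("", 5))            # -1
```

The only disagreement is the case from the original post: an empty
needle with start greater than len(haystack).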

The only other consistent position would be that "spam"[5:]
should raise an IndexError, because 5 is an invalid index.

For that matter, I wouldn't mind if "spam".find(s, 5) were
to raise an IndexError.  But if slicing at position 5
produces an empty string, then .find should be able to
find that empty string.
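
The stricter alternative could look something like the sketch below.
strict_find is a hypothetical wrapper, not an existing or proposed str
method, and treating start == len(haystack) as valid is an assumption
made here, since that position is still a legal slice boundary.

```python
def strict_find(haystack, needle, start=0):
    """Hypothetical variant of str.find that rejects an out-of-range start
    instead of quietly returning -1."""
    # Assumption: only 0 <= start <= len(haystack) counts as in range.
    if not 0 <= start <= len(haystack):
        raise IndexError("start index out of range")
    return haystack.find(needle, start)


print(strict_find("spam", "", 4))  # 4

try:
    strict_find("spam", "", 5)
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: start index out of range
```

With this behaviour a return value of -1 would always mean "not found",
never "start was past the end".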

-- HansM



