Inconsistent behaviour of str.find/str.index when providing optional parameters

MRAB python at mrabarnett.plus.com
Wed Nov 21 15:58:46 EST 2012


On 2012-11-21 19:25, Hans Mulder wrote:
> On 21/11/12 17:59:05, Alister wrote:
>> On Wed, 21 Nov 2012 04:43:57 -0800, Giacomo Alzetta wrote:
>>
>>> I just came across this:
>>>
>>>>>> 'spam'.find('', 5)
>>> -1
>>>
>>>
>>> Now, reading find's documentation:
>>>
>>>>>> print(str.find.__doc__)
>>> S.find(sub [,start [,end]]) -> int
>>>
>>> Return the lowest index in S where substring sub is found,
>>> such that sub is contained within S[start:end].  Optional arguments
>>> start and end are interpreted as in slice notation.
>>>
>>> Return -1 on failure.
>>>
>>> Now, the empty string is a substring of every string so how can find
>>> fail?
>>> find, from the doc, should generally be equivalent to
>>> S[start:end].find(substring) + start, except if the substring is not
>>> found but since the empty string is a substring of the empty string it
>>> should never fail.
>>>
>>> Looking at the source code for find(in stringlib/find.h):
>>>
>>> Py_LOCAL_INLINE(Py_ssize_t)
>>> stringlib_find(const STRINGLIB_CHAR* str, Py_ssize_t str_len,
>>>                const STRINGLIB_CHAR* sub, Py_ssize_t sub_len,
>>>                Py_ssize_t offset)
>>> {
>>>     Py_ssize_t pos;
>>>
>>>     if (str_len < 0)
>>>         return -1;
>>>
>>> I believe it should be:
>>>
>>>     if (str_len < 0)
>>>         return (sub_len == 0 ? 0 : -1);
>>>
>>> Is there any reason for having this unexpected behaviour or was this
>>> simply overlooked?
>>
>> Why would you be searching for an empty string?
>> What result would you expect to get from such a search?
>
>
> In general, if
>
>      needle in haystack[ start: ]
>
> returns True, then you'd expect
>
>      haystack.find(needle, start)
>
> to return the smallest i >= start such that
>
>      haystack[i:i+len(needle)] == needle
>
> also returns True.
>
>>>> "" in "spam"[5:]
> True
>>>> "spam"[5:5+len("")] == ""
> True
>>>>
>
> So, you'd expect that "spam".find("", 5) would return 5.
>
> The only other consistent position would be that "spam"[5:]
> should raise an IndexError, because 5 is an invalid index.
>
> For that matter, I wouldn't mind if "spam".find(s, 5) were
> to raise an IndexError.  But if slicing at position 5
> produces an empty string, then .find should be able to
> find that empty string.
>
You'd expect that given:

     found = string.find(something, start, end)

if 'something' is present, then the following are true:

     0 <= found <= len(string)

     start <= found <= end

(I'm assuming here that 'start' and 'end' have already been adjusted
for counting from the end, i.e. originally they might have been negative
values.)
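
As a rough sketch of the kind of adjustment meant here: the helper
below is hypothetical (its name and exact clamping rules are an
assumption for illustration, not copied from the CPython source). It
maps negative values to offsets counted from the front and leaves a
'start' beyond the end of the string untouched.

```python
def adjust_indices(start, end, length):
    """Hypothetical helper: turn possibly negative start/end into
    offsets counted from the front of a string of the given length."""
    if end > length:
        end = length                    # an over-long end is cut back
    elif end < 0:
        end = max(end + length, 0)      # count from the end, floor at 0
    if start < 0:
        start = max(start + length, 0)  # count from the end, floor at 0
    return start, end                   # start > length is left alone


print(adjust_indices(-2, 10, 4))   # (2, 4)
print(adjust_indices(-10, -1, 4))  # (0, 3)
print(adjust_indices(5, 4, 4))     # (5, 4)
```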

The only time that you can have found == len(string) and found == end
is when something == "" and start == len(string).
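
A small check of those conditions, as a sketch only: the function name
is made up for illustration, and it assumes 'start' and 'end' are
already non-negative. The last line shows the boundary case from the
original post, where an empty 'something' is found at every 'start' up
to len(string) and not beyond.

```python
def find_invariants_hold(string, something, start, end):
    """Hypothetical check: if find() succeeds, the result lies within
    both 0..len(string) and start..end (start, end assumed >= 0)."""
    found = string.find(something, start, end)
    if found == -1:
        return True  # nothing found, so nothing to check
    return 0 <= found <= len(string) and start <= found <= end


s = "spam"
print(all(find_invariants_hold(s, sub, start, end)
          for sub in ("", "a", "am", "x")
          for start in range(7)
          for end in range(7)))                      # True

# Searching for "" from each start position, including past the end.
print([s.find("", start) for start in range(6)])     # [0, 1, 2, 3, 4, -1]
```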



