Inconsistent behaviour of str.find/str.index when providing optional parameters
Hans Mulder
hansmu at xs4all.nl
Wed Nov 21 14:25:09 EST 2012
On 21/11/12 17:59:05, Alister wrote:
> On Wed, 21 Nov 2012 04:43:57 -0800, Giacomo Alzetta wrote:
>
>> I just came across this:
>>
>>>>> 'spam'.find('', 5)
>> -1
>>
>>
>> Now, reading find's documentation:
>>
>>>>> print(str.find.__doc__)
>> S.find(sub [,start [,end]]) -> int
>>
>> Return the lowest index in S where substring sub is found,
>> such that sub is contained within S[start:end]. Optional arguments
>> start and end are interpreted as in slice notation.
>>
>> Return -1 on failure.
>>
>> Now, the empty string is a substring of every string so how can find
>> fail?
>> find, from the doc, should generally be equivalent to
>> S[start:end].find(substring) + start, except if the substring is not
>> found but since the empty string is a substring of the empty string it
>> should never fail.
>>
>> Looking at the source code for find(in stringlib/find.h):
>>
>> Py_LOCAL_INLINE(Py_ssize_t)
>> stringlib_find(const STRINGLIB_CHAR* str, Py_ssize_t str_len,
>>                const STRINGLIB_CHAR* sub, Py_ssize_t sub_len,
>>                Py_ssize_t offset)
>> {
>>     Py_ssize_t pos;
>>
>>     if (str_len < 0)
>>         return -1;
>>
>> I believe it should be:
>>
>>     if (str_len < 0)
>>         return (sub_len == 0 ? 0 : -1);
>>
>> Is there any reason for having this unexpected behaviour or was this
>> simply overlooked?
>
> why would you be searching for an empty string?
> what result would you expect to get from such a search?
In general, if
    needle in haystack[start:]
returns True, then you'd expect
    haystack.find(needle, start)
to return the smallest i >= start such that
    haystack[i:i+len(needle)] == needle
also returns True.
>>> "" in "spam"[5:]
True
>>> "spam"[5:5+len("")] == ""
True
>>>
So, you'd expect that "spam".find("", 5) would return 5.
The only other consistent position would be that "spam"[5:]
should raise an IndexError, because 5 is an invalid index.
For that matter, I wouldn't mind if "spam".find(s, 5) were
to raise an IndexError. But if slicing at position 5
produces an empty string, then .find should be able to
find that empty string.
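
A minimal sketch of the expectation described above, assuming start >= 0.
The name naive_find is a hypothetical helper for illustration only, not
anything in the standard library, and this is not how str.find is
implemented:

```python
def naive_find(haystack, needle, start=0):
    """Smallest i >= start with haystack[i:i+len(needle)] == needle, else -1.

    Hypothetical reference function; assumes start >= 0 (negative starts
    are not handled in this sketch).
    """
    # If the slice does not contain the needle, there is nothing to find.
    if needle not in haystack[start:]:
        return -1
    # Otherwise a match at some i >= start exists, so this loop terminates.
    i = start
    while haystack[i:i + len(needle)] != needle:
        i += 1
    return i


print(naive_find("spam", "am"))    # 2, same as "spam".find("am")
print(naive_find("spam", "", 4))   # 4, same as "spam".find("", 4)
print(naive_find("spam", "", 5))   # 5
print("spam".find("", 5))          # -1, the built-in result quoted above
```

The two functions agree whenever start <= len(haystack); they differ only
for an empty needle with start past the end of the string, which is exactly
the case under discussion.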
-- HansM