help retrieving / storing info in dictionary or gadfly

Tim Peters tim.one at home.com
Mon Jan 15 06:28:40 EST 2001


[Bryan Webb]
> I have a dictionary that has records with keys from 1 to 2^128
> I can build a list and sort the keys in order, then search thru
> the list of sorted keys.
>
> If the list has 1,2,3,4,5,7,8,9 and I need to search for 6. 6 is
> not in the list , so I need the record that is closest to 6 but
> not greater than 6.

[Alex Martelli]
> This is close to what bisect (see 5.5 in the library reference) does
> for you, except it gives the index in the sorted list locating the
> key closest but not smaller, rather than closest but not greater --
> a number from 0 to len(list), both extremes _included_ -- len(list)
> if the key you're looking for is larger than the largest (last) entry
> in the sorted list.  You presumably want -1 to len(list)-1 [again
> extremes included], with -1 if the key is smaller than the smallest
> (first) entry.

Like most places sequence indices are used in Python, it's tedious to keep
straight what bisect does if the return value is viewed as pointing *at* an
element, but trivial to keep straight if viewed as pointing *between*
elements.  A search for 6 in the list should be viewed as returning a
pointer to the comma marked by the caret:

1,2,3,4,5,7,8,9
         ^

That is, bisect returns the index of the "hole" between the 5 and the 7
(which should make it obvious what a list.insert() does when fed that index
back!).  With that view in mind, it's clear that a return value of 0 points
"before the 1", and a return value of 8 "after the 9":  len(list)+1 possible
holes, and len(list)+1 possible return values.  Bryan would be very well
served to stick with that scheme!  Trying to make -1 mean "before the start"
when 0 already means that to rest of the language will just get him into
subtle trouble.

Bummer:  if the element you're looking for is already in the list, the
current behavior of bisect isn't fully specified.  It so happens that it
reliably returns the index of the hole immediately to the right of the
rightmost equal element.  In 2.1 this will be documented, and there are new
methods bisect_left, bisect_right, insort_left and insort_right that
guarantee what happens when finding at least one equal element.

> You can wrap bisect.bisect for your purposes, e.g.:
>
> def locate(list, key):
>     i = bisect.bisect(list,key)
>     try:
>         if list[i]==key:
>             return i
>     except IndexError:
>         pass
>     return i-1

Note that, because of the above, it's not possible for the list[i]==key test
to succeed; e.g.,

>>> import bisect
>>> bisect.bisect([42], 42)
1
>>>

Because 2.0 bisect acts like 2.1's bisect_right, the hole index returned is
*after* the rightmost equal key.  So the test needs to be list[i-1]==key
(or, in 2.1, you can use bisect_left instead, and in the example get back
0 -- that's more natural if you're trying to determine whether the element
is already there).

>> Is there a way to further index the list of keys to do this ?
>>
>> Could I use GADFLY to do this.

> I don't see how you could do better than O(log N) to locate the
> key 'closest but not greater', with any auxiliary data structure
> of reasonable size, given your range of possible keys.  And that
> is the performance behavior that bisect.bisect gives you.

Come now:  2^128 is only 130 <wink>.  All right, if he meant 2**128, I
agree.

speaking-python-in-pythonland-ly y'rs  - tim





More information about the Python-list mailing list