Why are there no ordered dictionaries?

Tue Nov 22 11:46:14 EST 2005

Christoph Zwerschke wrote:

> But of course, it will always be slower since it is constructed on top
> of the built-in dict. In end effect, you always have to maintain a
> sequence *plus* a dictionary, which will be always slower than a sheer
> dictionary. The ordered dictionary class just hides this uglyness of
> having to maintain a dictionary plus a sequence, so it's rather an issue
> of convenience in writing and reading programs than a performance issue.
>
> It may be different if the ordered dict would be implemented directly as
> an ordered hash table in C.

The problem with hashing is that one is storing data from a possibly
wildly varying range in a set of values with a limited range. That's
where the ordering problems come from in the first place. If one wants
to solve it once and for all one has to give up the speed advantage
that makes hashing so popular. I wonder if optimized C code using
bisect would be very much slower for small ranges?

The current set implementation uses dicts to implement sets, while sets
are a more basic data type than dicts. At least dicts and sets should
be derived from the same type. Just speaking from an intuitive point of
view of course :-)

Here's a sorted dict implementation without using hashes, which maybe
would be fast if implemented in C. The insertion order can be retrieved
using the keys and values lists from the object itself, items() gives a
sorted sequence.

Anton.

NB warning untested code.

When using mutables as keys which is possible by this implementation,
don't change the keys after they're used in the object.

from array import array

class VDict:

    def __init__(self,sequence = []):
        self.keys = []
        self.values = []
        self.unranks = array('L',[])
        for key,value in sequence:
            self[key] = value

    def __setitem__(self,key,value):
        keys = self.keys
        values = self.values
        unranks = self.unranks
        n = len(keys)
        i = self.bisect_left(key)
        if i == n or keys[unranks[i]] != key:
            keys.append(key)
            values.append(value)
            unranks.insert(i,n)
        else:
            values[i] = value

    def __getitem__(self,key):
        i = self.bisect_left(key)
        return self.values[self.unranks[i]]

    def bisect_left(self,key, lo=0, hi=None):
        keys = self.keys
        unranks = self.unranks
        if hi is None:
            hi = len(keys)
        while lo < hi:
            mid = (lo+hi)//2
            if keys[unranks[mid]] < key:
                lo = mid+1
            else:
                hi = mid
        return lo

    def __contains__(self,key):
        keys = self.keys
        unranks = self.unranks
        n = len(keys)
        i = self.bisect_left(key)
        return i < n and keys[unranks[i]] == key

    def __len__(self):
        return len(self.keys)

    def items(self):
        keys = self.keys
        values = self.values
        unranks = self.unranks
        return [(keys[i],values[i]) for i in unranks]

    def __iter__(self):
        return iter(self.items())

    def remove(self,key):
        keys = self.keys
        values = self.values
        unranks = self.unranks
        n = len(keys)
        i = self.bisect_left(key)
        x = unranks[i]
        if  i < n and keys[x] == key:
            del keys[x]
            del values[x]
            del unranks[i]
            for j,k in enumerate(unranks):
                if k > x:
                    unranks[j] -= 1

def test():
    V = VDict()
    L = [1,2,3]
    V[L] = 10
    print V[L]
    V[L] = 20
    print V[L]
    V.remove(L)
    print V.items()
    V = VDict(zip('edcba',range(5)))
    print V.items()
    print V['d']
    V.remove('d')
    print V.items()

if __name__=='__main__':
    test()