[Python-ideas] Contiguous-array-based ordering for OrderedDict

Serhiy Storchaka storchaka at gmail.com
Sat Dec 12 07:34:44 EST 2015


On 12.12.15 11:27, Franklin? Lee wrote:
> Here's Tim Peters's 2013 explanation of OrderedDict:
> http://stackoverflow.com/a/18951209/2963903
>
> He said:
>> But if [the order list] were a Python list, deleting a key would take O(n) time twice over: O(n) time to find the key in the list, and O(n) time to remove the key from the list.
>
> I didn't see why this would need to be the case.
> 1. If you start searching from the front, then each element would be
> inspected or shifted, never both. This would mean only one full
> iteration.
> 2. You don't have to shift for each deletion. You can wait until some
> threshold is reached before shifting, and maybe that will spread out
> the cost of shift among deletions enough to make an impact.
> 3. He then mentions that the linkedlist implementation uses a second
> dict to map keys to their list nodes. Well, that would solve the cost
> of lookup for lists, too.
>
> So I ended up taking 3.5.0's OrderedDict and implemented the idea. I'm
> not really sure what to do next, in terms of testing and comparing it
> against the existing implementation, and whether it's worth pushing
> for such a design for OrderedDict (current Python version or future C
> cersion). It's enough for me if I can figure out how this compares in
> performance, both algorithmically and actually, to the current
> implementation.

Did you try to replace standard OrderedDict implementation with your 
implementation and run tests? Be aware, some details of C implementation 
were changed, an new tests were added in 3.5.1. It is better to take 
3.5.1 for testing.

Both current implementations of OrderedDict tried to make OrderedDict 
threadsafe and don't left an OrderedDict in inconsistent state if 
hashing or comparison raise an exception. There is also an attempt to 
handle cases when an OrderedDict was changed passing over OrderedDict 
API (with direct calls of dict modifying methods: dict.__setitem__, 
dict.__delitem__, dict.update, etc).

The performance of Python implementation doesn't matter much now (while 
it has the same computational complexity). As for C implementation, I 
can't forecast what approach will be faster.




More information about the Python-ideas mailing list