[Python-ideas] Contiguous-array-based ordering for OrderedDict

Franklin Lee leewangzhong+python at gmail.com
Tue Dec 15 04:09:55 EST 2015


On Dec 14, 2015 1:11 PM, "Sven R. Kunze" <srkunze at mail.de> wrote:
>
> Despite all the nice discussion about runtime and so on and so forth.
>
> Couldn't he just provide a working implementation to prove his point? Maybe, he finds out himself that it's not working as intended and in the course of doing so, he finds an even better solution.

I did. It is in a pastebin link in my original message, based on the
3.5.0 Python (not C) version of OrderedDict. I was hoping for guidance
on evaluating it. Maybe it wasn't seen because I used '-'*n to
separate it from my intro, or maybe pastebin is so disliked that
people couldn't see it. Here it is again: http://pastebin.com/LESRktJw

Sorry I haven't been responding. I wanted to respond on a PC instead
of my phone because I don't see a plaintext option on the app, but I
get distracted when on the PC.

My implementation combines the ideas I listed. Raymond's idea seems
similar, but more clever with indexing and less focused on ordering.
Regarding PyPy's, I get the impression that CPython can make an
efficient implementation based on it. I think that's the real way to
go for an efficient OrderedDict implementation.
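
To make that concrete, here's a rough sketch of the compact-dict layout
I mean, as I understand it: a dense array of entries (which preserves
insertion order for free) plus a sparse table that only holds indices
into it. The names, the plain linear probing, and the omission of
deletion are all mine; this is not PyPy's actual code.

    _FREE = -1  # marks an unused slot in the sparse index table

    class CompactDict:
        def __init__(self):
            self._indices = [_FREE] * 8  # sparse: probe slot -> entry index
            self._entries = []           # dense: (hash, key, value) tuples

        def _lookup(self, key, h):
            # Simplified linear probing; real implementations perturb the probe.
            mask = len(self._indices) - 1
            i = h & mask
            while True:
                idx = self._indices[i]
                if idx == _FREE:
                    return i, _FREE
                if self._entries[idx][1] == key:
                    return i, idx
                i = (i + 1) & mask

        def _resize(self):
            # Double the sparse table and re-point it at the dense entries.
            self._indices = [_FREE] * (len(self._indices) * 2)
            mask = len(self._indices) - 1
            for n, (h, _key, _value) in enumerate(self._entries):
                i = h & mask
                while self._indices[i] != _FREE:
                    i = (i + 1) & mask
                self._indices[i] = n

        def __setitem__(self, key, value):
            h = hash(key)
            slot, idx = self._lookup(key, h)
            if idx != _FREE:
                self._entries[idx] = (h, key, value)  # overwrite in place
            else:
                self._indices[slot] = len(self._entries)
                self._entries.append((h, key, value))
                if len(self._entries) * 3 >= len(self._indices) * 2:
                    self._resize()

        def __getitem__(self, key):
            _slot, idx = self._lookup(key, hash(key))
            if idx == _FREE:
                raise KeyError(key)
            return self._entries[idx][2]

        def __iter__(self):
            # Iteration walks the dense array; that's the cache-friendly part.
            return (key for _h, key, _value in self._entries)

        def __len__(self):
            return len(self._entries)

Deletion needs a dummy marker in the sparse table (so probe chains aren't
broken) and either a hole or a swap-with-last in the entries array, which
is where the ordering questions come back in.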


By the way, here are the costs for some of the algorithms, compared to
the raw Python OrderedDict (a minimal sketch of the array-based approach
follows this list):
- getitem: unchanged
- setitem: Amortized O(1), but for slightly different reasons (list
expansion in addition to the expansion of the two maps). For a new key,
it will usually be better: no new node object is created, and there's a
lot less constant work in bytecode when the list isn't expanded
(incrementing and appending instead of linking).
- delitem: Amortized O(1). A lot less bytecode overhead, as above, but
a compaction can be triggered here.
- (iteration): Iterating over a 50% empty array may be better than
iterating over a linked list, especially if cache is taken into
account.
    - Python version: For a 50% empty array, you pay len(odict) extra
"if"s and "is"s, plus the cost of iterating over a (reversed, for
reverse iteration) list of length 2*len(odict). The linked-list version
has a lot of extra dereferences instead.
    - Potential C version: It's just a simple for loop with a check of
pointer identity, versus a chain of dereferences. It should definitely
be better on the cache, maybe less good for branch prediction.
- pop: Same as delitem: possible compaction.
- move_to_end: Moving to the back is less constant work (though I do a
lot of checks, and the C version might not have the same benefits).
Moving to the front is awful: it shifts everything up. This could be
alleviated by using a deque, or by treating the list itself as a
circular array (a list-backed deque), but that would add overhead to
everything else, so I don't like it. (How do I find out how popular the
second parameter to move_to_end is?)
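
Here's a minimal sketch of the array-based idea those numbers refer to.
It's deliberately simplified from the pastebin version (no move_to_end,
no reverse iteration, one dict instead of two maps), and the names are
illustrative:

    from collections.abc import MutableMapping

    _SENTINEL = object()  # marks holes left in the order array by deletions

    class ArrayOrderedDict(MutableMapping):
        def __init__(self):
            self._map = {}    # key -> (index into self._order, value)
            self._order = []  # keys in insertion order, with _SENTINEL holes
            self._holes = 0

        def __setitem__(self, key, value):
            entry = self._map.get(key)
            if entry is not None:
                self._map[key] = (entry[0], value)  # overwrite, keep position
            else:
                self._map[key] = (len(self._order), value)
                self._order.append(key)       # amortized O(1), no node object

        def __getitem__(self, key):
            return self._map[key][1]          # basically unchanged vs. dict

        def __delitem__(self, key):
            index, _value = self._map.pop(key)
            self._order[index] = _SENTINEL          # leave a hole
            self._holes += 1
            if self._holes * 2 > len(self._order):  # compact at >50% holes
                self._compact()

        def _compact(self):
            # Squeeze out the holes and fix up the stored indices.
            self._order = [k for k in self._order if k is not _SENTINEL]
            for i, k in enumerate(self._order):
                self._map[k] = (i, self._map[k][1])
            self._holes = 0

        def __iter__(self):
            for k in self._order:
                if k is not _SENTINEL:        # the extra "if"/"is" per slot
                    yield k

        def __len__(self):
            return len(self._map)

The compaction threshold (50%) is arbitrary here; tuning it is exactly
the kind of thing the benchmarks would have to settle.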

With my second idea of storing inner_dict[key] = [key, value, index]
(where index points into the order array), there would be a lot less
overhead on many operations, but more overhead on getitem. In C code,
that extra overhead might be barely noticeable.
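
A rough sketch of that second layout, again with illustrative names and
with compaction elided (it would work the same way as above):

    class ArrayOrderedDict2:
        def __init__(self):
            self._map = {}    # key -> [key, value, index-into-order] cell
            self._order = []  # the same cell objects, or None for holes

        def __setitem__(self, key, value):
            cell = self._map.get(key)
            if cell is not None:
                cell[1] = value
            else:
                cell = [key, value, len(self._order)]
                self._map[key] = cell
                self._order.append(cell)

        def __getitem__(self, key):
            return self._map[key][1]  # the extra indexing is the getitem overhead

        def __delitem__(self, key):
            cell = self._map.pop(key)
            self._order[cell[2]] = None  # hole; compaction not shown

        def __iter__(self):
            return (cell[0] for cell in self._order if cell is not None)

        def __len__(self):
            return len(self._map)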



On Sat, Dec 12, 2015 at 6:06 AM, Andrew Barnert <abarnert at yahoo.com> wrote:

> There are lots of benchmark suites out there that use dicts; modifying them to use OrderedDicts should be pretty easy. Of course that isn't exactly a scientific test, but at least it's an easy place to start.

I, uh, have no professional experience in programming. So it's not as
easy for me to know how to start.

> Why would it be easier for the hardware? Allocation locality? If both the array and the linked list had to do similar amounts of work, sure, but again, the linked list only needs to touch two nodes and one hash entry, while the array needs to touch N/2 slots (and N/2 hash entries if you use the second dict), so improved locality isn't going to make up for that. (Plus, the hashes are still spread all over the place; they won't be any more localized just because their values happen to be contiguous.)

When iterating over the array, here are the costs per element:
- used: Same as the linked list, but without the linked-list overhead.
- unused: Check whether the entry is valid. This doesn't require dereferencing.

It's possible that the branch for checking validity costs more than the
linked-list overhead, but that's the only way I can think of for the
array to be worse at iterating, even when sparse.
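
If someone wants to poke at that, here's the shape of a quick (and
admittedly crude) micro-benchmark I'd start from. Python tuples aren't
real C linked-list nodes and interpreter overhead swamps most of the
cache effects, so treat it as illustrative only:

    import timeit

    N = 10**5
    arr = [i if i % 2 else None for i in range(2 * N)]  # 50% holes

    # A singly linked list of the same N live values, as (value, next) pairs.
    head = None
    for i in reversed([i for i in range(2 * N) if i % 2]):
        head = (i, head)

    def iter_array():
        return [x for x in arr if x is not None]    # validity check per slot

    def iter_linked():
        out, node = [], head
        while node is not None:                     # pointer chase per element
            out.append(node[0])
            node = node[1]
        return out

    print("array :", timeit.timeit(iter_array, number=100))
    print("linked:", timeit.timeit(iter_linked, number=100))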

> And, more importantly, if there's C code that sees the OrderedDict as a dict and ignores the __getitem__ and goes right to the hash value, that would probably be a big optimization you'd be throwing away. (You could change the C APIs to handle two different kinds of dict storage, but then you're introducing a conditional which would slow things down for regular dicts.)

I'm pretty sure that all C code uses the C version of __getitem__ on
the dict, because it's NOT statically known which __getitem__ function
to call: each dict's type holds a pointer to its __getitem__
implementation.



On Sat, Dec 12, 2015 at 7:34 AM, Serhiy Storchaka <storchaka at gmail.com> wrote:
> The performance of Python implementation doesn't matter much now (while it
> has the same computational complexity).

It might matter a little for alternative Python implementations that
haven't yet made their own optimized ODicts.



On Mon, Dec 14, 2015 at 12:30 PM, Andrew Barnert via Python-ideas
<python-ideas at python.org> wrote:
> One thing that might work is a linked list of arrays (and start offsets): to delete from the middle of a node, you just split it in two at that point, so your cost is only the cost of copying half a node rather than half the structure. This obviously has the same worst-case behavior as a linked list, but in most uses the constant multiplier is closer to that of an array (especially if tuned so the nodes often fill a cache line, memory page, disk block, or whatever else is most important for the specific use), which is why it's used for C++ std::deque, many filesystem structures, database blobs, etc.

I think it's much better to just live with a half-empty array. But
this would have to be proven with code and benchmarks.


As an aside, have you seen Stroustrup and Sutter talking about linked
lists versus vectors?
- Stroustrup: https://www.youtube.com/watch?v=YQs6IC-vgmo
- Sutter: https://channel9.msdn.com/Events/Build/2014/2-661

Here's an example:
- Vector: Linear search for an element, then remove it by shifting
everything down.
- List: Linear search for an element, then remove it by delinking.

The vector will beat the list even for very large sizes, simply because
the pointer-chasing during the list's search costs more than the
vector's shifting (which takes advantage of prefetching, branch
prediction, etc.).
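
For anyone who wants to reproduce the flavor of that experiment in
Python (only the shape of it; the real argument is about C++ memory
behavior, and both structures here are rebuilt inside the timed
function):

    import random
    import timeit

    N = 10**4
    targets = random.sample(range(N), 100)  # values to search for and remove

    def vector_remove():
        v = list(range(N))
        for t in targets:
            v.remove(t)             # linear search, then shift everything down

    def linked_remove():
        head = None
        for i in reversed(range(N)):
            head = [i, head]        # [value, next] cells
        for t in targets:
            prev, node = None, head
            while node[0] != t:     # linear search by pointer chasing
                prev, node = node, node[1]
            if prev is None:
                head = node[1]      # unlink the head
            else:
                prev[1] = node[1]   # unlink mid-list

    print("vector:", timeit.timeit(vector_remove, number=10))
    print("linked:", timeit.timeit(linked_remove, number=10))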

For iteration of an OrderedDict based on an array, there won't be a
branch-prediction benefit, since the holes will be distributed based
solely on how the dict is used. There might not even be a prefetching
benefit in the usual case, because of how many objects get touched
during the iteration (that is, more memory is touched than in the
equivalent C++ code). But that's what testing is for, I guess.

