How to remove item from heap efficiently?

Cem Karan cfkaran2 at gmail.com
Thu Jan 14 07:04:17 EST 2016


On Jan 13, 2016, at 2:08 PM, Sven R. Kunze <srkunze at mail.de> wrote:
> On 13.01.2016 12:20, Cem Karan wrote:
>> On Jan 12, 2016, at 11:18 AM, "Sven R. Kunze" <srkunze at mail.de> wrote:
>> 
>>> Thanks for replying here. I've come across these types of wrappers/re-implementations of heapq as well when researching this issue. :)
>>> 
>>> Unfortunately, they don't solve the underlying issue at hand which is: "remove item from heap with unknown index" and be efficient at it (by not using _heapq C implementation).
>>> 
>>> 
>>> So, I thought I did another wrapper. ;) It at least uses _heapq (if available otherwise heapq) and lets you remove items without violating the invariant in O(log n). I am going to make that open-source on pypi and see what people think of it.
>> Is that so?  I'll be honest, I never tested its asymptotic performance, I just assumed that he had a dict coupled with a heap somehow, but I never looked into the code.
> 
> My concern about that specific package is a missing C-implementation. I feel that somewhat defeats the whole purpose of using a heap: performance.

I agree with you that performance is less than that of using a C extension module, but there are other costs associated with developing a C module:

1) As the developer of the module, you must be very careful to ensure your code is portable.

2) Distribution becomes somewhat more difficult; you may need to distribute both source and compiled binaries for various platforms.  This is somewhat more annoying than pure python scripts.

3) Debugging can become significantly more difficult.  My current codebase is python+cython+c, and when something crashes, it is usually easier to use a bunch of printf() statements to figure out what is going on than to use a debugger (others may have different experiences, this is just mine).

4) Not everyone is familiar with C, so writing extensions may be more difficult.

5) Will the extension module work on non-cpython platforms (iron python, jython, etc.)?

Finally, without profiling the complete package it may be difficult to tell what impact your C module will have on overall performance.  In my code, HeapDict had less than a 2% performance impact on what I was doing; even if I had replaced it with a pure C implementation, my code would not have run much faster.

So, while I agree in principle to what you're saying, in practice there may be other factors to consider before rejecting the pure python approach.

> Asymptotic performance is still O(log n).

So, if the intent is to pop events more often than to peek at them, then in practice, HeapDict is about the same as some clever heap+dict method (which it might be, as I said, I haven't looked at the code).

>> That said, IMHO using a dict interface is the way to go for priority queues; it really simplified my code using it!  This is my not-so-subtle way of asking you to adopt the MutableMapping interface for your wrapper ;)
> 
> Could you elaborate on this? What simplified you code so much?
> 
> I have been using heaps for priority queues as well but haven't missed the dict interface so far. Maybe, my use-case is different.


I'm writing an event-based simulator, and as it turns out, it is much easier to tentatively add events than it is to figure out precisely which events will occur in the future.  That means that on a regular basis I need to delete events as I determine that they are garbage.  HeapDict did a good job of that for me (for completely unrelated reasons I decided to switch to a pure-C codebase, with python hooks to twiddle the simulator at a few, very rare, points in time; hence the python+cython+c comment above).

Thanks,
Cem Karan


More information about the Python-list mailing list