How to remove item from heap efficiently?

srinivas devaki mr.eightnoteight at gmail.com
Fri Jan 8 14:26:10 EST 2016


So you need a task scheduler with expiration and priority on each task.
Interesting Problem..

as peter said about marking the task in heapB to be deleted. but this
needs searching everytime you pop off an old item [complexity:
O(len(heapB))]. you may as well use python del on it as complexity is
same.
But if you already know the where to look in the heapB then you can
avoid searching and thereby reducing the complexity. you can do this
by saving the references of heapB in heapA and viceversa

and if you marking delete on a number of low priority tasks, then it
can increase your heapB enormously because being low priority items
they can stagnate. to resolve this error you have to perform linear
checking iterations at every critical length (this critical length can
be obtained mathmatically), so that your amortized complexity of push,
pop remains log(number_of_valid_tasks_at_any_time);
the number of valid tasks at any time are those tasks which are not
expired at the time.

My Algorithm:
version_a: https://gist.github.com/9ce7a0e534c6e768239e
this version has simple scheduler which has methods for popping high
priority one and popping oldest task.
But this version has a disadvantage, i.e If lets say you are pushed
some 10**5 tasks with priority 2, and then pushed some 10**5 tasks
with priority 1. and then you decided to pop the oldest 10**5 items.
in this version that 10**5 elements will stagnate in priority heap if
in future all priorities are less than 1.
now this is not a big issue but if you are running a webserver and
over a span of 5 days there could be 10**10 tasks, and even if half of
those tasks stagnate your server is going to crash with out of memory.

version_b: https://gist.github.com/99b4d590753ba234eeed
this version resolved that stagnation. this one will run sweeps
whenever there are more than half of useless items, thereby giving us
an amortized complexity of O(log(n)) for push, pop, etc

but this version doesn't have the feature of expiration.

version_c: https://gist.github.com/9dfd0d291672c0ffa5c3
in this one we simply keep a variable expiration, and relieve the
expired tasks on any operation. i have coded it in such a way that
expiration is specific time, you can change it to delta time if you
want to.

Time Complexity: O(log(n)) insertion, O(log(n)) deletion   [amortized]
Space Complexity: O(n)  [amortized]
here n is number of valid items i.e which are not expired.

I hope this helps with your problem

PS:
sorry for posting links, it's just that the code is large for email.
I'm using minimum number has highest priority convention.


On Fri, Jan 8, 2016 at 10:15 PM, Sven R. Kunze <srkunze at mail.de> wrote:
> Thanks for your suggestion.
>
> On 08.01.2016 14:21, srinivas devaki wrote:
>>
>> You can create a single heap with primary key as timestamp and
>> secondary key as priority, i.e by creating a tuple
>> insert the elements into the heap as
>> (timestamp, priority)
>
> I think I cannot use that because I need the list sorted by both criteria.
>>
>> If there is any underlying meaning for creating 2 heaps. please mention.
>
>
> I use two heaps because I need to insert arbitrary items fast and remove the
> ones fast which are too old (timestamp) or are next in order (priority).
>
> Basically a task scheduler where tasks can be thrown away once they are too
> long in the queue.
>
>
>> On Fri, Jan 8, 2016 at 4:22 AM, Sven R. Kunze <srkunze at mail.de> wrote:
>>>
>>> Hi everybody,
>>>
>>> suppose, I need items sorted by two criteria (say timestamp and
>>> priority).
>>> For that purpose, I use two heaps (heapq module):
>>>
>>> heapA # items sorted by timestamp
>>> heapB # items sorted by priority
>>>
>>> Now my actual problem. When popping an item of heapA (that's the oldest
>>> item), I need to remove the very same item from heapB, regardlessly where
>>> it
>>> is in heapB. And vice versa.
>>>
>>> Is there a datastructure or a simple trick to achieve that in an
>>> efficient
>>> matter?
>>>
>>> Best,
>>> Sven
>>> --
>>> https://mail.python.org/mailman/listinfo/python-list
>
>



More information about the Python-list mailing list