How to remove item from heap efficiently?

Sven R. Kunze srkunze at mail.de
Sun Jan 10 13:48:53 EST 2016


Wow. That's an impressive reply.

On 08.01.2016 20:26, srinivas devaki wrote:
> So you need a task scheduler with expiration and priority on each task.
> Interesting Problem..
>
> as peter said about marking the task in heapB to be deleted. but this
> needs searching everytime you pop off an old item [complexity:
> O(len(heapB))]. you may as well use python del on it as complexity is
> same.
> But if you already know the where to look in the heapB then you can
> avoid searching and thereby reducing the complexity. you can do this
> by saving the references of heapB in heapA and viceversa
>
> and if you marking delete on a number of low priority tasks, then it
> can increase your heapB enormously because being low priority items
> they can stagnate. to resolve this error you have to perform linear
> checking iterations at every critical length (this critical length can
> be obtained mathmatically), so that your amortized complexity of push,
> pop remains log(number_of_valid_tasks_at_any_time);
> the number of valid tasks at any time are those tasks which are not
> expired at the time.
>
> My Algorithm:
> version_a: https://gist.github.com/9ce7a0e534c6e768239e
> this version has simple scheduler which has methods for popping high
> priority one and popping oldest task.
> But this version has a disadvantage, i.e If lets say you are pushed
> some 10**5 tasks with priority 2, and then pushed some 10**5 tasks
> with priority 1. and then you decided to pop the oldest 10**5 items.
> in this version that 10**5 elements will stagnate in priority heap if
> in future all priorities are less than 1.
> now this is not a big issue but if you are running a webserver and
> over a span of 5 days there could be 10**10 tasks, and even if half of
> those tasks stagnate your server is going to crash with out of memory.
>
> version_b: https://gist.github.com/99b4d590753ba234eeed
> this version resolved that stagnation. this one will run sweeps
> whenever there are more than half of useless items, thereby giving us
> an amortized complexity of O(log(n)) for push, pop, etc
>
> but this version doesn't have the feature of expiration.
>
> version_c: https://gist.github.com/9dfd0d291672c0ffa5c3
> in this one we simply keep a variable expiration, and relieve the
> expired tasks on any operation. i have coded it in such a way that
> expiration is specific time, you can change it to delta time if you
> want to.
>
> Time Complexity: O(log(n)) insertion, O(log(n)) deletion   [amortized]
> Space Complexity: O(n)  [amortized]
> here n is number of valid items i.e which are not expired.
>
> I hope this helps with your problem

Indeed. I already do the sweep method as you suggested. ;)

Additionally, you provided me with a reasonable condition when to do the 
sweep in order to achieve O(log n). Thanks much for that. I currently 
used a time-bases approached (sweep each 20 iterations).

PS: Could you add a note on how you got to the condition ( 
2*self.useless_b > len(self.heap_b))?

> PS:
> sorry for posting links, it's just that the code is large for email.
> I'm using minimum number has highest priority convention.

I like Web technology, so no problem here. :)

> On Fri, Jan 8, 2016 at 10:15 PM, Sven R. Kunze <srkunze at mail.de> wrote:
>> Thanks for your suggestion.
>>
>> On 08.01.2016 14:21, srinivas devaki wrote:
>>> You can create a single heap with primary key as timestamp and
>>> secondary key as priority, i.e by creating a tuple
>>> insert the elements into the heap as
>>> (timestamp, priority)
>> I think I cannot use that because I need the list sorted by both criteria.
>>> If there is any underlying meaning for creating 2 heaps. please mention.
>>
>> I use two heaps because I need to insert arbitrary items fast and remove the
>> ones fast which are too old (timestamp) or are next in order (priority).
>>
>> Basically a task scheduler where tasks can be thrown away once they are too
>> long in the queue.
>>
>>
>>> On Fri, Jan 8, 2016 at 4:22 AM, Sven R. Kunze <srkunze at mail.de> wrote:
>>>> Hi everybody,
>>>>
>>>> suppose, I need items sorted by two criteria (say timestamp and
>>>> priority).
>>>> For that purpose, I use two heaps (heapq module):
>>>>
>>>> heapA # items sorted by timestamp
>>>> heapB # items sorted by priority
>>>>
>>>> Now my actual problem. When popping an item of heapA (that's the oldest
>>>> item), I need to remove the very same item from heapB, regardlessly where
>>>> it
>>>> is in heapB. And vice versa.
>>>>
>>>> Is there a datastructure or a simple trick to achieve that in an
>>>> efficient
>>>> matter?
>>>>
>>>> Best,
>>>> Sven
>>>> --
>>>> https://mail.python.org/mailman/listinfo/python-list
>>




More information about the Python-list mailing list