How to remove item from heap efficiently?

Chris Angelico rosuav at gmail.com
Wed Jan 13 11:54:32 EST 2016


On Thu, Jan 14, 2016 at 3:40 AM, Robin Becker <robin at reportlab.com> wrote:
> is this true? I looked at https://wiki.python.org/moin/TimeComplexity and it
> says that dict.get which I assume is used for accessing the heapq delete
> point can be large (the average time is O(1), but amortized over a lot of
> accesses can be O(n)). Apparently the history of sets/gets can affect
> individual times quite a lot. I seem to remember there was some kind of
> hashing attack against python dicts that would use up large amounts of time,
> but I guess that's now fixed.

Yes and no. In current CPython versions, the hash attack is made a lot
harder by randomizing the hashes of strings; the most obvious attack
(sending form data to a web site) is now a lot harder, because you
need to figure out what the server process is using as its key. But
there are some caveats to it:

1) CPython 2.7 doesn't (I believe) do this by default. You have to set
PYTHONHASHSEED=random or pass the -R parameter.
2) This still only affects strings. If you build a dictionary out of
integers provided by a potential attacker, the hashes will be
consistent.
3) If an attacker can get info from the server in dictionary order,
the hash key could be deduced. Since it can't change within one run of
a program, this could make attacks possible.

So, I wouldn't say it's *fixed*. It's certainly been made a lot less
easy to abuse than it originally was. That said, though - even before
these fixes, it wasn't something you'd accidentally stumble on. It
takes a deliberate attack.

ChrisA



More information about the Python-list mailing list