[issue27275] KeyError thrown by optimised collections.OrderedDict.popitem()

Josh Rosenberg report at bugs.python.org
Tue Oct 25 10:31:17 EDT 2016


Josh Rosenberg added the comment:

Explaining expiringdict's issue: It's two race conditions, with itself, not another thread.

Example scenario (narrow race window):

1. An entry has an expiry time of X (so it will self-delete at time X or later)
2. At time X-1, the PySequence_Contains check is run, and it returns 1 (true)
3. Because contains returned True, at time X PyObject_GetItem is run, but because it's time X, expiringdict's __getitem__ deletes the entry and raises KeyError

An alternative scenario with a *huge* race window is:

1. An entry has an expiry time of X (so it will self-delete at time X or later)
2. No lookups or membership tests or any other operations that implicitly clean the expiring dict occur for a while
3. At time X+1000, _odict_FIRST (in popitem) grabs the first entry in the OrderedDict without invoking the __contains__ machinery that would delete the entry
3. At time X+1000 or so, the PySequence_Contains check is run, and it returns 0 (false), because the __contains__ machinery is invoked, and again, because no default is provided for popitem, a KeyError is raised (this time by the popitem machinery, not __getitem__)

expiringdict is unusually good at bringing this on itself. The failing popitem call is in __setitem__ for limited length expiringdicts, self.popitem(last=False), where they're intentionally removing the oldest entry, when the oldest entry is the most likely to have expired (and since __len__ isn't overridden to expire old entries, it may have been expired for quite a while).

The del self[next(OrderedDict.__iter__(self))] works because they didn't override __iter__, so it's not expiring anything to get the first item, and therefore only __delitem__ is involved, not __contains__ or __getitem__ (note: This is also why the bug they reference has an issue with "OrderedDict mutated during iteration"; iteration returns long expired keys, but looking the expired keys up deletes them, causing the mutation issue).


Possible correct fixes:

1. Make popitem _more_ subclass friendly; when it's an OrderedDict subclass, instead of using _odict_LAST and _odict_FIRST, use (C equivalent) of `next(reversed(self))` and `next(iter(self))` respectively. This won't fix expiringdict as is (because it's broken by design; it doesn't override __iter__, so it will happily return long expired keys that disappear on access), but if we're going for subclass friendliness and allowing non-idempotent __contains__/__getitem__/__iter__ implementations, it's the right thing to do. If expiringdict implemented __iter__ to copy the keys, then loop over the copy, deleting expired values and yielding unexpired values, this would at least reduce the huge race window to a narrow window (because it could still yield a key that's almost expired)
2. Check for the presence of __missing__ on the type and only perform the PySequence_Contains check if __missing__ is defined (to avoid autovivification). This fixes the narrow race condition for subclasses without __missing__, but not the huge race condition
3. Explicitly document the idempotence assumptions made by OrderedDict (specifically, that all non-mutating methods of OrderedDict must not be made mutating in subclasses unless the caller also overrides all multistep operations, e.g. pop/popitem/setdefault).

TL;DR: expiringdict is doing terrible things, assuming the superclass will handle them even though the superclass has completely different assumptions, and therefore expiringdict has only itself to blame.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue27275>
_______________________________________


More information about the Python-bugs-list mailing list