dict.reserve and other tricks

bearophileHUGS at lycos.com bearophileHUGS at lycos.com
Thu Nov 16 19:44:54 EST 2006


I have started doing practice creating C extensions for CPython, so
here are two ideas I have had, possibly useless.

If you keep adding elements to a CPython dict/set, it periodically
rebuilds itself. So maybe dict.reserve(n) and a set.reserve(n) methods
may help, reserving enough (empty) memory for about n *distinct* keys
the programmer wants to add to the dict/set in a short future. I have
seen that the the C API of the dicts doesn't allow this, and I don't
know if this can be implemented modifying the dicts a bit. Do you think
this may be useful?

-------------------------------

Sometime I need to remove some elements from dicts/sets. If I know a
rule that tells what elements have to be removed I can use code like
this:

import itertools

def filterset(predicate, inset):
  return set(itertools.ifilter(predicate, inset))

print filterset(lambda x: x & 1, set(range(10)))

Output:
set([1, 3, 9, 5, 7])

And a similar one to filter dicts that works with a predicate(key,
val).
Most of the times such functions are good enough, but sometimes the
dicts are big, so to reduce memory used I remove keys in place:

def filterdict(pred, indict):
  todel = [k for k,v in indict.iteritems() if not pred(k,v)]
  for key in todel:
    del indict[key]

d = dict.fromkeys(xrange(8), 0)
print d
filterdict(lambda k,v: k & 1, d)
print d

# Output:
# {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0}
# {1: 0, 3: 0, 5: 0, 7: 0}


But doing the same thing while iterating on the dict may be faster and
use even less memory.
This iteration&deletion capability is probably not safe enough to be
used inside Python programs, but a compiled C module with a function
that works like that filterdict (and deletes while iterating) may be
created, and its use is safe from Python programs.

The C++ STL API of maps allows to delete items while iterating on the
map (but probably it has to be done only when necessary, because it may
be a source for bugs):

typedef map<int, string> isMap;
typedef isMap::value_type isValType;
typedef isMap::iterator isMapItor;

void main() {
  isMap m;

  ... // items added to m

  for(isMapItor itr = m.begin(); itr != m.end();) {
    if(itr->second == "somestring")
      m.erase(itr++);
    else
      ++itr;
  }


The Python/C API, 7.4.1 Dictionary Objects, says that's not possible
with CPython dicts:

>The dictionary p should not be mutated during iteration. It is safe (since Python 2.1) to modify the values of the keys as you iterate over the dictionary, but only so long as the set of keys does not change.<

Do you think it may be useful to create to create such C filterdict
function that deletes while iterating? (To create such function it
probably has to bypass the CPython dict API).

Bye,
bearophile




More information about the Python-list mailing list