[Python-Dev] Changing the order of iteration over a dictionary

Mark Shannon mark at hotpy.org
Fri Jan 20 11:49:05 CET 2012


Hi,

One of the main sticking points over possible fixes for the 
hash-collision security issue seems to be a fear that changing the 
iteration order of a dictionary will break backwards compatibility.

The order of iteration has never been specified. In fact not only is it 
arbitrary, it cannot be determined from the contents of a dict alone; it 
may depend on the insertion order.

Changing a hash function is not the only change that will change the 
iteration order; any of the following will also do so:
* Changing the minimum size of a dict.
* Changing the load factor of a dict.
* Changing the resizing policy of a dict.
* Sharing of keys between dicts.

By treating iteration order as part of the API we are effectively ruling 
out ever making any improvements to the dict.

For example, my new dictionary implementation
https://bitbucket.org/markshannon/hotpy_new_dict/
reduces memory use by 47% for gcbench, and by about 20% for the 2to3 
benchmark, on my 32bit machine.
(Nice graphs: http://tinyurl.com/7qd2nnm http://tinyurl.com/6uqvl2x )

The new dict implementation (necessarily) changes the iteration order 
and will break code that relies on it.

If dict iteration order is to be treated as part of the API (and I think 
that is a very bad idea) then it should be documented, which will be 
difficult since it is barely deterministic.
This will also be a major problem for PyPy, Jython and IronPython, as 
they will have to reimplement their dicts.

So, don't be afraid to change that hash function :)

Cheers,
Mark


More information about the Python-Dev mailing list