[Python-ideas] An identity dict

Marcin 'Qrczak' Kowalczyk qrczak at knm.org.pl
Wed Jun 2 00:05:31 CEST 2010


2010/6/1 Benjamin Peterson <benjamin at python.org>:

> My contention is that an identity dictionary or at least a dictionary with
> custom hash and keys is a useful primitive that should be in the standard
> library. However, I also see its advantage in avoiding bad performance of id()
> based identity dicts in non-CPython implementations.
>
> It is useful to let the implementation optimize it any time there is moving GC
> as in Jython and IronPython where id also is expensive. (Basically a mapping has
> to be maintained for all objects on which id is called.)

Here is how I designed this for my language:

You can request the ObjectId of the given object. If an ObjectId
corresponding to the given object is still alive, you always get it
back again, but it can be GC'ed and later created afresh. ObjectIds
are hashable and comparable (with an arbitrary ordering). Hash values
and the ordering are preserved when ObjectIds are kept alive, but they
may be different if ObjectIds are created afresh. An ObjectId contains
an integer index which is unique among ObjectIds being alive at the
same time.

You can make a dictionary with a specified key function. It is
internally backed by something equivalent to f(k) -> (k, v) dict. A
dictionary with ObjectId constructor as the key is an identity
dictionary; it works because it keeps both k and f(k) alive.

An advantage of this scheme is that with a moving GC the id mapping
must be maintained only for objects for which the program keeps their
ObjectIds alive.

A disadvantage is that the program must be careful to not use
ObjectIds in a manner which does not keep them alive yet expects
consistent hashing and ordering. In particular a key-mapped dict which
would store only (k, v) pairs and compute f(k) on the fly would not
work. Also ObjectIds cannot be used to generate printable unique
identifiers which would be valid without having to keep ObjectIds
alive, like in Python's default repr.

-- 
Marcin Kowalczyk



More information about the Python-ideas mailing list