frozendict (v0.1)

Jonas H. jonas at lophus.org
Fri Oct 8 10:04:55 EDT 2010


On 10/08/2010 03:27 PM, kj wrote:
> I tried to understand this by looking at the C source but I gave
> up after 10 fruitless minutes.  (This has been invariably the
> outcome of all my attempts at finding my way through the Python C
> source.)

It's not you. CPython's code is ... [censored]

Anyway, you don't even need to read C code to understand how sets are 
implemented.

There is a (now deprecated) Python module, Libs/set.py, that has 
implementations for `Set` and `ImmutableSet` (nowadays `set` and 
`frozenset`).

The implementation strategy you can see there is quite simple. The code 
uses dictionary keys to store the set items and "ignores" the dictionary 
values, so that `.add(value)` is implemented as `._dict[value] = 
some_value_nobody_cares_about`.

Here comes a very limited example set implementation using a dict:

class PrimitiveSet(object):
     def __init__(self):
         self._dict = {}

     def add(self, value):
         self._dict[value] = True

     def __contains__(self, value):
         return value in self._dict

     def __repr__(self):
         return 'PrimitiveSet(%r)' % self._dict.keys()

 >>> s = PrimitiveSet()
 >>> 'hello' in s
False
 >>> s.add('hello')
 >>> 'hello' in s
True
 >>> s
PrimitiveSet(['hello'])
 >>> s.add(tuple(xrange(10)))
 >>> s
PrimitiveSet([(0, 1, 2, 3, 4, 5, 6, 7, 8, 9), 'hello'])
 >>> s.add(xrange(5))
 >>> s
PrimitiveSet([(0, 1, 2, 3, 4, 5, 6, 7, 8, 9), xrange(5), 'hello'])

This has a few implications for sets:
* dict keys are unordered/sorted. so are sets.
* dict keys are unique. same for set values.
* dict keys have to be hashable (immutable). same for sets values.

So far our implementation is not hashable, and we need a custom 
implementation for __hash__ (because dicts aren't hashable, so we cannot 
re-use dictionary methods).
There is one requirement for set hashes: they have to be independent of 
the item order (there *is* an order in memory of course, and it may vary 
depending on the order assignments to our dict are done).

Here is an extract from the Python set implementation, 
`BaseSet._compute_hash`:

def _compute_hash(self):
     # Calculate hash code for a set by xor'ing the hash codes of
     # the elements.  This ensures that the hash code does not depend
     # on the order in which elements are added to the set. [...]
     result = 0
     for elt in self:
         result ^= hash(elt)
     return result

Hope this helps :-)

Jonas



More information about the Python-list mailing list