intersection, union, difference, symmetric difference for dictionaries

Oscar Benjamin oscar.j.benjamin at gmail.com
Tue Feb 25 17:58:48 EST 2014


On 25 February 2014 22:36, Tim Chase <python.list at tim.thechases.com> wrote:
> On 2014-02-25 22:21, Duncan Booth wrote:
>> > It would save some space if I didn't have to duplicate all the
>> > keys into sets (on the order of 10-100k small strings), instead
>> > being able to directly perform the set-ops on the dicts.  But
>> > otherwise, it was pretty readable & straight-forward.
>> >
>> It doesn't matter whether they were small strings or full-length
>> novels, creating a set from a dict doesn't duplicate any strings.
>
> pre-my-new-learning-about .viewkeys() it sounds like set(my_dict)
> would have the overhead of storing an additional reference to a
> string per set-entry (rather than duplicating every string).
> With .viewkeys()/.keys(), it sounds like that overhead would go away.

You lose the redundant hash-table by using the keys view. The set hash
table takes 20-40 bytes per entry on this computer (according to
sys.getsizeof). On the same system a short string takes a similar
amount of memory. The .keys() view object apparently takes 24 bytes in
total so yes it's a reasonable memory saving if the dicts or of any
significant size.

However the .keys() objects are not necessarily as well optimised so
it may be faster to convert to sets for some operations. One example I
found some time ago is that for sets A & B will iterate over the
smaller set but for dicts A.keys() & B.keys() would always go over the
left-hand set (or right-hand - I don't remember). This makes a
considerable difference if one set is significantly larger than the
other.


Oscar



More information about the Python-list mailing list