Py2.3: Feedback on Sets - diffudict.txt (0/1)

Wed Aug 20 09:46:19 EDT 2003

On Wed, 20 Aug 2003 06:10:19 +0300, rumours say that Christos "TZOTZIOY"
Georgiou <tzot at sil-tec.gr> might have written:

>A quick thought, in the spirit of C implementation: there are cases
>where I would like to get the intersection of dicts (based on the keys),
>without having to create sets from the dict keys and then getting the
>relevant values.  That is, given dicts a and b, I'd like:
>
>>>> a & b # imaginary
>
>to mean
>
>>>> dict([x, a[x] for x in sets.Set(a) & sets.Set(b)]) # real
>
>You may notice that a&b wouldn't be equivalent to b&a.
>Perhaps the speed difference would not be much; I'll grow a function in
>dictobject.c, run some benchmarks and come back with results for you.

I implemented dict.intersect(), and it is *quite* faster than the
equivalent Python code.

**********************************************************************

Python 2.4a0 (#3, Aug 20 2003, 16:31:22)
[GCC 3.2 (Mandrake Linux 9.0 3.2-1mdk)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> help(dict.intersect)
Help on method_descriptor:

intersect(...)
    D.intersect(E) -> a subset of D having common keys with E

>>> import sets
>>> odds = dict(zip("abcdefghijklmn", range(1, 55, 2)))
>>> evens= dict(zip("asdfghj", range(2, 55, 2)))
>>>
>>> odds
{'a': 1, 'c': 5, 'b': 3, 'e': 9, 'd': 7, 'g': 13, 'f': 11, 'i': 17, 'h':
15, 'k': 21, 'j': 19, 'm': 25, 'l': 23, 'n': 27}
>>> evens
{'a': 2, 'd': 6, 'g': 10, 'f': 8, 'h': 12, 'j': 14, 's': 4}
>>>
>>>
>>> dict([(k, odds[k]) for k in sets.Set(odds) & sets.Set(evens)])
{'a': 1, 'd': 7, 'g': 13, 'f': 11, 'h': 15, 'j': 19}
>>> odds.intersect(evens)
{'a': 1, 'h': 15, 'j': 19, 'd': 7, 'g': 13, 'f': 11}
>>> dict([(k, evens[k]) for k in sets.Set(odds) & sets.Set(evens)])
{'a': 2, 'd': 6, 'g': 10, 'f': 8, 'h': 12, 'j': 14}
>>> evens.intersect(odds)
{'a': 2, 'h': 12, 'j': 14, 'd': 6, 'g': 10, 'f': 8}
>>>
>>>
>>> my_setup= 'import sets; odds=dict(zip("abcdefghijklmn", range(1, 55, 2))); evens=dict(zip("asdfghj", range(2, 55, 2)))'
>>> from timeit import Timer
>>>
>>> Timer(stmt="odds.intersect(evens)", setup=my_setup).repeat()
[1.3545670509338379, 1.3367550373077393, 1.3366960287094116]
>>> Timer(stmt="evens.intersect(odds)", setup=my_setup).repeat()
[1.321492075920105, 1.2869999408721924, 1.320341944694519]
>>> Timer(stmt="dict([(k, odds[k]) for k in sets.Set(odds) & sets.Set(evens)])", setup=my_setup).repeat()
[63.413245916366577, 63.526772975921631, 63.503224968910217]
>>> Timer(stmt="dict([(k, evens[k]) for k in sets.Set(odds) & sets.Set(evens)])", setup=my_setup).repeat()
[63.498296976089478, 63.49311900138855, 63.425426959991455]

**********************************************************************

A substantial difference, over 50x on an Athlon XP 1700.  Also note the
difference in the key order of the results.

I believe that dicts should grow such a method, perhaps with another
name.

Attached is the diff -u for dictobject.c compared to the one in last
night's python-latest.tgz
-- 
TZOTZIOY, I speak England very best,
Microsoft Security Alert: the Matrix began as open source.