Py2.3: Feedback on Sets

Tue Aug 19 23:10:19 EDT 2003

On Tue, 12 Aug 2003 06:02:17 GMT, rumours say that "Raymond Hettinger"
<vze4rx4y at verizon.net> might have written:

[replying only to the points on which I have something substantial to say]

>* Is the support for sets of sets necessary for your work
>   and, if so, then is the implementation sufficiently
>   powerful?

I have used sets in:
- Unix sysadmin tasks (comparing usernames between passwd and shadow,
finding common files in sync requests, etc.)
- a hangman game (when the computer guesses words, to continuously
restrict the possibilities based on the human input)
- an image recognition program (comparing Haar coefficients)

These come to mind at the moment, but I have used them even in the
python command line; and mostly I care about intersections.

>* Does the performance meet your expectations?

In the game and image recognition programs I could use more power;-)

>* Are sets helpful in your daily work or does the need arise
>   only rarely?

I use them often; they are a very helpful construct.

>User feedback is essential to determining the future direction
>of sets (whether it will be implemented in C, change API,
>and/or be given supporting language syntax).

Reimplementation in C sounds appropriate, and supporting language syntax
would be nice.

A quick thought, in the spirit of C implementation: there are cases
where I would like to get the intersection of dicts (based on the keys),
without having to create sets from the dict keys and then getting the
relevant values.  That is, given dicts a and b, I'd like:

>>> a & b # imaginary

to mean

>>> dict([(x, a[x]) for x in sets.Set(a) & sets.Set(b)]) # real

You may notice that a&b wouldn't be equivalent to b&a, since the values
would come from the left operand.
Perhaps the speed difference would not be much; I'll grow a function in
dictobject.c, run some benchmarks and come back with results for you.
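
Until then, here is a rough pure-Python sketch of the semantics I have in
mind for a & b.  The name dict_intersection is made up for illustration;
nothing like it exists in dictobject.c, and I am assuming the values are
taken from the left operand:

```python
def dict_intersection(a, b):
    """Return a new dict holding the keys common to a and b.

    Values are taken from a (the left operand), which is an assumption
    about what an imaginary `a & b` on dicts would mean.
    """
    result = {}
    for key in a:
        if key in b:          # one hash lookup in b per key of a
            result[key] = a[key]
    return result


a = {'root': 0, 'daemon': 1, 'tzot': 500}
b = {'root': 'x', 'tzot': 'y', 'guest': 'z'}

left = dict_intersection(a, b)    # keys from both, values from a
right = dict_intersection(b, a)   # same keys, values from b
```

Both results have the same keys, but left carries the values of a and
right carries the values of b, which is the asymmetry mentioned above.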

Another thought: it is unfortunate that an intersection *has* to go
through repeated key lookups, however fast those lookups are.  (I am
thinking about how dict keys are ordered relative to their hash values;
I'll have to delve into dictobject.c, it seems.)  Then again, building
the result dict probably costs more processing cycles than the
comparisons do, and in some cases doing a dict.copy() and then removing
the uncommon elements would be faster.  Hm, food for thought, and no more
than two hours to sleep now.
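
For the record, the copy-and-remove variant might look like the sketch
below.  The function name is hypothetical, and I have not measured it; my
guess is that it only pays off when most keys of a are also in b, so that
few deletions are needed:

```python
def dict_intersection_by_copy(a, b):
    """Intersect dicts by copying a, then deleting keys missing from b.

    Unmeasured sketch: whether this beats building the result key by key
    depends on how many keys the two dicts have in common.
    """
    result = a.copy()
    for key in a:             # iterate over a, mutate only the copy
        if key not in b:
            del result[key]
    return result


a = {'root': 0, 'daemon': 1, 'tzot': 500}
b = {'root': 'x', 'tzot': 'y', 'guest': 'z'}
common = dict_intersection_by_copy(a, b)
```

Iterating over a while deleting from the copy keeps the loop safe, since
the dict being traversed is never modified.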

Another slogan: Python keeps your mind awake (and c.l.py keeps your body
away from bed :)
-- 
TZOTZIOY, I speak England very best,
Microsoft Security Alert: the Matrix began as open source.