[Python-Dev] PEP 218 (sets); moving set.py to Lib

Guido van Rossum guido@python.org
Tue, 20 Aug 2002 16:49:25 -0400


> > I am still perplexed that I received *no* feedback on the sets module
> > except on this issue of sort order (which I consider solved by adding
> > a method _repr() that takes an optional 'sorted' argument).
> 
> I haven't read the entire thread, but I was puzzled by the implementation 
> approach. Did you consider kjbuckets for the standard Python distribution? 

No.  I think that would be the wrong idea at this point for two
reasons: (1) never change two variables at the same time; (2) let's
gather some experience with the new set API first, before we start
worrying about implementation speed.

I also believe that kjbuckets maintains its data in a sorted order,
which is unnecessary for sets -- a hash table is much faster.  After
all, we already use a very fast hash table implementation to represent
sets.
(The only improvement would be that we could save maybe 4 bytes per
hash table entry because we don't need a value pointer.)

> While the claim is rather old, the following quote from Aaron's
> intro [1] to the module suggests it might improve performance:
> 
>    For suitably large compute intensive uses these types should
>    provide up to an order of magnitude speedup versus an
>    implementation that uses analogous operations implemented
>    directly in Python.

The sets module does not implement analogous operations directly in
Python.  Almost all the implementation work is done by the dict
implementation.
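
To illustrate what "done by the dict implementation" can look like, here
is a minimal sketch of a set type that stores its elements as dict keys.
The class name DictBackedSet and its methods are hypothetical and are not
taken from the actual sets module; the point is only that membership
tests, insertion, and the bulk of union run inside the dict's C code
rather than in Python-level loops.

```python
class DictBackedSet:
    """Hypothetical illustration of a set layered on a dict.

    This is an assumption-laden sketch, not the code in the sets module.
    """

    def __init__(self, iterable=()):
        # Elements are stored as dict keys; the values are unused placeholders.
        self._data = dict.fromkeys(iterable, True)

    def __contains__(self, element):
        # Membership is a single hash table lookup done by the dict.
        return element in self._data

    def __len__(self):
        return len(self._data)

    def __iter__(self):
        return iter(self._data)

    def add(self, element):
        self._data[element] = True

    def union(self, other):
        # Copy our keys, then let dict.update() merge in the other elements.
        result = DictBackedSet(self)
        result._data.update(dict.fromkeys(other, True))
        return result

    def intersection(self, other):
        # Each "e in other" test is a hash lookup when other is dict-backed.
        return DictBackedSet(e for e in self if e in other)


a = DictBackedSet([1, 2, 3])
b = DictBackedSet([3, 4])
print(sorted(a.union(b)))
print(sorted(a.intersection(b)))
```

Note that the placeholder value stored for each key is the "value
pointer" overhead mentioned above: a dedicated set type would not need it.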

> Adding the gadfly SQL database to the standard library would also be
> useful, but since it is back under development it would be best for
> gadfly to live on a separate release cycle. The kjbuckets software,
> however, doesn't seem to be changing.

Because nobody is maintaining it any more.

> One more reason for adding kjbuckets: Tim Berners-Lee might find the
> kjGraphs class useful for the semantic web work.
> 
> [1] http://starship.python.net/crew/aaron_watters/kjbuckets/kjbuckets.html

kjbuckets may be nice, but adding it to the core would add a serious
new maintenance burden for the core developers.  I don't see anyone
raising their hand to help out here.

--Guido van Rossum (home page: http://www.python.org/~guido/)