[Python-Dev] PySet API

Raymond Hettinger raymond.hettinger at verizon.net
Mon Mar 20 09:44:02 CET 2006


[Raymond]
>> What are you proposing to add to the PySet API?

[Barry]
> PySet_Clear(), PySet_Next(), PySet_Update(), and PySet_AsList().

PySet_Clear()
-------------
Use PyObject_CallMethod(s, "clear", NULL).

Or if you need to save a millisecond on an O(n) operation, use 
PyNumber_InPlaceSubtract(s,s) as shown in the docs.  If the name bugs you, it 
only takes a one-line macro to define a wrapper.  The set API should not be 
cluttered with unnecessary and redundant functions.


PySet_Next()
------------
This is also redundant.  The preferred way to iterate over a set should be 
PyObject_GetIter(s).  The iter api is generic and works for all containers.  It 
ought to be the one-way-to-do-it.

Further, it doesn't make sense to model this after the dictionary API where the 
next function is needed to avoid double lookups by returning pointers to both 
the key and value fields at the same time (allowing for modification of the 
value field).  In contrast, for sets, there is no value field to look-up or 
mutate (the key should not be touched).  So, we shouldn't be passing around 
pointers to the internal structure. I want to keep the internal structure of 
sets much more private than they were for dictionaries -- all access should be 
through the provided C API functions -- that keeps the implementation flexible 
and allows for future improvements without worrying that we've broken code for 
someone who has touched the internal structure directly.

Also, the _Next() api is not as safe as the _GetIter api which checks for 
mutation during iteration.  The safety should not be tossed aside without good 
reason.


PySet_Update()
---------------
Use PyObject_CallMethod(s, "update", "O", iterable).  That is the preferred way 
to access all of the high volume methods.  Only the fine grained methods (like 
contains, add, pop, or discard) have a need for a direct call.  Adding 
unnecessary functions for the many-at-once methods gains you nothing -- perhaps 
saving a tiny O(1) look-up step in an O(n) operation.

FWIW, the same reasoning also applies to why the list API defines 
PyList_Append() but not PyList_Extend().


PySet_AsList()
---------------
There is already a function expressly for this purpose, PySequence_List(s).  It 
is clear, readable, and is the one-way-to-do-it for turning arbitrary iterables 
into a list.  FWIW, that function already has an optimization to pre-size the 
result list to the correct size, so it runs pretty fast (no over-allocate / 
resize dance).

I had considered putting a further optimization inside PySequence_List to have a 
special case path for sets (like it does for tuples); however, it occurred to me 
that this can never be the time critical part of a program since the time to 
convert a set to a list is small compared to the time to construct the set in 
the first place (many times longer).  IOW, further micro-optimization here is 
pointless.




[Raymond]
>> IOW, if I have I still have a say in the matter, the patch will most likely 
>> not
>> be accepted.

[Barry]
> Can you explain why the additions above would not be obvious improvements?

Yes.  A fatter api is not a better api.  The PyObject_CallMethod() approach is 
highly readable and assures direct correspondence with a Python set method of 
the same name.  Trying to save the method name lookup time on O(n) methods is a 
false optimization.  The concrete API should avoid duplicating services provided 
by the abstract API such as PySequence_List().  Also, the set API should not 
model parts of the dict API that grant direct access to the internal structure 
or were needed because dictionaries had a value field.

As it stands now, it is possible to use sets in C programs and access them in a 
way that has a direct correspondence to pure Python code -- using 
PyObject_CallMethod() for things we would usually access by name, using the 
PyNumber API for things we would access using operators, using other parts of 
the abstract API exactly as we would in Python (PyObject_Repr, PyObject_GetIter, 
PySequence_List, PyObject_Print, etc.), and using a handful of direct access 
functions for the fine grained methods like (add, pop, contains, etc.).  IOW, I 
like the way the C code looks now and I like the minimal, yet complete API. 
Let's don't muck it up.

FWIW, the C implementation in Py2.5 already provides nice speed-ups for many 
operations.  Likewise, its memory requirements have been reduced by a third. 
Try to enjoy the improvements without gilding the lily.

Cheers,


Raymond 



More information about the Python-Dev mailing list