[Python-Dev] PySet API

Barry Warsaw barry at python.org
Tue Mar 28 05:25:46 CEST 2006


On Sat, 2006-03-25 at 22:05 -0500, Raymond Hettinger wrote:

> Still, PyObject_Clear(s) would be better.  

Although not ideal, in the interest of compromise, I could support this
option.  There's a problem with this though: I don't think you want to
be able to clear a frozen set.  My PySet_Clear() raises a SystemError
and returns -1 when the object is a frozen set.

If PyObject_Clear() is implemented something like

int PyObject_Clear(PyObject *o)
{
    return (o->ob_type->tp_clear ? o->ob_type->tp_clear(o) : -1);
}

then you /would/ be able to clear a frozen set.  For that matter, it
would be the case that any immutable collection would be clearable if it
had a tp_clear (which it probably would).  Those aren't the semantics I'd
expect though.  That may not be solvable unless you make
PyObject_Clear() an alias for PyObject_CallMethod(o, "clear", NULL).
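
To make the concern concrete, here is a toy model in plain C.  It does
not use CPython's real structs: ToyType, ToyObject, the is_frozen flag,
toy_object_clear() and toy_set_clear() are all made-up names, and the
-1 return merely stands in for "raise SystemError and return -1".  It
only sketches how a clear that dispatches on the slot ignores
immutability, while a concrete, type-checked clear can refuse.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for a type object and an instance; NOT CPython's structs. */
typedef struct ToyObject ToyObject;

typedef struct {
    const char *name;
    int is_frozen;                 /* hypothetical flag: immutable type */
    int (*tp_clear)(ToyObject *);  /* clear slot, may be NULL */
} ToyType;

struct ToyObject {
    const ToyType *type;
    size_t used;                   /* number of elements currently held */
};

/* Shared slot implementation: drops all elements and reports success. */
int toy_clear_slot(ToyObject *o)
{
    o->used = 0;
    return 0;
}

/* Both types share the same slot, as a set and a frozen set plausibly would. */
static const ToyType ToySet_Type = { "set", 0, toy_clear_slot };
static const ToyType ToyFrozenSet_Type = { "frozenset", 1, toy_clear_slot };

/* Generic clear in the style of the snippet above: looks at the slot only. */
int toy_object_clear(ToyObject *o)
{
    assert(o != NULL);
    return o->type->tp_clear ? o->type->tp_clear(o) : -1;
}

/* Concrete clear: checks for the exact mutable type before touching anything. */
int toy_set_clear(ToyObject *o)
{
    assert(o != NULL);
    if (o->type != &ToySet_Type)
        return -1;   /* stands in for "raise SystemError, return -1" */
    return toy_clear_slot(o);
}
```

In this model, toy_object_clear() happily empties an object whose type
is ToyFrozenSet_Type, whereas toy_set_clear() leaves it untouched and
returns -1.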

Although I'm sure you'll disagree, I think this is less than ideal.  For
one thing, you're requiring objects that work with PyObject_Clear() to
implement an exact Python-level protocol (it must have a method, it must
be called "clear" and it must take zero arguments).  You also have to
implement PyObject_Clear() with a hasattr test, because I don't think
you want PyObject_Clear() raising AttributeErrors.  That adds a fixed
per-call overhead, which makes clearing small sets proportionally more
expensive.

> Better still would be to examine the 
> actual uses in the app.  I suspect that most code that clears a set and then 
> rebuilds it would be better-off starting with a new empty set (and because of 
> freelisting, that is a very fast operation).

That may not be possible.  Imagine a complex application where the set
is passed through many layers of calls.  The set hangs off of other
application level objects which you don't have access to at the point
where you're deciding whether to clear the set or not.  You can't create
a new set because you have no way to pass the new set back to the
several application level objects that would need to get their pointers
updated.  So the most obvious, simple approach is to just clear the set
you have right there.
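
A toy sketch of that aliasing problem, in plain C.  ToySet, ToyOwnerA,
ToyOwnerB, toy_clear_in_place() and toy_replace() are hypothetical names
for illustration only; nothing here is CPython API.  The assumption is
simply that two application objects hold a pointer to the same set and
the code doing the clearing can only reach one of them.

```c
#include <assert.h>
#include <stddef.h>

/* Toy set: only tracks how many elements it holds. */
typedef struct {
    size_t used;
} ToySet;

/* Two unrelated application objects that both point at the same set. */
typedef struct { ToySet *members; } ToyOwnerA;
typedef struct { ToySet *members; } ToyOwnerB;

/* Clear in place: every holder of the pointer sees the now-empty set. */
void toy_clear_in_place(ToySet *s)
{
    assert(s != NULL);
    s->used = 0;
}

/* Replace with a fresh empty set: only the one slot we can reach is repointed. */
void toy_replace(ToySet **slot, ToySet *fresh)
{
    assert(slot != NULL && fresh != NULL);
    fresh->used = 0;
    *slot = fresh;
}
```

If the code can only reach ToyOwnerA's pointer, toy_replace() leaves
ToyOwnerB still looking at the old, full set; toy_clear_in_place() on
the shared set updates what both owners see.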

> Likewise, it only takes a one-line header to define BarrySet_Update(s).  I do 
> not want that part of the C API exposed yet.  It is still under development and 
> may eventually become a function with a variable length argument list.

Really?  That would be odd and not at all parallel with established
convention (e.g. PyDict_Update()).  I would think that a vararg update
should be named something different in order to preserve the principle
of least surprise.

> If you're dead-set against using the iterator API, then maybe there is something 
> wrong with the API.  You should probably start a new thread on why you detest 
> the iterator API and see if there are ways to improve it.

I'm not saying there's anything wrong with the iterator API, I'm saying
that it's not always appropriate.  It's the nail/hammer argument.  But I
ran out of clever when I tried to propose the simplest, most direct fix
for our most pressing issues, so I'm not going to take the bait.

> > You talk about duck typing, but I don't care about that here.
> 
> It's one of the virtues of Python that gets reflected in the abstract API.  IMO, 
> it's nice that PyObject_Dir(o) corresponds to "dir(o)" and the same for hash(o), 
> repr(o), etc.  I just hope that by hardwiring data types in stone, your app 
> doesn't become rigid and impossible to change.  I certainly do not recommend 
> that other people adopt this coding style (avoidance of iterators, duplication 
> of abstract API functions in concrete form, etc.)  If you're experiencing 
> debugging pain, it may be that avoidance of abstraction is the root cause.

Trust me Raymond, it's not the cause.  I keep trying to explain this but
I must be completely inept because you're just not getting it.  Let me
try this way: we're using Python's collection types (sets, lists, dicts)
as our fundamental collection data structures internally in our
application.  There's no duck typing going on.  There's no need for
abstraction because we know exactly what we have and there's no chance
we'll have something that smells like a set that isn't exactly a PySet.
As I've said many times, I'm all for an abstract API because it's darn
useful in many applications.  It's the lack of a concrete API that is
limiting.

> > I wouldn't object to that, but it wouldn't change my mind about
> > PySet_Clear().
> 
> This is plain evidence that something is wrong with your approach.  While 
> possibly necessary in your environment, the rest of mankind should not have to 
> stomach this kind of API clutter. 

Please, that's a bit extreme.  I haven't heard anybody scream about the
PyDict's API clutter and I don't see my PySet proposal as being any
different.

-Barry
