[Patches] [ python-Patches-1433928 ] add on_missing() and default_factory to dict
SourceForge.net
noreply at sourceforge.net
Wed Feb 22 02:34:25 CET 2006
Patches item #1433928, was opened at 2006-02-17 19:19
Message generated for change (Comment added) made by gvanrossum
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1433928&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Core (C code)
Group: Python 2.5
Status: Open
Resolution: Accepted
Priority: 5
Submitted By: Guido van Rossum (gvanrossum)
Assigned to: Guido van Rossum (gvanrossum)
Summary: add on_missing() and default_factory to dict
Initial Comment:
See the thread starting at
http://mail.python.org/pipermail/python-dev/2006-February/061261.html
This still needs unit tests and docs.
----------------------------------------------------------------------
>Comment By: Guido van Rossum (gvanrossum)
Date: 2006-02-21 20:34
Message:
Logged In: YES
user_id=6380
Here's a new version. This adds:
- fix for copying functions in copy.py (see python-dev post)
- more unit tests
- latex docs
- docstring tweaks
I haven't got latex installed on my laptop so I can't test
whether the docs are any good. Can anyone?
I did this on the plane so your proposal below isn't
incorporated yet. My comment is that I actually *like*
having the functionality in the dict base class since
someone else implementing on_missing() might not need the
default_factory instance variable; subclassing defaultdict
would be kind of wrong for that purpose.
I'm not sure I understand your comment about optimizing away
PyEval_CallMethod() -- don't we still need to use that so
that a subclass can override it? Or do you propose to do
another check for the exact defaultdict type and if so,
inline it or at least shortcut the call? I'm not sure I see
the value in such a micro-optimization at this point --
after all it's only called for missing keys which shouldn't
be all that frequent.
----------------------------------------------------------------------
Comment By: Raymond Hettinger (rhettinger)
Date: 2006-02-21 16:03
Message:
Logged In: YES
user_id=80475
I just had one other thought. If the subclass implemented
__getitem__ and on_missing directly, then there would be
no need to make any changes to dictobject.c and we could
leave the mapping API alone (for a less disruptive
implementation):
class defaultdict(dict):
default_factory = None
def __getitem__(self, key):
try:
return dict.__getitem__(self, key)
except KeyError:
return self.on_missing(self, key)
def on_missing(self, key):
if self.default_factory is None:
raise KeyError(key)
value = self[key] = self.default_factory(key)
return value
Anyone needing to hook the on_missing method can do it
through a subclass of defaultdict rather than dict.
There are two side benefit to this approach. One, we keep
the changes all in one object making it easier to
understand than putting framework methods in the parent.
Two, this also allows the on_missing call to be done
directly instead of through PyEval_CallMethod(). That
will provide a nice speed-up for the common cases where
defaultdict hasn't been subclassed.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum)
Date: 2006-02-21 11:24
Message:
Logged In: YES
user_id=6380
OK. Py_CLEAR() found; DictMixin.on_missing() added.
Still no docs.
Do UserDict and DictMixin need unit tests additions?
There are grumblings in python-dev on the name; but I still
like defaultdict best.
----------------------------------------------------------------------
Comment By: Raymond Hettinger (rhettinger)
Date: 2006-02-21 10:41
Message:
Logged In: YES
user_id=80475
Py_CLEAR() is a macro defined in object.h. See example
uses in enumobject.c, genobject.c, and itertobject.c.
DictMixin should add the on_missing() method, leaving the
responsibility for calling it to core object's __getitem__
method. That will likely never happen, but it does allow
the mapping API to be emulated as fully as possible. Add
the method right at the very end (after __len__).
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum)
Date: 2006-02-21 10:36
Message:
Logged In: YES
user_id=6380
Here's an update that adds:
- defaultdict.copy() implemented
- docstring tweaks
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum)
Date: 2006-02-21 10:17
Message:
Logged In: YES
user_id=6380
Thanks both! Here's a new patch:
- fixed most of Neal's nits
- fixed most of Raymond's first set of nits
- updated UserDict
Notes:
- I can't use PyObject_CallMethodObjArgs without also
converting "on_missing" to an object.
Right now I don't care enough to bother (especially
since it is only called for proper subclasses).
- I can't find Py_Clear(). Raymond, what did you mean?
- It's not clear how (or even whether) to update DictMixin.
- Haven't gotten to copy() yet -- good point though!
- Still no docs.
- In addition to Raymond's explanation, it's also the case
that NULL appears as None to the user through the magic
of the T_OBJECT attribute machinery.
----------------------------------------------------------------------
Comment By: Raymond Hettinger (rhettinger)
Date: 2006-02-21 09:29
Message:
Logged In: YES
user_id=80475
One other thought: The dict.copy method needs to be
overriden so that default dicts can duplicate themselves.
Also consider adding a __reduce__ method to support
deepcopying and pickling (the latter is less useful
because functions get pickled by name only).
----------------------------------------------------------------------
Comment By: Raymond Hettinger (rhettinger)
Date: 2006-02-21 08:27
Message:
Logged In: YES
user_id=80475
Neal, the defdict_on_missing check for Py_None is
necessary because the user can assign None to the factory
and expect it to have the same effect as NULL. The other
checks for NULL all automatically handle the Py_None case
the same as if the attribute were filled with a real
factory.
Code nits:
----------
Pre-parse:
PyEval_CallMethod((PyObject *)mp, "on_missing", "(O)",
key);
into:
PyObject_CallMethodObjArgs((PyObject *)mp, "on_missing",
key, NULL);
The first four lines in defdict_dealloc() can be
simplified to:
Py_Clear(dd->default_factory);
Likewise, the first five active lines of defdict_traverse
shrink to:
Py_VISIT(dd->default_factory);
The OFF() macro is used only once. Better to in-line it.
Other
-----
If you don't get a chance to add the docs, just apply the
patch and I'll write the docs later.
Likewise, I'll update UserDict and DictMixin to keep the
APIs in-sync.
----------------------------------------------------------------------
Comment By: Neal Norwitz (nnorwitz)
Date: 2006-02-21 02:49
Message:
Logged In: YES
user_id=33168
In test_print() how about os.unlink(tfn)? Why do you play
games with stdout, can't you do print >>f, ... ? (If you
don't need stdout, it looks like you don't need sys.)
In defdict_on_missing() why do you check if the factory is
None? It's the only place None is checked for, everywhere
else, it's just NULL.
defdict_repr should be static.
Type for this line in defdict_init() should be Py_ssize_t,
not int:
int n = PyTuple_GET_SIZE(args);
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum)
Date: 2006-02-20 20:53
Message:
Logged In: YES
user_id=6380
Here's a completely new version after another round of
python-dev.
- The built-in dict type still defines and calls
on_missing(), but the default on_missing() implementation
just raises KeyError(key). It no longer has a
default_factory attribute.
- You can subclass dict and override on_missing() to do
whatever you want.
- A useful subclass is collections.defaultdict; it defines
an attribute default_factory and its on_missing()
implementation calls that and inserts the resulting value in
the dict (previous versions of the patch had this semantics
in the built-in dict class, which was frowned upon).
- Now with unit tests.
- No docs yet, though.
Assigning to Raymond Hettinger for review. Raymond, please
assign it back to me for checkin if you're okay with this
(or for revision if you're not :-). Because of Google's
lawyers I must check this in myself.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum)
Date: 2006-02-19 18:45
Message:
Logged In: YES
user_id=6380
Here's a version that doesn't crash in debug mode.
Neal Norwitz is standing next to me and pointed out that I
was attempting to decref something after tp_free had already
wiped out the object. D'oh!
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum)
Date: 2006-02-19 14:00
Message:
Logged In: YES
user_id=6380
Sorry, forgot the upload. Here it is.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum)
Date: 2006-02-19 13:57
Message:
Logged In: YES
user_id=6380
Aha. I'll have to try that. In the mean time, here's a new
patch:
- PyDict_GetItem is no longer involved
- added {NULL} to PyMemberDef array
- set ma_default_factory to NULL in both constructors
Still no unit tests or docs.
I have some misgivings about the API -- perhaps this should
be a subclass.
----------------------------------------------------------------------
Comment By: Georg Brandl (birkenfeld)
Date: 2006-02-19 10:39
Message:
Logged In: YES
user_id=1188172
Okay. I configured with "--with-pydebug" all the time, and
then comes the segfault.
Without "--with-pydebug", everything seems fine.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum)
Date: 2006-02-19 10:28
Message:
Logged In: YES
user_id=6380
Are you sure you did a "make clean"? Because the dictobject
struct lay-out is changed you may have to do that.
If the problem persists, try adding setting
ma_default_factory to NULL explicitly in dict_new (like it's
already done in PyDict_New) and see if that makes a difference.
BTW the change to PyDict_GetItem must be removed -- it's not
a good idea.
----------------------------------------------------------------------
Comment By: Georg Brandl (birkenfeld)
Date: 2006-02-18 05:39
Message:
Logged In: YES
user_id=1188172
Observations:
* Doesn't the PyMemberDef need a sentinel?
* Is PyObject_CallObject() faster than PyEval_CallFunction()
with no arguments present?
* The current patch gives me a segfault at interpreter exit
because there is a dict object whose ma_default_factory was
not initialized to NULL.
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1433928&group_id=5470
More information about the Patches
mailing list