Pre-PEP: Dictionary accumulator methods

Alexander Schmolck a.schmolck at gmx.net
Sun Mar 20 14:25:11 EST 2005


Beni Cherniavsky <cben at users.sf.net> writes:

>> The relatively recent "improvement" of the dict constructor signature
>> (``dict(foo=bar,...)``) obviously makes it impossible to just extend the
>> constructor to ``dict(default=...)`` (or anything else for that matter) which
>> would seem much less ad hoc. But why not use a classmethod (e.g.
>> ``d=dict.withdefault(0)``) then?
>>
> You mean giving a dictionary a default value at creation time, right?

Yes. But creating a defaultdict type with aliased content to the original
dict would also be fine by me.

> Such a dictionary could be used very easily, as in <gasp>Perl::
>
>      foreach $word ( @words ) {
>          $d{$word}++;         # default of 0 assumed, simple code!
>      }
>
> </gasp>.  You would like to write::
>
>      d = dict.withdefault(0)  # or something
>      for word in words:
>          d[word] += 1         # again, simple code!
>
> I agree that it's a good idea but I'm not sure the default should be specified
> at creation time.  The problem with that is that if you pass such a dictionary
> into an unsuspecting function, it will not behave like a normal dictionary.

Have you got a specific example in mind? 

Code that needlessly relies on exceptions for "normal operation" is rather
perverse IMO and I find it hard to think of other examples.

> Also, this will go awry if the default is a mutable object, like ``[]`` - you
> must create a new one at every access (or introduce a rule that the object is
> copied every time, which I dislike).

I think copying should be on by default for objects that are mutable (and
explicitly selectable via ``.withdefault(bar,copy=False)``).

Python of course doesn't have an interface to query whether something is
mutable or not (maybe something that'll eventually be fixed), but hashable
might be a first approximation. If that's too dodgy, most commonly the value
will be a builtin type anyway, so copy by default with "efficient
implementation" (i.e. doing nothing) for ints, tuples etc. ought to work fine
in practice.
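
A minimal sketch of that copy-by-default rule, using hashability as the rough
stand-in for immutability. The class name ``DictWithDefault`` and its
``copy`` flag are hypothetical (nothing like this is a builtin); storing the
default on a missing-key read is also an assumption, taken from the quoted
proposal further down.

```python
import copy as _copy


def _is_hashable(obj):
    """Rough stand-in for "immutable": True if hash(obj) succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


class DictWithDefault(dict):
    """Hypothetical dict subclass; not an existing builtin or stdlib type."""

    def __init__(self, default, copy=True):
        super().__init__()
        self._default = default
        self._copy = copy

    @classmethod
    def withdefault(cls, default, copy=True):
        # Mirrors the proposed spelling ``dict.withdefault(0)``.
        return cls(default, copy=copy)

    def __getitem__(self, key):
        try:
            return super().__getitem__(key)
        except KeyError:
            value = self._default
            # Hashable defaults (ints, strings, tuples of hashables) are
            # handed out as-is; anything else is shallow-copied per key.
            if self._copy and not _is_hashable(value):
                value = _copy.copy(value)
            super().__setitem__(key, value)
            return value


counts = DictWithDefault.withdefault(0)
for word in ["spam", "eggs", "spam"]:
    counts[word] += 1          # no KeyError on first sight of a word

groups = DictWithDefault.withdefault([])
groups["a"].append(1)          # each key gets its own list
groups["b"].append(2)
```

The approximation is indeed dodgy at the edges: instances of user-defined
classes are hashable by default yet mutable, so they would be shared rather
than copied here.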

> And there are cases where in different points in the code operating on the
> same dictionary you need different default values.

The main problem here is that the obvious .setdefault is already taken to
misnome something else. Which I guess strengthens the point for some kind of
proxy object.

> So perhaps specifying the default at every point of use by creating a proxy is
> cleaner::
>
>      d = {}
>      for word in words:
>          d.withdefault(0)[word] += 1
>
> Of course, you can always create the proxy once and still pass it into an
> unsuspecting function when that is actually what you mean.

Yup (I'd presumably prefer that second option for the above code).

>
> How should a dictionary with a default value behave (whether inherently or a
> proxy)?
>
> - ``d.__getattr__(key)`` never raises KeyError for missing keys - instead it
>    returns the default value and stores the value as `d.setdefault()` does.
>    This is needed to make code like::
>
>        d.withdefault([])[key].append(foo)
>
>    to work - there is no call of `d.__setattr__()`, so `d.__getattr__()` must
>    have stored it.

I'm confused -- are you referring to __getitem__/__setitem__? Even then I don't
get what you mean: __getitem__ obviously works differently, but that would be
the whole point.

>
>    - `d.__setattr__()` and `d.__delattr__()` behave normally.

s/attr/item/ and agreed.

>
> - Should ``key in d`` return True for all keys?  

No. See below.

>    It is desirable to have *some* way to know whether a key is really
>    present. But if it returns False for missing keys, code that checks ``key
>    in d`` will behave differently from normally equivalent code that uses
>    try..except. If we use the proxy interface, we can always check on the
>    original dictionary object, which removes the problem.
>
>    - ``d.has_key(key)`` must do whatever we decide ``key in d`` does.
>
> - What should ``d.get(key, [default])`` and ``d.setdefault(key, default)``
>    do?  There is a conflict between the default of `d` and the explicitly given
>    default.  I think consistency is better and these should pretend that `key`
>    is always present.  But either way, there is a subtle problem here.

.setdefault ought to trump defaultdict's default. I feel that code that
operated without raising a KeyError on normal dicts should also operate the
same way on defaultdicts where possible. I'd also suspect that if you're
effectively desiring to override .setdefault's default you're up to something
dodgy.

> - Of course `iter(d)`, `d.items()` and the like should only see the keys
>    that are really present (the alternative inventing an infinite amount of
>    items out of the blue is clearly bogus).
>
> If the idea that the default should be specified in every operation (creating
> a proxy) is accepted, there is a simpler and more fool-proof solution: the
> proxy will not support anything except `__getitem__()` and `__setitem__()` at
> all.  Use the original dictionary for everything else.  This prevents subtle
> ambiguities.

Yes, that sounds like a fine solution to me -- if something goes wrong one is
at least likely to get an error immediately.

However the name .withdefault is then possibly a bit misleading -- but
.proxywithdefault is maybe a bit too long...
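
A minimal sketch of such a restricted proxy, written as a free function since
``dict`` has no ``.withdefault`` method. The names ``DefaultProxy`` and
``withdefault`` are hypothetical, and shallow-copying the default on every
miss is an assumption made here to keep mutable defaults from being shared.

```python
import copy


class DefaultProxy:
    """Hypothetical proxy exposing only item get/set on a target dict."""

    def __init__(self, target, default):
        self._target = target
        self._default = default

    def __getitem__(self, key):
        try:
            return self._target[key]
        except KeyError:
            # Store the default so that proxy[key].append(x) sticks.
            value = copy.copy(self._default)
            self._target[key] = value
            return value

    def __setitem__(self, key, value):
        self._target[key] = value

    # Without these two, ``in`` and ``for`` would fall back to calling
    # __getitem__(0), __getitem__(1), ... and never stop, because this
    # __getitem__ never raises IndexError. Failing loudly is the point.
    def __contains__(self, key):
        raise TypeError("use the original dict for membership tests")

    def __iter__(self):
        raise TypeError("use the original dict for iteration")


def withdefault(d, default):
    """Stand-in for the proposed ``d.withdefault(default)``."""
    return DefaultProxy(d, default)


counts = {}
proxy = withdefault(counts, 0)     # create the proxy once, outside the loop
for word in ["spam", "eggs", "spam"]:
    proxy[word] += 1

groups = {}
withdefault(groups, [])["a"].append(1)
```

Everything else (``in``, ``len``, iteration, ``.get``) goes through
``counts`` and ``groups``, which stay plain dicts.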

BTW, this scheme could also be extended to other collection types (e.g. lists
and sets): ``l = []; l.proxywithdefault(0)[2] = 1; l`` => ``[0, 0, 1]``.

Whilst I think such behavior is asking for trouble if it's enabled by default
(as in Perl and Ruby, IIRC) and also lacks flexibility (as you can't specify
the fill value), occasionally it would be quite handy and I see little harm in
providing it when it's explicitly asked for.
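
For lists, a rough sketch of the explicitly requested fill behaviour could
look like this. ``ListProxyWithDefault`` is a hypothetical name; handling
only non-negative integer indices, and returning the fill value on an
out-of-range read without growing the list, are assumptions of this sketch.

```python
class ListProxyWithDefault:
    """Hypothetical proxy that pads a list with a fill value on assignment."""

    def __init__(self, target, fill):
        self._target = target
        self._fill = fill

    def __setitem__(self, index, value):
        missing = index + 1 - len(self._target)
        if missing > 0:
            # Pad up to and including ``index``; every padded slot refers
            # to the same fill object, which is fine for ints and the like.
            self._target.extend([self._fill] * missing)
        self._target[index] = value

    def __getitem__(self, index):
        try:
            return self._target[index]
        except IndexError:
            return self._fill      # reads do not grow the list

    def __iter__(self):
        # __getitem__ never raises IndexError, so the implicit iteration
        # protocol would loop forever; refuse instead.
        raise TypeError("iterate over the original list instead")


l = []
ListProxyWithDefault(l, 0)[2] = 1
```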

>
>> Or, for the first and most common case, just a bag type?
>>
> Too specialized IMHO.  You want a dictionary with any default anyway.  If you
> have that, what will be the benefit of a bag type?

I more thought of the bag type as an alternative to having a dictionary with
default value (the counting case occurs most frequently and conceptually it is
well modelled by a bag).

And I don't feel that a bag type is too specialized (plausibly too
specialized for a builtin -- but not for something provided by a module). Just
because there is a natural tendency to shoehorn everything into the
bread-and-butter types of some language (dicts and lists in python's case),
doesn't mean one can't overdo it, because eventually one will end up with a
semantic mess.
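
To make the counting case concrete, here is a minimal sketch of what a
module-provided bag might offer. The ``Bag`` class and its method names are
hypothetical, chosen only for illustration.

```python
class Bag:
    """Hypothetical multiset: remembers how many times each item was added."""

    def __init__(self, items=()):
        self._counts = {}
        for item in items:
            self.add(item)

    def add(self, item, n=1):
        self._counts[item] = self._counts.get(item, 0) + n

    def count(self, item):
        # Absent items simply have a count of zero; no default machinery.
        return self._counts.get(item, 0)

    def __contains__(self, item):
        return item in self._counts

    def __len__(self):
        # Total number of items, counting repeats.
        return sum(self._counts.values())

    def __iter__(self):
        # Yield each item as many times as it was added.
        for item, n in self._counts.items():
            for _ in range(n):
                yield item


bag = Bag(["spam", "eggs", "spam"])
```

Here membership, length and iteration all have one obvious meaning, which is
exactly where a dict with a default value gets ambiguous.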

Anyway my current preferences would be a proxy with default value and only
__getitem__ and __setitem__ methods -- as you suggested above, but possibly
also for other collection types than just dict.

'as




