[Python-ideas] dict.setdefault_call(), or API variations thereupon

Fri Nov 2 12:52:24 EDT 2018

On Thu, Nov 1, 2018 at 8:34 PM, Steven D'Aprano <steve at pearwood.info> wrote:

> The bottom line is, if I understand your proposal, the functionality
> already exists. All you need do is subclass dict and give it a
> __missing__ method which does what you want.

or subclass dict and give it a "setdefault_call" method :-)
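For reference, here is a minimal sketch of the quoted suggestion: a dict subclass with a `__missing__` method. The class name `ListDefaultDict` is made up for illustration. Note that the default (an empty list) is fixed for the whole dict, which is the limitation raised in the next paragraph.

```python
class ListDefaultDict(dict):
    """Hypothetical example class: missing keys get a fresh empty list."""

    def __missing__(self, key):
        # dict.__getitem__ calls this hook only when the key is absent,
        # so the empty list is built at most once per new key.
        value = []
        self[key] = value  # store it, as collections.defaultdict does
        return value


d = ListDefaultDict()
for i in range(5):
    d["evens" if i % 2 == 0 else "odds"].append(i)

print(d)  # {'evens': [0, 2, 4], 'odds': [1, 3]}
```

Only `d[key]` goes through `__missing__`; `d.get(key)` and `key in d` do not, so they never insert anything.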

But as I think Guido was pointing out, the real difference here is that
defaultdict, or any other subclass, specifies the default callable for the
entire dict, rather than at the time of use. Personally, I'm pretty sure
I've only ever used one default for any given dict, but I can imagine there
are use cases for having different defaults for the same dict depending on
context.
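To illustrate the per-call alternative, here is a minimal pure-Python sketch of what the proposed method might do, written as a free function so it works on a plain dict. The name `setdefault_call` mirrors the proposal but is hypothetical; no such dict method exists. Each call site passes its own factory, so the same dict can hold a list under one key and a set under another.

```python
def setdefault_call(d, key, factory):
    """Hypothetical stand-in for the proposed dict.setdefault_call().

    Unlike d.setdefault(key, default), the default is built only when the
    key is missing, and each call can supply a different factory.
    """
    try:
        return d[key]
    except KeyError:
        value = factory()  # called only on a miss
        d[key] = value
        return value


index = {}
# One context collects ordered occurrences...
setdefault_call(index, "lines", list).append(3)
setdefault_call(index, "lines", list).append(7)
# ...while another, on the same dict, wants unique values.
setdefault_call(index, "words", set).add("spam")
setdefault_call(index, "words", set).add("spam")

print(index)  # {'lines': [3, 7], 'words': {'spam'}}
```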

As for the OP's justification:

"""
If it's not clear, the purpose is to eliminate the overhead of creating an
empty list or similar in situations like this:

d = {}
for i in range(1000000):  # some large loop
    l = d.setdefault(somekey, [])
    l.append(somevalue)

# instead...

for i in range(1000000):
    l = d.setdefault_call(somekey, list)
    l.append(somevalue)

"""

I presume the point is that in the first case, somekey might often be the
same, and setdefault requires creating an actual empty list even if the
key is already there, whereas case 2 will only create the empty list if the
key is not there. Doing some timing with defaultdict:

In [19]: def setdefault():
    ...:     d = {}
    ...:     somekey = 5
    ...:     for i in range(1000000):  # some large loop
    ...:         l = d.setdefault(somekey, [])
    ...:         l.append(i)
    ...:     return d

In [20]: def default_dict():
    ...:     d = defaultdict(list)
    ...:     somekey = 5
    ...:     for i in range(1000000):  # some large loop
    ...:         l = d[somekey]
    ...:         l.append(i)
    ...:     return d

In [21]: %timeit setdefault()
185 ms ± 1.23 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [22]: %timeit default_dict()
128 ms ± 1.65 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
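For anyone who wants to rerun this outside IPython, here is a rough standalone sketch using the standard-library `timeit` module. The function names are made up for illustration, and the third variant (an explicit `in` check on a plain dict) is an added comparison, not part of the timing above. Absolute numbers will vary with machine and Python version.

```python
import timeit
from collections import defaultdict

N = 1_000_000  # same loop size as the transcript
KEY = 5        # a single, repeated key, as in the transcript


def with_setdefault(n=N):
    d = {}
    for i in range(n):
        d.setdefault(KEY, []).append(i)  # a new [] is built on every pass
    return d


def with_defaultdict(n=N):
    d = defaultdict(list)
    for i in range(n):
        d[KEY].append(i)  # list() is called only on the first miss
    return d


def with_membership_test(n=N):
    # Added for comparison: the explicit idiom on a plain dict.
    d = {}
    for i in range(n):
        if KEY not in d:
            d[KEY] = []
        d[KEY].append(i)
    return d


if __name__ == "__main__":
    for fn in (with_setdefault, with_defaultdict, with_membership_test):
        best = min(timeit.repeat(fn, number=1, repeat=3))
        print(f"{fn.__name__:22} {best * 1000:7.1f} ms")
```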

so yeah, it's a little more performant, and I suppose if you were using a
more expensive constructor, it would make a lot more difference. But then,
how much is it likely to matter in real use cases? This was 1 million
calls for one key, and you got roughly a 45% speedup (185 ms vs. 128 ms).
Is that common?
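The "more expensive constructor" point comes down to argument evaluation: Python evaluates the default passed to `setdefault()` before the call, whether or not the key is already present. A rough sketch that counts constructions, where the `Expensive` class is hypothetical and stands in for any costly default value:

```python
from collections import defaultdict


class Expensive:
    """Hypothetical stand-in for a costly-to-build default value."""

    created = 0  # class-level count of constructions

    def __init__(self):
        Expensive.created += 1
        self.items = []


def count_setdefault(n):
    Expensive.created = 0
    d = {}
    for i in range(n):
        # Expensive() runs on every pass, even though only the first is kept.
        d.setdefault("k", Expensive()).items.append(i)
    return Expensive.created


def count_defaultdict(n):
    Expensive.created = 0
    d = defaultdict(Expensive)
    for i in range(n):
        # The factory runs only on the first lookup of "k".
        d["k"].items.append(i)
    return Expensive.created


print(count_setdefault(1000))   # 1000
print(count_defaultdict(1000))  # 1
```

So the saving grows with both the cost of the constructor and the number of hits on existing keys.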

So it seems this would give us slightly better performance than
.setdefault() for the use cases where you are using more than one default
for a given dict.

BTW:

+1 for a mention of defaultdict in the dict.setdefault docs -- you can't do
everything with defaultdict that you can with setdefault, but it is a very
common use case.
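Two of the differences, sketched with standard-library behavior only (the key names are arbitrary): a defaultdict inserts a key on any `d[key]` lookup, even a read, while a plain dict has side-effect-free lookups and lets each `setdefault()` call choose its own default.

```python
from collections import defaultdict

# On a defaultdict, merely reading a missing key inserts it.
dd = defaultdict(list)
_ = dd["typo"]
print(dict(dd))  # {'typo': []}

# On a plain dict, lookups have no side effects, and the default is
# chosen at each setdefault() call rather than once for the whole dict.
d = {}
print(d.get("typo"))  # None, and d is still empty
d.setdefault("a", []).append(1)
d.setdefault("b", set()).add(2)
print(d)  # {'a': [1], 'b': {2}}
```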

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov