[Python-ideas] dict.fromkeys() better as dict().setkeys() ? (and other suggestions)

Ron Adam rrr at ronadam.com
Tue May 29 06:04:06 CEST 2007


Josiah Carlson wrote:
> Ron Adam <rrr at ronadam.com> wrote:
>> The dictionary fromkeys method seems out of place as well as misnamed.  IMHO
> 
> It is perfectly named (IMNSHO ;), create a dictionary from the keys
> provided; dict.fromkeys() .

That's ok, I won't hold it against you.  ;-)

What about its being out of place?  Is this case like 'sorted' vs 'sort' 
for lists?

I'm ok with leaving its name as is if changing it is a real problem.  Another 
name for the mutate-with-keys method can be found.  That may reduce possible 
confusion as well.


>> There are enough correct uses of it in the wild to keep the behavior, but 
>> it can be done in a better way.
> 
> I wasn't terribly convinced by your later arguments, so I'm -1.

Yes, I'm not the most influential writer.

I'm not sure I can convince you it's better if you already think it's not. 
That has more to do with your personal preference.  So let's look at how 
much it's actually needed in the current (and correct) form.


(These are rough estimates, I can try to come up with more accurate 
statistics if that is desired.)

Doing a search on google code turns up 300 hits for "lang:python \.fromkeys\(".

Looking at a sample of those, it looks like about 80% use it as a set() 
constructor to remove duplicates.  (Either for compatibility with python 
2.3 code, or because it is python 2.3 or earlier code.)
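
As an illustration of the idiom being counted here, a minimal sketch of 
fromkeys used as a duplicate remover, next to the set() spelling available 
from python 2.4 on.  The sample list `names` is made up for the example and 
is not taken from the search results.

```python
# Hypothetical sample data, not taken from the search results.
names = ["spam", "eggs", "spam", "ham", "eggs"]

# Pre-2.4 idiom: dict keys are unique, the values (None here) are ignored.
unique_via_dict = list(dict.fromkeys(names))

# Python 2.4 and later: set() states the intent directly.
unique_via_set = set(names)

# Both spellings end up holding the same distinct items.
assert set(unique_via_dict) == unique_via_set
```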

Is there a way to narrow this down to python 2.4 and later? (anyone?)

A bit more sampling, it looks like about 8 of 10 of those remaining 20% can 
be easily converted to the following form without any trouble.

     d = dict()
     d.set_keys(keys, v=value)

That would leave about 12 cases (YMMV) that need the inline functionality. 
For those a simple function can do it.

     def dict_from_keys(keys, v=None):
         d = dict()
         d.set_keys(keys, v)
         return d
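
Since set_keys does not exist on dict, the snippets above cannot be run as 
written.  A rough sketch of the intended behaviour, assuming a free function 
named set_keys as a stand-in for the proposed method (the function form and 
the v=None default are assumptions made for illustration):

```python
def set_keys(d, keys, v=None):
    """Stand-in for the proposed d.set_keys(keys, v); hypothetical, not a real dict method."""
    # dict.fromkeys builds the (key, v) pairs, update merges them into d in place.
    d.update(dict.fromkeys(keys, v))


def dict_from_keys(keys, v=None):
    # Inline form for the few callers that need an expression.
    d = dict()
    set_keys(d, keys, v)
    return d


status = dict_from_keys(["a", "b", "c"], v=1)
set_keys(status, ["a", "c"], v=0)      # reset only part of the keys
assert status == {"a": 0, "b": 1, "c": 0}
```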

Is 12 cases out of about 315,000 python files a big enough need to keep the 
current behavior?   315,000 is the number returned from google code for all 
python files, 'lang:python'. (I'm sure there are some duplicates)


Is this more convincing?   ;-)

(If anyone can come up with better numbers, that would be cool.)

>> I think this reads better and can be used in a wider variety of situations.
>>
>> It could be useful for setting an existing dictionary to a default state.
>>
>>      # reset status of items.
>>      status.set_keys(status.keys(), v=0)
> 
> This can be done today:

Of course all of the examples I gave can be done today.  But they nearly 
all require iterating in python in some form.


>     status.update((i, 0) for i in status.keys())
>     #or
>     status.update(dict.fromkeys(status, 0))

The first example requires iterating over the keys.  The second example 
works if you want to initialize all the keys.  In which case, there is no 
reason to use the update method.  dict.fromkeys(status, 0) is enough.


>> Or more likely, resetting a partial sub set of the keys to some initial state.
>>
>>
>> The reason I started looking at this is I wanted to split a dictionary into 
>> smaller dictionaries and my first thought was that fromkeys would do that. 
>>     But of course it doesn't.
> 
> Changing the behavior of dict.fromkeys() is not going to happen. We can
> remove it, we can add a new method, but changing will lead to not so
> subtle breakage as people who were used to the old behavior try to use
> the updated method.
> 
> Note that this isn't a matter of "it's ok to break in 3.0", because
> dict.fromkeys() is not seen as being a design mistake by any of the
> 'heavy hitters' in python-dev or python-3000 that I have heard (note
> that I am certainly not a 'heavy hitter').

Then let's find a different name.


>> What I wanted was to be able to specify the keys and get the values from 
>> the existing dictionary into the new dictionary without using a for loop to 
>> iterate over the keys.
>>
>>     d = {1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}
>>
>>     d_odds = d.from_keys([1, 3, 5])      # new dict of items 1, 3, 5
>>     d_evens = d.from_keys([2, 4])        # new dict of items 2, 4
>>
>> There currently isn't a way to split a dictionary without iterating its 
>> contents even if you know the keys you need beforehand.
> 
> Um...
> 
>     def from_keys(d, iterator):
>         return dict((i, d[i]) for i in iterator)

(iterating)

Yep as I said just above this.

  """There currently isn't a way to split a dictionary without iterating 
it's contents ..."""

Lists have __getslice__, __setslice__, and __delslice__.  It could be 
argued that those can be handled just as well with iterators and loops.  
Of course we see them as seq[s:s+x], on both lists and strings.  So why 
not have an equivalent for dictionaries?  We can't slice them, but we do 
have key lists to use in the same way.
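
To make the analogy concrete, here is a sketch that puts the three list 
slice operations next to their key-list counterparts, written with what dict 
offers today.  The sample data and the choice of keys are assumptions for 
illustration only.

```python
seq = ['a', 'b', 'c', 'd', 'e']
d = {1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}
keys = [1, 2]

# Read several items at once.
part = seq[0:2]
sub = dict((k, d[k]) for k in keys)      # needs a Python-level loop

# Overwrite several items at once.
seq[0:2] = ['x', 'x']
d.update(dict.fromkeys(keys, 'x'))

# Delete several items at once.
del seq[0:2]
for k in keys:                           # again a Python-level loop
    del d[k]
```

The list side is one operation per line; the dictionary side needs a 
generator expression or a for loop for two of the three.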


>> A del_keys method could replace the clear method.  del_keys would be more 
>> useful as it could operate on a partial set of keys.
>>
>>     d.del_keys(d.keys())    # The current clear method behavior.
> 
> I can't remember ever needing something like this that wasn't handled by
> d.clear() .

All or nothing.  d = dict() works just as well.

BTW, google code gives 500 hits for "\.clear\(".  But it's very unclear how 
many of those are false positives due to other objects having a clear 
method.  It's probably a significant percentage in this case.


>> Some potentially *very common* uses:
>>
>>       # This first one works now, but I included it for completeness.  ;-)
>>
>>       def mergedicts(d1, d2):
>>           """ Combine two dictionaries. """
>>           dd = dict(d1)
>>           dd.update(d2)    # update() returns None, so return dd itself
>>           return dd
> 
>     dict((i, d2.get(i, d1.get(i))) for i in itertools.chain(d1,d2))

(iterating)

And I'd prefer to define the function in this case for readability reasons.


>>       def splitdict(d, keys):
>>           """ Split dictionary d using keys. """
>>           keys_rest = set(d.keys()) - set(keys)
>>           return d.from_keys(keys), d.from_keys(keys_rest)
> 
> I can't think of a simple one-liner for this one that wouldn't duplicate
> work.

:-)

This is one of the main motivators.
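
For reference, a runnable approximation of splitdict using today's dict API. 
This is only a sketch: from_keys here is a hypothetical free function 
standing in for the proposed method, and it iterates in python, which is 
exactly the cost the proposal is trying to avoid.

```python
def from_keys(d, keys):
    """Hypothetical stand-in for the proposed d.from_keys(keys) method."""
    return dict((k, d[k]) for k in keys)


def splitdict(d, keys):
    """ Split dictionary d using keys. """
    keys = list(keys)                  # allow any iterable, including iterators
    keys_rest = set(d) - set(keys)     # every key that was not asked for
    return from_keys(d, keys), from_keys(d, keys_rest)


d = {1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}
d_odds, d_evens = splitdict(d, [1, 3, 5])
```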


>>       def split_from_dict(d, keys):
>>           """ Removes and returns a subdict of d with keys. """
>>           dd = d.from_keys(keys)
>>           d.del_keys(keys)
>>           return dd
> 
>     dict((i, d.pop(i, None)) for i in keys)

(iterating)


>>       def copy_items(d1, d2, keys):
>>           """ Copy items from dictionary d1 to d2. """
>>           d2.update(d1.from_keys(keys))      # I really like this!
> 
>     d2.update((i, d1[i]) for i in keys)

(iterating)


>>       def move_items(d1, d2, keys):
>>           """ Move items from dictionary d1 to d2. """
>>           d2.update(d1.from_keys(keys))
>>           d1.del_keys(keys)
> 
>     d2.update((i, d1.pop(i, None)) for i in keys)

(iterating)


>> I think the set_keys, from_keys, and del_keys methods could add both 
>> performance and clarity benefits to python.
> 
> Performance, sometimes, for some use-cases.  Clarity?  Maybe.  Your
> split* functions are a bit confusing to me, and I've never really needed
> any of the functions that you list.

I think sometimes our need is determined by what is available for use.  So 
if it's not available, our minds filter it out from the solutions we 
consider.  That way, we don't need the things we don't have or can't get.

My mind's "need filter" seems to be broken in that respect. I often need 
things I don't have.  But sometimes that works out to be good.  ;-)


>> So to summarize...
>>
>>      1.  Replace existing fromkeys method with a set_keys method.
>>      2.  Add a partial copy items from_keys method.
>>      3.  Replace the clear method with a del_keys method.
> 
> Not all X line functions should be builtins.

Of course I knew someone would point this out.  I'm not requesting the 
above example functions be builtins, only that the changes to the dict 
methods be considered.  They would allow those above functions to work in a 
more efficient way and I'd be happy to add those functions to my own library.

With these methods, in most cases the functions wouldn't even be needed. 
You would just use the methods in combination with each other directly and 
the result would still be readable without a lot of 'code' overhead.

Also consider this from a larger view.  List has __getslice__, 
__setslice__, and __delslice__.  Set has numerous methods that operate on 
more than one element.

Dictionaries are supposed to be highly efficient, but they only have limited 
methods that can operate on more than one item at a time, so you end up 
iterating over the keys to do nearly everything.

So as an alternative, leave fromkeys and clear alone and add...

     getkeys(keys)  ->  dict
     setkeys(keys, v=None)
     delkeys(keys)

Where these offer the equivalent of list slice functionality to dictionaries.
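
A minimal sketch of how these three methods might behave, written as a dict 
subclass so it can be tried out today.  The class name KeyDict, the 
KeyError behaviour on missing keys, and the pure-python loops are all 
assumptions made for illustration; the proposal itself is for methods on 
the builtin dict.

```python
class KeyDict(dict):
    """Hypothetical subclass; none of these methods exist on the builtin dict."""

    def getkeys(self, keys):
        # New dict holding only the requested items (KeyError if one is missing).
        return KeyDict((k, self[k]) for k in keys)

    def setkeys(self, keys, v=None):
        # Point every requested key at the same value.
        self.update(dict.fromkeys(keys, v))

    def delkeys(self, keys):
        # Remove the requested items (KeyError if one is missing).
        for k in keys:
            del self[k]


d1 = KeyDict({1: 'a', 2: 'b', 3: 'c'})
d2 = KeyDict()

# Moving items from d1 to d2, spelled with the methods directly.
d2.update(d1.getkeys([1, 2]))
d1.delkeys([1, 2])

# Resetting what is left to a default state.
d1.setkeys(d1.keys(), v=0)
```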


> If you find that you are
> doing the above more often than you think you should, create a module
> with all of the related functionality that automatically patches the
> builtins on import and place it in the Python cheeseshop.  If people
> find that the functionality helps them, then we should consider it for
> inclusion.  As it stands, most of the methods you offer have a very
> simple one-line version that is already very efficient.

Iterators and for loops are fairly efficient for small dictionaries, but 
iterating can still be considerably slower than the equivalent C code if 
they are large dictionaries.
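
One way to check this on a given machine is a small timeit comparison.  The 
sketch below assumes a dictionary of 100,000 integer keys, a size chosen 
arbitrarily, and compares a generator fed to update() against 
dict.fromkeys(), whose loop runs in C in CPython.  No particular result is 
implied; the numbers will vary with interpreter and hardware.

```python
import timeit

# Arbitrary size picked for the experiment; adjust as needed.
keys = list(range(100000))


def reset_with_generator():
    d = {}
    d.update((k, 0) for k in keys)   # pairs are produced one at a time in python
    return d


def reset_with_fromkeys():
    return dict.fromkeys(keys, 0)    # the loop over keys runs in C in CPython


# Both spellings must build the same dictionary for the comparison to be fair.
assert reset_with_generator() == reset_with_fromkeys()

t_gen = timeit.timeit(reset_with_generator, number=10)
t_c = timeit.timeit(reset_with_fromkeys, number=10)
print("generator: %.4fs   fromkeys: %.4fs" % (t_gen, t_c))
```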


>> So this replaces two methods and adds one more.  Overall I think the 
>> usefulness of these would be very good.
> 
> I don't find the current dictionary API to be lacking in any way other
> than "what do I really need to override to get functionality X", but
> that is a documentation issue more than anything.

>> I also think it will work very well with the python 3000 keys method 
>> returning an iterator.  (And still be two fewer methods than we currently 
>> have.)
> 
> I'm sorry, but I can't really see how your changes would add to Python's
> flexibility without cluttering up interfaces and confusing current users.

I think it cleans up the API more than it clutters it up.  It converts two 
limited use methods to be more general, and adds one more that works with 
the already existing update method nicely.

In both cases of the two existing methods, fromkeys and clear, your 
argument, that there already exist easy one-line functions to do this, 
would be enough of a reason to not have them in the first place.  So do you 
feel they should be removed?


I plan on doing a search of places where these things can make a difference 
in making the code more readable and/or faster.

Cheers,
    Ron


