[Python-Dev] More compact dictionaries with faster iteration

PJ Eby pje at telecommunity.com
Tue Dec 11 00:05:04 CET 2012


On Mon, Dec 10, 2012 at 4:29 PM, Tim Delaney <tim.delaney at aptare.com> wrote:
> Whilst I think Python should not move to ordered dictionaries everywhere, I
> would say there is an argument (no pun intended) for making **kwargs a
> dictionary that maintains insertion order *if there are no deletions*.

Oooh.  Me likey.  There have been many times where I've wished kwargs
were ordered when designing an API.

(Oddly, I don't remember any one of the APIs specifically, so don't
ask me for a good example.  I just remember a bunch of different
physical locations where I was when I thought, "Ooh, what if I
could...  no, that's not going to work.")

One other useful place for ordered dictionaries is class definitions
processed by class decorators: no need to write a metaclass just to
know what order stuff was defined in.
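For context, capturing definition order today means writing a metaclass whose __prepare__ hands the class body an OrderedDict — roughly along these lines (a minimal sketch; OrderedMeta and _field_order are invented names, not anything in the stdlib):

```python
from collections import OrderedDict

class OrderedMeta(type):
    # __prepare__ lets the metaclass pick the mapping used while the
    # class body executes; an OrderedDict records definition order.
    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        return OrderedDict()

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, dict(namespace))
        # Stash the definition order so a class decorator can read it.
        cls._field_order = [k for k in namespace
                            if not k.startswith('__')]
        return cls

class Record(metaclass=OrderedMeta):
    x = 1
    y = 2
    z = 3
```

If class namespaces were ordered dicts by default, a plain class decorator could read the order directly and all of the machinery above would be unnecessary.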

>  It sounds like we could get that for free with this implementation, although
> from another post IronPython might not have something suitable.

Actually, IronPython may already have ordered dictionaries by default; see:

  http://mail.python.org/pipermail/ironpython-users/2006-May/002319.html

It's described as an implementation detail that may change; perhaps
that could be changed to being unchanging.  ;-)


> I think there are real advantages to doing so - a trivial one being the ability
> to easily initialise an ordered dictionary from another ordered dictionary.

Or to merge two of them together, either at creation or .update().

I'm really starting to wonder if it might not be worth paying the
compaction overhead to just make all dictionaries ordered, all the
time.  The problem is that if you are always adding new keys and
deleting old ones (as might be done in a LRU cache, a queue, or other
things like that) you'll probably see a lot of compaction overhead
compared to today's dicts.

OTOH...  with a good algorithm for deciding *when* to compact, we can
actually make the amortized cost O(1) or so, so maybe that's not a big
deal.  The cost to do a compaction is, at worst, the current size of
the table.  So if you wait until a table has twice as many entries
(more precisely, until the index of the last entry is twice what it
was at the last compaction), you amortize the compaction cost down to
one entry move per add, or O(1).
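That compaction rule can be sketched in pure Python (a toy model of the idea, not the proposed C implementation; the class, attribute names, and the threshold of 8 are all invented for illustration):

```python
class CompactDict:
    """Toy ordered mapping: a hash part (here just a plain dict) maps
    keys to slots in a dense, insertion-ordered entries list.  Deleting
    leaves a hole; compaction squeezes the holes out in one pass."""

    def __init__(self):
        self._index = {}      # key -> position in self._entries
        self._entries = []    # (key, value) in insertion order; None = hole
        self._used = 0        # live entry count
        self._watermark = 8   # entries length at the last compaction

    def __setitem__(self, key, value):
        if key in self._index:
            self._entries[self._index[key]] = (key, value)
            return
        self._index[key] = len(self._entries)
        self._entries.append((key, value))
        self._used += 1
        self._maybe_compact()

    def __delitem__(self, key):
        self._entries[self._index.pop(key)] = None  # leave a hole
        self._used -= 1

    def _maybe_compact(self):
        # The amortization rule from the text: wait until the entries
        # array has doubled since the last compaction, then do one O(n)
        # pass, amortizing to one entry move per add.
        if len(self._entries) >= 2 * self._watermark:
            self._entries = [e for e in self._entries if e is not None]
            self._index = {k: i for i, (k, _) in enumerate(self._entries)}
            self._watermark = max(8, len(self._entries))

    def __iter__(self):
        for entry in self._entries:
            if entry is not None:
                yield entry[0]
```

With this rule, a cache-like workload that alternates adds and deletes keeps the entries array within a constant factor of the live size, which is the case the paragraph above is worried about.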

That would handle the case of a cache or queue, but I'm not sure how
it would work with supersized dictionaries that are then deleted down
to a fraction of their original size.  I suppose if you delete your
way down to half the entries being populated, then you end up with two
moves per delete, or O(2).  (Yes, I know that's not a valid O number.)

So, offhand, it seems pretty doable, and unlikely to significantly
change worst-case performance even for pathological use cases.  (Like
using a dict when you should be using a deque.)

