[Python-3000] Interfaces for views and extended collections

Ian Bicking ianb at colorstudy.com
Tue Apr 11 21:40:01 CEST 2006


So Guido asked for more concrete discussion of things like views.  A 
richer set of collections also fits in here, as for instance dict.keys() 
would be a view with a collection interface not exactly like other 
collections.  I wrote up some notes, which I'll pass on.

So, I can imagine 3 attributes which seem orthogonal to me:

a) Is it ordered?
b) Can it contain multiple equal items?
c) Is it associative?

In addition there is mutability, though anything that can be mutable 
could also have an immutable form.  In general the only difference I see 
is the removal of methods, and the collection becomes hashable.

There's 8 combinations.

Associative collections:

   dict: no multiple items, unordered (missing an immutable form)
   multidict: multiple items, unordered
   ordered multidict: multiple items, ordered
   ordered dict: no multiple items, ordered (useful?)

Non-associative:

   Set/ImmutableSet: no multiple items, unordered
   bag: multiple items, unordered
   ordered set: no multiple items, ordered (useful? -- priority queue?)
   list/tuple: multiple items, ordered


So, ordered sets and ordered dicts seem questionable.  Also, it is 
unclear to me if there's a compelling reason for both ordered multidict 
and multidict, though the performance is likely to be different for the 
two (underneath multidict likely being a dictionary of sets, ordered 
multidict being a list of tuples).

So, at least, we are missing bags, (ordered) multidicts.  In terms of 
views, .keys() is a set, multidict.keys() is a bag, and 
orderedmultidict.keys() is a list (or tuple, or immutable list-like object).

There's some other attributes.  For instance, we have defaultdict. 
Another option that could be applied to almost any of these is a key 
function.  E.g., KeyedDict(key=lambda x: x.lower()) would be a 
case-insensitive dictionary.  Because views introduce an relationship 
between these collections, if we add a keyed dictionary we should also 
have a keyed set (returned by KeyedDict.keys()).  If we build this 
directly into the collection, that's easy, if not then we double the 
number of collections to be implemented; which is perhaps fine.

OK... so now we're this far; what will the interfaces be for these 
objects?  What follows is a long list of, I think, all the major methods 
and operators collections define.  This is mostly just source material, 
there's no conclusions or proposals.  I hope this will be useful to 
consider what the interfaces might look like, or to compare proposed 
interfaces against.



Methods that don't imply mutability
===================================

Combine two collections: +
Currently only applies to lists and tuples.  Lists cannot be added to 
tuples.  This only applies to objects with order, that can contain 
multiple items.

Combine two collections: |
For sets, produces the union.  Applies to objects without order and 
cannot contain multiple items.  Should dicts implement this?  Which side 
would take precedence?

Combine two collections: &
For sets, produces the intersection.  Should dicts implement this?

Combine two collections: ^
For sets, produces the symmetric difference.  Dicts could implement 
this; neither side would have to take precedence.

Difference: -
For sets, a-b produces a new set with items from b removed.  Could apply 
to bags.  Maybe also to dicts?

Repeat collection: *int
For lists and tuples, repeats the contents.  Could apply to ordered 
multidicts.  Maybe also to bags and multidicts?

Get item at index: [index]
Returns the item at that index.  Doesn't apply to dicts, even ordered 
dicts, because it conflicts with another meaning of __getitem__. 
Doesn't apply to unordered collections

Get item for association: [key]
Returns the value for that key.  Applies to all dictionaries.  Do 
multidicts return all values or the first value?  Or the last value? 
Dictionaries essentially return the *last* value (where all previous 
values were disposed of).  An additional method is needed for the other 
option.  If multiple values, in what kind of container are those values? 
  Mutable?  A view (yuck)?

Get an item, with not-found handling: .get()
Returns the value for that key.  For multidicts, does it return None or 
an empty collection if nothing is found?  For whatever method added in 
addition to __getitem__, is there an equivalent get*() form?

Count items: .count(item)
This works on lists.  For some reason not on tuples.  Should work on 
bags.  Would this apply to multidicts?  Is item a key or (key, value) in 
that case?  Would there be a way to count values?  Value counting can 
apply to all associations.

Get index: .index(item)
This works on lists.  Also for ordered dicts?  Would item be key, or 
(key, value)?

Count items: len(collection)
This works on everything.  For multidicts, does it count values or keys? 
  For bags does it count unique items or all items?

Test if an item is in collection: item in collection
Works for keys and items.  No way to tell if a value is in a dictionary, 
or a (key, value).

Test if a collection is a superset: .issuperset(collection)
For sets; could apply to bags.  Could be defined in terms of 
__contains__ and __iter__.  Could be a function.  .issubset(collection) 
is related, of course.

Get keys: .keys() (.iterkeys())
Returns all the keys in a dictionary.  In py3k as a view.  .iterkeys() 
goes away?  For multidicts, do the keys show up multiple times?  For 
ordered multidicts, this seems necessary.

Get values: .values() (.itervalues())
Returns all values.

Get items: .items() (.iteritems())
Returns all (key, value).  Or (key, values) for multidict?

Mutable methods
===============

Delete index: del col[index]
For lists only.

Delete key: del col[key]
For dicts.  For multidicts, does this delete the first value or all values?

Remove first matching: .remove(item)
For lists, sets, bags.  Could be used as a single-value-delete-only 
version of __delitem__ for multidicts.

Discard: .discard(item)
For sets, bags.  Removes, but doesn't signal error if missing.  Could be 
implement anywhere .remove() is implemented.

Pop index: .pop(index=-1)
Removes item at index, and returns that.

Pop key: .pop(key)
Removes an value at key, and returns.  For multidicts, removes just the 
first value?  No default key.

Pop key/value: .popitem()
Removes an arbitrary key/value and returns.  For multidicts, returns 
values one at a time?


Set item/index: col[index_or_key] = value
Sets the value, overwriting previous value (if any).  Cannot extend 
lists.  For multidicts, does this append a new value, or overwrite the 
value?

Append: .append(item)
For lists.  Would this apply to ordered dicts?  .append(key, value)?

Extend from sequence: .extend(seq)
Defined in terms of .append().

Extend without sequence: .update(mapping)
Defined for dict and set, should be for bag.  Defined in terms of 
__setitem__; has special behavior for sequences of tuples, vs. dict-like 
objects.

Add item: .add()
Applies to sets, bags.  Could apply to non-ordered multidicts; .add(key, 
value)?

Rotate: .rotate()
Currently implemented only in queues.  Could apply to any ordered 
collection.

Copy: .copy()
Applies to dicts, sets, bags.  Missing in lists.  Should all mutable 
collections have this?

Clear: .clear()
Removes all items.  Not implemented for lists for some reason.

In-place versions of &|^-: .union(other), x.intersection(other), 
x.difference(other), x.symmetric_difference(other)
Defined in terms of the operators (or vice versa).

In-place sort: .sort(cmp=None, key=None, reverse=None)
Does a sort.  Could apply to any ordered collection.  Sort on (key, 
value) or just key?

Reverse: .reverse()
Reverses; applies to all ordered collections.



Missing
=======

There's no method for getting the key or keys for a value (similar to 
list.index).

There's no way to invert a dictionary (change key->value to value->key). 
  With more types of dictionaries proper collections will exist to 
represent the inverse.  E.g., multidict is the inverse of dict.  This 
maybe is an argument for the full range of dictionaries.

The inverted form could also be a view.  somedict.invert()['foo'] would 
return all the keys which had the value of 'foo'.  This is a clever way 
of adding lots of new concise lookups by leveraging existing methods, 
but I suspect very hard to read in practice.



-- 
Ian Bicking  /  ianb at colorstudy.com  /  http://blog.ianbicking.org


More information about the Python-3000 mailing list