Python 3.0 - is this true?

Roy Smith roy at panix.com
Sun Nov 9 19:26:16 EST 2008


In article <491768e5$0$2554$9b622d9e at news.freenet.de>,
 "Martin v. Löwis" <martin at v.loewis.de> wrote:

> The "homogeneous vs. heterogeneous" discussion is not that much about
> type, but more about semantics of the individual values. If you
> represent (firstname, lastname, age), you use tuples - three different
> (and orthogonal) pieces of information. There is no canonical way of
> continuing that sequence (hence the fixed-size nature of the tuple is
> no concern). OTOH, the token list has the structure [token, token, ...].
> The third element is semantically not different from the first element,
> hence a list is appropriate.

That is a much better way to describe it than talking about homogeneity, 
except that it's not so much the semantics of the individual values, but 
the semantics of the container.  The unfortunate thing about tuples vs. 
lists is that there are two completely orthogonal concepts in play.

On the one hand, you have an inherently fixed-sized collection of objects, 
which represent specific attributes.  A good analogy would be a C/C++ 
struct.  On the other hand, you have an sequence.  Something which might 
logically have more or fewer elements at any given time.  Something where 
it makes sense to talk about "the next item", or "the previous item", or 
"the last item".

In the other dimension, you have mutable vs. immutable.  Or, more to the 
point, hashable vs. non-hashable.  Which, for most people means, "can be 
used as a dictionary key" vs. "cannot be used as a dictionary key"

The problem is, we only have two types of containers to cover the four 
possible combinations of these two orthogonal concepts.  If you want to 
make something a dictionary key, you have no choice; it's got to be a 
tuple.  In many cases, that trumps all other considerations.

Consider this problem:

At run time, you will construct an instance of class FrameCounter, fc.  The 
constructor takes a single integer, n.  You call fc.next_item(i), at least 
n times.  Then, calling, fc.get_frames() will return a dict whose keys are 
sub-sequences of length n, and whose values are a count of how many times 
that sub-sequence appeared in the input.

Exmaple:

fc = FrameCounter(3)
for i in [1, 2, 3, 1, 2, 3]:
   fc.next_item(i)
d = fc.get_frames()

will result in:

d[(1, 2, 3)] -> 2
d[(2, 3, 1)] -> 1
d[(3, 1, 2)] -> 1
d[(4, 5, 6)] -> raises KeyError

This is fairly simple thing to code.  You accumulate items in a list by 
appending each time next_item() is called, and either using pop(0) or 
slicing to keep the last n items.  Then, you construct a tuple from the 
list and use that as the dictionary key to count how many times you've seen 
that sub-sequence.

The point is that you're forced to use lists to compute the sub-sequences.  
This makes sense, because lists fit the "indefinite length sequence" idea.  
Then, you're forced to use tuples as the dictionary keys, because tuples 
are immutable/hashable, while lists are not.  This use of tuples doesn't 
fit the "inherently fixed-sized collection of heterogeneous objects" idea 
of a tuple.  In this case, a tuple really is just an immutable list.  Your 
choice of containers is not based on any theoretical arguments of what each 
type was intended to represent, but the cold hard reality of what 
operations they support.



More information about the Python-list mailing list