[Python-Dev] re: Using lists as sets

Jeremy Hylton jeremy@cnri.reston.va.us
Mon, 20 Mar 2000 14:51:28 -0500 (EST)


>>>>> "MZ" == Moshe Zadka <moshez@math.huji.ac.il> writes:

  MZ> On Mon, 20 Mar 2000, Jeremy Hylton wrote:
  >> Yet another possibility, implemented in early versions of JPython
  >> and later removed, was to treat a dictionary exactly like a list:
  >> Call __getitem__(0), then 1, ..., until a KeyError was raised.
  >> In other words, a dictionary could behave like a list provided
  >> that it had integer keys.

  MZ> Two remarks: Jeremy meant "consecutive natural keys starting
  MZ> with 0", (yes, I've managed to learn mind-reading from the
  MZ> timbot) 

I suppose I meant that (perhaps you can read my mind as well as I
can);  I also meant using values of Python's integer datatype :-).


  MZ> and that (the following is considered a misfeature):

  MZ> import UserDict 
  MZ> a = UserDict.UserDict() 
  MZ> a[0]="hello"
  MZ> a[1]="world"

  MZ> for word in a: print word

  MZ> Will print "hello", "world", and then die with KeyError.  I
  MZ> realize why this is happening, and realize it could only be
  MZ> fixed in Py3K. However, a temporary (though not 100% backwards
  MZ> compatible) fix is that "for" will catch LookupError, rather
  MZ> than IndexError.

I'm not sure what you mean by "fix."  (Please read your mind for me
<wink>.)  I think by fix you mean, "allow the broken code above to
execute without raising an exception."  Yuck!

As far as I can tell, the problem is caused by the special
way that a for loop uses the __getitem__ protocol.  There are two
related issues that lead to confusion.

In cases other than for loops, __getitem__ is invoked when the
syntactic construct x[i] is used.  This means either lookup in a list
or in a dict depending on the type of x.  If it is a list, the index
must be an integer and IndexError can be raised.  If it is a dict, the
index can be anything (even an unhashable type; TypeError is only
raised by insertion for this case) and KeyError can be raised.

In a for loop, the same protocol (__getitem__) is used, but with the
special convention that the object should be a sequence.  Python will
detect when you try to use a builtin type that is not a sequence,
e.g. a dictionary.  If the for loop iterates over an instance type
rather than a builtin type, there is no way to check whether the
__getitem__ protocol is being implemented by a sequence or a mapping.
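A minimal sketch of the behavior described above, assuming an
interpreter that still falls back to the __getitem__ protocol when a
class defines no other way to iterate.  The class names and the
collect() helper are made up for illustration; IntKeyedMapping is only
a stand-in for UserDict, not its actual implementation.

```python
class IntKeyedMapping:
    """Hypothetical stand-in for UserDict: __getitem__ backed by a dict."""

    def __init__(self):
        self.data = {}

    def __setitem__(self, key, value):
        self.data[key] = value

    def __getitem__(self, key):
        return self.data[key]       # a missing key raises KeyError


class TinySequence:
    """Hypothetical sequence: __getitem__ backed by a list."""

    def __init__(self, items):
        self.items = list(items)

    def __getitem__(self, index):
        return self.items[index]    # an index past the end raises IndexError


def collect(obj):
    """Loop over obj; return (items seen, name of the escaping exception or None)."""
    seen = []
    try:
        for item in obj:            # calls obj[0], obj[1], ... in turn
            seen.append(item)
    except LookupError as exc:
        return seen, type(exc).__name__
    return seen, None


a = IntKeyedMapping()
a[0] = "hello"
a[1] = "world"

mapping_result = collect(a)
sequence_result = collect(TinySequence(["hello", "world"]))
```

The loop treats IndexError as the normal end of the sequence, so
TinySequence finishes quietly.  The mapping yields the same two items,
but the KeyError from the lookup of key 2 is not part of the protocol
and escapes from the loop.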

The right solution, I think, is to allow a means for stating
explicitly whether a class with an __getitem__ method is a sequence or
a mapping (or both?).  Then UserDict can declare itself to be a
mapping and using it in a for loop will raise the TypeError, "loop
over non-sequence" (which has a standard meaning defined in Skip's
catalog <0.8 wink>).
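To make the idea concrete, here is one way such a declaration might
look.  This is only a sketch: the __container_kind__ attribute and the
checked_iter() helper are hypothetical names invented for this example,
not an existing or proposed Python feature, and a real design would
put the check inside the for loop itself.

```python
class DeclaredMapping:
    # Hypothetical marker attribute; no such declaration exists in Python.
    __container_kind__ = "mapping"

    def __init__(self):
        self.data = {}

    def __setitem__(self, key, value):
        self.data[key] = value

    def __getitem__(self, key):
        return self.data[key]


def checked_iter(obj):
    """Refuse to loop over anything that declares itself a mapping."""
    if getattr(obj, "__container_kind__", None) == "mapping":
        raise TypeError("loop over non-sequence")
    return iter(obj)


a = DeclaredMapping()
a[0] = "hello"
a[1] = "world"

try:
    for word in checked_iter(a):
        print(word)
    outcome = "looped"
except TypeError as exc:
    outcome = str(exc)
```

With the declaration in place the loop fails before the first lookup,
instead of printing two words and then dying with a KeyError.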

I believe this is where types-vs.-classes meets
subtyping-vs.-inheritance.  I suspect that the right solution, circa
Py3K, is that classes must explicitly state what types they are
subtypes of or what interfaces they implement.

Jeremy