[Python-Dev] re: Using lists as sets
Jeremy Hylton
jeremy@cnri.reston.va.us
Mon, 20 Mar 2000 14:51:28 -0500 (EST)
>>>>> "MZ" == Moshe Zadka <moshez@math.huji.ac.il> writes:
MZ> On Mon, 20 Mar 2000, Jeremy Hylton wrote:
>> Yet another possibility, implemented in early versions of JPython
>> and later removed, was to treat a dictionary exactly like a list:
>> Call __getitem__(0), then 1, ..., until a KeyError was raised.
>> In other words, a dictionary could behave like a list provided
>> that it had integer keys.
MZ> Two remarks: Jeremy meant "consecutive natural keys starting
MZ> with 0", (yes, I've managed to learn mind-reading from the
MZ> timbot)
I suppose I meant that (perhaps you can read my mind as well as I
can); I also meant using values of Python's integer datatype :-).
MZ> and that (the following is considered a misfeature):
MZ> import UserDict
MZ> a = UserDict.UserDict()
MZ> a[0]="hello"
MZ> a[1]="world"
MZ> for word in a: print word
MZ> Will print "hello", "world", and then die with KeyError. I
MZ> realize why this is happening, and realize it could only be
MZ> fixed in Py3K. However, a temporary (though not 100% backwards
MZ> compatible) fix is that "for" will catch LookupError, rather
MZ> than IndexError.
I'm not sure what you mean by "fix." (Please read your mind for me
<wink>.) I think by fix you mean, "allow the broken code above to
execute without raising an exception." Yuck!
As far as I can tell, the problem is caused by the special
way that a for loop uses the __getitem__ protocol. There are two
related issues that lead to confusion.
In cases other than for loops, __getitem__ is invoked when the
syntactic construct x[i] is used. This means either lookup in a list
or in a dict depending on the type of x. If it is a list, the index
must be an integer and IndexError can be raised. If it is a dict, the
index can be anything (even an unhashable type; TypeError is only
raised by insertion for this case) and KeyError can be raised.
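A minimal sketch of the two failure modes for x[i], assuming a current Python 3 interpreter rather than the interpreter of the time. The helper name lookup is hypothetical and exists only for illustration:

```python
def lookup(container, index):
    """Return container[index], or the name of the lookup error it raised."""
    try:
        return container[index]
    except LookupError as exc:
        # IndexError and KeyError are both subclasses of LookupError.
        return type(exc).__name__


# A list rejects an out-of-range integer index with IndexError.
seq_result = lookup(["hello", "world"], 2)

# A dict rejects a missing key with KeyError.
map_result = lookup({0: "hello", 1: "world"}, 2)
```

The shared base class is what makes "catch LookupError" possible at all; whether that is desirable is a separate question.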
In a for loop, the same protocol (__getitem__) is used, but with the
special convention that the object should be a sequence. Python will
detect when you try to use a builtin type that is not a sequence,
e.g. a dictionary. If the for loop iterates over an instance type
rather than a builtin type, there is no way to check whether the
__getitem__ protocol is being implemented by a sequence or a mapping.
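The instance case can be sketched as follows, assuming a current Python 3 interpreter, where a class that defines __getitem__ but not __iter__ is still iterated by calling __getitem__ with 0, 1, 2, ... until IndexError. The class name IntKeyedMapping is hypothetical; it stands in for UserDict as it behaved at the time. (In current Python the builtin dict is itself iterable over its keys, so only the instance case is shown.)

```python
class IntKeyedMapping:
    """A mapping-like class that exposes only __getitem__ and __setitem__."""

    def __init__(self):
        self.data = {}

    def __setitem__(self, key, value):
        self.data[key] = value

    def __getitem__(self, key):
        # A missing key raises KeyError, not the IndexError the loop expects.
        return self.data[key]


a = IntKeyedMapping()
a[0] = "hello"
a[1] = "world"

seen = []
try:
    for word in a:          # calls a[0], a[1], a[2], ...
        seen.append(word)
    outcome = "finished"
except KeyError:
    # a[2] raised KeyError, which the loop does not treat as "end of sequence".
    outcome = "KeyError"
```

The loop has no way to tell that the object is a mapping; it only sees that __getitem__ exists.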
The right solution, I think, is to allow a means for stating
explicitly whether a class with an __getitem__ method is a sequence or
a mapping (or both?). Then UserDict can declare itself to be a
mapping and using it in a for loop will raise the TypeError, "loop
over non-sequence" (which has a standard meaning defined in Skip's
catalog <0.8 wink>).
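One way such a declaration could look, purely as a sketch: the class attribute getitem_kind, the two classes, and the helper old_style_loop are all hypothetical names invented for this illustration, not part of Python. The helper emulates the index-based for loop and refuses objects that declare themselves mappings.

```python
class DeclaredSequence:
    getitem_kind = "sequence"   # hypothetical marker: __getitem__ takes positions

    def __init__(self, items):
        self.items = list(items)

    def __getitem__(self, index):
        return self.items[index]


class DeclaredMapping:
    getitem_kind = "mapping"    # hypothetical marker: __getitem__ takes keys

    def __init__(self, data):
        self.data = dict(data)

    def __getitem__(self, key):
        return self.data[key]


def old_style_loop(obj):
    """Collect obj[0], obj[1], ... until IndexError, refusing declared mappings."""
    # Undeclared objects are assumed to be sequences, as the old loop assumed.
    if getattr(obj, "getitem_kind", "sequence") != "sequence":
        raise TypeError("loop over non-sequence")
    collected = []
    index = 0
    while True:
        try:
            collected.append(obj[index])
        except IndexError:
            return collected
        index += 1


sequence_items = old_style_loop(DeclaredSequence(["hello", "world"]))

try:
    old_style_loop(DeclaredMapping({0: "hello", 1: "world"}))
    mapping_error = None
except TypeError as exc:
    mapping_error = str(exc)
```

With the declaration in place the mapping fails immediately and with a meaningful error, instead of printing two items and then dying with KeyError.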
I believe this is where types-vs.-classes meets
subtyping-vs.-inheritance. I suspect that the right solution, circa
Py3K, is that classes must explicitly state what types they are
subtypes of or what interfaces they implement.
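For comparison, later versions of Python grew abstract base classes in collections.abc that let a class state this kind of thing explicitly. The sketch below assumes Python 3; the class name WordMap is hypothetical, and this is one possible reading of "explicitly state", not necessarily the design intended here.

```python
from collections.abc import Mapping, Sequence


class WordMap(Mapping):
    """Declares itself a Mapping by subclassing the abstract base class."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        # Mapping requires __iter__; it iterates over keys.
        return iter(self._data)

    def __len__(self):
        return len(self._data)


words = WordMap({0: "hello", 1: "world"})

is_mapping = isinstance(words, Mapping)        # declared by inheritance
is_sequence = isinstance(words, Sequence)      # integer keys do not make it a sequence
list_is_sequence = isinstance(["hello"], Sequence)
```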
Jeremy