[Python-Dev] type categories

Andrew Koenig ark@research.att.com
14 Aug 2002 17:22:35 -0400


>> Perhaps the reason it's rare is that it's difficult to do.

Guido> Perhaps...  Is it the chicken or the egg?

Did you hear about the two philosophers in the diner?  One ordered a
chicken-salad sandwich and the other ordered an egg-salad sandwich,
because they wanted to see which would come first.

>> One of the cases I was thinking of was the built-in * operator,
>> which does something completely diferent if one of its operands
>> is an integer.

Guido> Really?  I suppose you're thinking of sequence repetition.

Right.

Guido> I consider that one of my early mistakes (it didn't make it to
Guido> my "regrets" list but probably should have).  It would have
Guido> been much simpler if sequences simply supported multiplcation,
Guido> and in fact repeated changes to the implementation (and subtle
Guido> edge cases of the semantics) are slowly nudging into this
Guido> direction.

It's still a plausible example, I think.

>> Another one was the buffering iterator we were discussing earlier,
>> which ideally would omit buffering entirely if asked to buffer a
>> type that already supports multiple iteration.

Guido> How do you do that in C++?  I guess you overload the function
Guido> that asks for the iterator, and call that function in a
Guido> template.  I think in Python we can ask the caller to provide a
Guido> buffering iterator when a function needs one.  Since we really
Guido> have very little power at compile time, we sometimes need to do
Guido> a little more work at run time.  But the resulting language
Guido> appears to be easier to understand (for most people anyway)
Guido> despite the theoretical deficiency.

I understand that, I think.

The C++ library has a notion of ``iterator traits'' that is implemented
by a template class named, of all thing, iterator_traits.  So, for example,
if T is an iterator type, then iterator_traits<T>::value_type is the
type that dereferencing an object of type T will yield.  To reveal what
operations an iterator supports, iterator_traits<T>::iterator_category
is one of the following five types, depending on the iterator:

        input_iterator_tag
        output_iterator_tag
        forward_iterator_tag
        bidirectional_iterator_tag
        random_access_iterator_tag

Each of the last three of these types is derived from the one before it.
It is possible to instantiate objects of any of these types, but the
objects carry no information beyond their type and identity.

Now, suppose you want to implement an algorithm that requires a
bidirectional iterator, but can be done more efficiently with a
random-access iterator.  Then you might write something like this:

        // The bidirectional-iterator case
        template<class It>
        void foo_aux(It begin, It end, bidirectional_iterator_tag) {
                // ...
        }

        // The random-access-iterator case
        template<class It>
        void foo_aux(It begin, It end, random_access_iterator_tag) {
                // ...
        }

and then you can select the appropriate algorithm at compile time this way:

        template<class It>
        void foo(It begin, It end) {
                foo_aux(begin, end,
                   typename iterator_traits<It>::iterator_category());
        }

This code creates an extra object (the anonymous object created by the
expression ``typename iterator_traits<It>::iterator_category()'' for the
sole purpose of using its type to distinguish between the two overloaded
versions of foo_aux.  This distinction is made at compile time, and if
the compiler is smart enough, it will also optimize away the empty,
anonymous object.

So this is an example of what I mean by ``dispatching based on a type
category.''  In C++ it's done at compile time, but what I care about
in the context of Python is not when it is done, but rather how
convenient it is to express.  (I don't think the C++ mode of
expression is particularly convenient, but at least it's possible)

Guido> I'm not quite sure why that is, but I am slowly developing a
Guido> theory, based on a remark by Samuele Pedroni; at least I
Guido> believe it was he who remarked at some point "Python has only
Guido> run time", ehich got me thinking.  My theory, partially
Guido> developed though it is, is that it is much harder (again, for
Guido> most people :-) to understand in your head what happens at
Guido> compile time than it is to understand what goes at run time.
Guido> Or perhaps that understanding *both* is harder than
Guido> understanding only one.

I have no problem believing that.

Guido> But I believe that for most people acquiring a sufficient
Guido> mental model for what goes on at run time is simpler than the
Guido> mental model for what goes on at compile time.  Possibly this
Guido> is because compilers really *do* rely on very sophisticated
Guido> algorithms (such as deciding which overloading function is
Guido> called based upon type information and available conversions).
Guido> Run time on the other hand is dead simple most of the time --
Guido> it has to be, since it has to be executed by a machine that has
Guido> a very limited time to make its decisions.

That's OK with me.  But I'd still like a less ad-hoc way of making
those run-time tests.

Guido> All this reminds me of a remark that I believe is due to John
Guido> Ousterhout at the VHLL conference in '94 in Santa Fe, where you & I
Guido> first met.  (Strangely it was Perl's Tom Christiansen who was in a
Guido> large part responsible for the eclectic program.)  You gave a talk
Guido> about ML, and I believe it was in response to your talk that John
Guido> remarked that ML was best suited for people with an IQ of over 150.

I'm still not convinced that's necessarily true -- I think it depends
a great deal on how ML is taught.  I do believe that most of what has
been written about ML is hard to follow for people who have grown up
in the imperative world, but I don't think it has to be that way.

Guido> That rang true to me, since the only other person besides you
Guido> that I know who is a serious ML user definitely falls into that
Guido> category.

Thanks for the compliment!

Guido> And ML is definitely a language that does more than the average
Guido> language at compile time.

Yes.  One of the reasons I find it interesting, incidentally, is that
it still manages to generate surprisingly efficient machine code.

>> Actually, I thought of them but omitted them to avoid confusion
>> between a type and a category with a single element.

Guido> Can you explain?  Neither string (which has Unicode and 8-bit,
Guido> plus a few other objects that are sufficiently string-like to
Guido> be regex-searchable, like arrays) nor file (at least in the
Guido> "lore protocol" interpretation of file-like object) are
Guido> categories with a single element.

Fair enough.  I just didn't have examples at my fingertips and thought
at first that using those exmaples would confuse matters.  I don't
mind asking them.

Guido> I believe that the notion of an informal or "lore" (as Jim
Guido> Fulton likes to call it) protocol first became apparent when we
Guido> started to use the idea of a "file-like object" as a valid
Guido> value for sys.stdout.

>> OK.  So what I'm asking about is a way of making notions such as
>> "file-like object" more formal and/or automatic.

Guido> Yeah, that's the holy Grail of interfaces in Python.

Cool!  (I care much less about type checking because, as I mentioned
in another message, there are uncheckable things such as being an
order relation that I would like to use for dispatching anyway).

>> Of course, one reason for my interest is my experience with a
>> language that supports compile-time overloading -- what I'm really
>> seeing on the horizon is some kind of notion of overloading in
>> Python, perhaps along the lines of ML's clausal function
>> definitions (which I think are truly elegant).

Guido> Honestly, I hadn't read this far ahead when I brought up ML
Guido> above. :-)

:-)

Guido> I really hope that the holy grail can be found at run time
Guido> rather than compile time.  Python's compile time doesn't have
Guido> enough information easily available, and to gather the
Guido> necessary information is very expensive (requiring
Guido> whole-program analysis) and not 100% reliable (due to Python's
Guido> extreme dynamic side).

I have no problem with that.  So here's a simple example of ML's clausal
functions:

        fun len([]) = 0
          | len(h::t) = len(t) + 1

Here, [] is an empty list, and h::t is ML's way of spelling cons(h,t).
The two clauses (one per line) are checked in order *at run time* --
we're dispatching on the *value*, not the type of the argument.
if you like, this example is equivalent to the following:

        fun len(x) = if x = [] then 0 else len(tl(x))+1

(well, not really, but only an ML expert will see why, and it's not germane)

In the Python domain, I imagine something like this:

        def f(arg: Category1):
                ....
        or  f(arg: Category2):
                ....
        or  f(arg: Category3):

I would like the implementation to try each version of f until it
finds one that passes the constraints, and then executes that one.  If
none of them fits the bill, then it should throw an exception.

-- 
Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark


PS: Please forgive the erratic replies -- apparently our mail gateway
decided to hang onto a bunch of messages for a day or so...