[Types-sig] Multiple dispatch [for interest only]

skaller skaller@maxtal.com.au
Fri, 24 Dec 1999 14:16:04 +1100


[This is mainly for interest, it is not a 'proposal' of any kind]

A day after implementing type-checked function arguments in Viper,
using the syntax

	def f(a!t1, b!t2): ...

I am thinking of some interesting new possibilities this leads to
for polymorphism. This comes about
because the type information is part of the run-time function
object. 
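
To make this concrete, here is a rough sketch in ordinary Python of a
function object that carries its declared parameter types at run time
and checks them on each call. The class name TypedFunction and the
attribute param_types are illustrative stand-ins, not Viper's actual
machinery:

	class TypedFunction:
		def __init__(self, func, param_types):
			self.func = func
			self.param_types = param_types  # kept around for later dispatch
		def __call__(self, *args):
			for arg, t in zip(args, self.param_types):
				if not isinstance(arg, t):
					raise TypeError("expected %s, got %s" % (t, type(arg)))
			return self.func(*args)

	# roughly what 'def f(a!t1, b!t2): ...' would produce:
	f = TypedFunction(lambda a, b: (a, b), (int, str))
	f(1, "x")     # ok
	f("x", 1)     # raises TypeError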

Consider, just for the moment:

	def f(x!t1): ...
	def f(x!t2): ...

and suppose this was allowed, and matched the arguments of a function
call in reverse order of definition:

	f(a) 

is notionally expanded to

	if type(a) is t2: f_t2(a)
	elif type(a) is t1: f_t1(a)

where f_t1 and f_t2 are the two functions
named f, defined by the above definitions respectively.

This could be useful, and the mechanism has a name:
it's called multiple dispatch.  In particular,
it is _message_ based multiple dispatch.
[The table of possible signatures is centralised,
and associated with the function name, rather than
distributed between 'objects' as methods]

Here's an example: suppose you have an algorithm:
	
	# module System
	def add(a,b): # add standard integer types
	def gcd(a,b): ... add(a,b) ... add(b,a)

It's well known that Euclid's algorithm for calculating
the greatest common divisor is generic: it works when
both numbers are elements of the same integral domain, such as
the integers. Python already has two integer types,
'int' and 'long': the algorithm will work for both.
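
For reference, here is the usual remainder-based formulation in plain
Python; it runs unchanged on both 'int' and 'long' values. (The sketch
above instead writes the algorithm in terms of a System 'add'
primitive, which is what makes the overloading question interesting.)

	def gcd(a, b):
		# Euclid's algorithm: works for any type supporting %,
		# so for both 'int' and 'long'.
		while b:
			a, b = b, a % b
		return a

	gcd(12, 18)     # -> 6
	gcd(12L, 18L)   # -> 6L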

What happens if a new type, MyNumber, is created?
You would write:

	_add = add
	def add(a,b):
		if type(a) is MyNumber and type(b) is MyNumber:
			... # details for adding MyNumbers
		else: return _add(a,b) # call the old function
	System.add = add

I note this is somewhat insecure, since _add is likely to
get redefined in a subsequent attempt at the same thing.
Viper has a syntax to fix this, but that's another story :-]

I also note that this _requires_ invading the module System,
and breaking the 'freezing'.

Let me call this an incremental function definition.
What we're doing is overloading the add function.
This is part of the theoretical 'type' of MyNumber
(that you can add them). Adding a new meaning to
the System 'add' function is essential for making
the generic algorithm work with MyNumbers.
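
As a self-contained sketch of the effect (MyNumber and double_both are
illustrative stand-ins, and everything lives in one module here; in the
setting above the generic algorithm lives in module System, which is
why System.add must also be rebound):

	def add(a, b):                  # the original, integers-only add
		return a + b

	def double_both(a, b):          # a 'generic algorithm' written via add
		return add(a, b), add(b, a)

	class MyNumber:
		def __init__(self, n): self.n = n

	_add = add                      # keep the previous definition
	def add(a, b):                  # the incremental redefinition
		if isinstance(a, MyNumber) and isinstance(b, MyNumber):
			return MyNumber(a.n + b.n)
		return _add(a, b)           # fall back to the old add

	double_both(1, 2)                            # still works: (3, 3)
	double_both(MyNumber(1), MyNumber(2))[0].n   # now works too: 3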


[There is another way to do this, using classes
and __add__ and __radd__ methods,
but this does NOT generalise because of the usual
covariance problem with object orientation:
please consider a three-argument function to see this;
I've used a two-argument one for brevity]
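
A purely illustrative sketch of that point: a centralised table keyed on
the types of all the arguments extends directly to three (or more)
arguments, whereas per-class methods are selected by the type of one
operand only. Names like def_combine are made up for this example:

	combine_table = {}

	def def_combine(t1, t2, t3, func):
		combine_table[(t1, t2, t3)] = func

	def combine(a, b, c):
		# dispatch on the types of all three arguments at once
		return combine_table[(type(a), type(b), type(c))](a, b, c)

	def_combine(int, int, int,     lambda a, b, c: a + b + c)
	def_combine(int, float, float, lambda a, b, c: float(a) + b + c)

	combine(1, 2, 3)        # uses the (int, int, int) entry
	combine(1, 2.0, 3.0)    # uses the (int, float, float) entry

	# With __add__/__radd__-style methods, at most the classes of two
	# operands participate in choosing the implementation; the rest
	# must be tested by hand inside each method body.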

So, I was thinking that it would be interesting to consider
what would happen if the run time system, instead of
just looking a function up by name, ALSO used the types
of the arguments to help choose which one to call.

[This could _also_ be applied to methods, except that
the method would not be applied to the 'object',
since the class of the object already acts as a lookup
namespace]

Just to summarise: I'm thinking about some way of
permitting 'incremental' function definitions,
and an associated 'multiple dispatch'.

Before considering syntax, there are architectural
issues. The envisaged implementation for _builtin_
functions, such as 'len', would be to keep a list
of the functions named 'len' associated with the
string 'len'. Instead of the lookup table
I'm using at present, which is like:

	{'len': len, 'abs': abs, ... }

the table would be changed to

	{'len': [len1, len2, ...], ... }

and instead of dispatch

	return table['len'](arg)

we'd have something like:

	for f in table['len']:
		if type(arg) is f.param_types[0]:
			return f(arg)


This is NOT particularly new for Python; it is similar
to the way exceptions are matched against handlers.
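
A minimal, runnable sketch of such a table in plain Python; 'define'
and 'call' are illustrative helpers, not an actual Viper or CPython
interface, and matching follows the reverse-definition-order rule from
earlier:

	table = {}

	def define(name, func, param_types):
		func.param_types = param_types
		# newest definitions go first: reverse order of definition
		table.setdefault(name, []).insert(0, func)

	def call(name, *args):
		for f in table[name]:
			types = f.param_types
			if len(args) == len(types) and all(type(a) is t for a, t in zip(args, types)):
				return f(*args)
		raise TypeError("no matching definition of %r" % name)

	define('add', lambda a, b: a + b, (int, int))
	define('add', lambda a, b: a + b, (float, float))

	call('add', 2, 3)        # picks the (int, int) definition
	call('add', 2.0, 3.0)    # picks the (float, float) definition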

I note: the actual implementation is trivial.
The extra overhead when calling functions is also minimal.
The issues here are semantics and, to some extent, syntax.
-------------------------------------------------------

FYI: what excites me about this is that it is well known
that object oriented polymorphism doesn't work in the sense that
it doesn't extend gracefully to multiple arguments.

Dynamic multiple dispatch is a fairly unprincipled way of
supporting the notion of genericity; however, proper
genericity cannot so easily be implemented: no one knows
how to do it, and it is a very active research area in which
there is currently a diverse collection of theories.

Many functional languages provide _correct_, well-principled
support for genericity, using type variables, modules,
Haskell classes/monads, etc., but all of these type systems
are severely limited in their expressivity. Generally,
less well-principled systems like C++ templates, or dynamic
dispatching, are less secure .. but more expressive.

For those interested in why someone like me, interested
in genericity, is VERY interested in Python, this is the reason:
it is possible to use the relatively unprincipled dynamism
to _implement_ generic things in Python, even though
such implementations are not very secure. This is somewhat
like using casts in C.

As any Python enthusiast will tell you, Python's dynamic
nature makes its object orientation more powerful
(expressive) than that provided by many statically
typed languages .. and also more dangerous.

However, OO doesn't work well, and it is interesting
to experiment with a dynamic language which does not
have the constraints of _either_ strict object orientation
_or_ static typing: a suitable dynamic system
can suggest how to 'design' a static type system
that retains the expressivity enabled by dynamism,
but also provides the extra security static typing
traditionally provides.


-- 
John Skaller, mailto:skaller@maxtal.com.au
10/1 Toxteth Rd Glebe NSW 2037 Australia
homepage: http://www.maxtal.com.au/~skaller
voice: 61-2-9660-0850


Re: [Types-sig] recursive types, type safety, and flow analysis

scott scott@chronis.pobox.com
Wed, 22 Dec 1999 03:36:36 -0500

On Tue, Dec 21, 1999 at 10:02:08PM -0800, Greg Stein wrote:
> On Tue, 21 Dec 1999, scott wrote:
> > On Tue, Dec 21, 1999 at 08:06:28PM -0800, Greg Stein wrote:
> >...
> > > Basically, I think your request to find and report on
> > > use-before-definition is "intractable" *when* you're talking about
> > > multiple bodies of code (e.g. two functions, or the global space and a
> > > function).

[...]

> > I'd agree that this has been demonstrated, but only for examples of
> > code which seem like great candidates for compile time warnings.  Are
> > there examples which strike you otherwise?
> 
> One of my points was that I do not believe you can issue warnings because
> you can't know whether a problem might exist. Basically, it boils to not
> knowing whether a global used by a function exists at the time the
> function is called. So you either issues warnings for all global usage, or
> you issue none. You can make a few guesses based on what happens in the
> global code body, but I don't think the guesses will really improve the
> quality of warnings.

I personally can't imagine that it would be an issue to treat globals
in functions with anything other than a simple flat rule: for type
checking purposes, globals must be defined at compile time in the
global namespace. That's just me, but I'd probably fire any of the
Python programmers who work for me if they did what you describe
above with globals in a large project :)

> 
> Examples? No, I don't really have any handy. Any example would be a short
> code snippet and people would say, "yah. that's bad. it should fail." But
> the issue is with larger bodies of code... that's what we're trying to
> fix! So... No, I don't have a non-trivial example.

I can't even imagine one, so if there's any way to describe this
global issue a little further without putting too much effort into it,
I'd appreciate it.

[...]

> 
> The origination of this discussion was based on the recursive type issue.
> If we have runtime objects, then I doubt we could support the recursive
> type thing without some additional work. Or, as I'm suggesting, you do not
> allow an undefined name (as specified by runtime/execution order) to be
> used in a typedecl.

you could even allow typedecl to import modules for the sake of
gaining access to the names, where those imports would only occur when
the optional type checking is turned on.  I'd agree that the use of an
undefined name should be disallowed.  With the presence of
type-check-only imports, following the same
no-mutually-recursive-imports rule as the regular import, but only
importing typedecl statements, you could achieve all this at compile
time.

I've run into this issue on large projects, importing a classname,
just to run 
    assert isinstance(foo, thatclass), "complain meaningfully"

But it hasn't come up with recursive types in any code I've seen, just
deeply-complex types in terms of container and class hierarchy
relationships.

> 
> The design of how to handle recursive types depends on the decision to
> include/exclude runtime objects that define function, class, or module
> typedecl information. Even if we defer the runtime creation of those
> objects, it will affect the design today.
> 

indeed.

[...]
> 
> I do believe the information goes into the bytecode, but I don't think
> that is the basis for needing to plan now. Instead, we have to define the
> semantics of when/where those typedecl objects exist. Do we have them at
> runtime? 

in the above, no, though we do have the ability to find a name
anywhere at compile time.

>Does a name have to exist (in terms of runtime execution) for it
> to be used in a typedecl, or does it just have to exist *somewhere*? 

in the above, it has to exist in the typedecl 'execution' model, which
is during compile time.

>If
> names must exist before usage, then how is the recursive type thing
> handled? With unspecified typedecls? (like an unspecified struct)

How about an iterative model which continues until all typedecl names
are filled in?

I understand your concern about two distinct namespace models being
unsettling.  It raises issues of what exactly we want out of static
typing, and what sets of existing and future Python code may benefit
from static typing, and these are indeed big issues.

For me, it is sufficient to proceed from the premise that you can't
have static typing work on code that redefines types at run time, and
to limit runtime checking (for the time being) to optionally having the
interpreter take some action (warn or abort) when that happens. That
requirement alone implies that typedecl'd names and their typedecl
bodies need to be available at run time, which is sufficient to
support just about any future developments in a static-typing
interface in pure Python.

As an aside, I'm glad to learn it wouldn't be difficult to have Python
put static type information in its bytecode.  That seems like a good
place for it.

As weird as it is to have a separate type-decl name model, it seems
infinitely preferable to trying to depict dynamic typing in a static
typing model.

scott