[Python-Dev] Alternatives to switch?

Sun Jun 25 06:31:57 CEST 2006

At first I was pretty excited about the switch proposals, but having 
read the various discussions I have to say that my enthusiasm has cooled 
quite a bit. The set of proposals that are currently being put forward 
have enough caveats and restrictions as to make the statement far less 
useful than I had originally hoped.

In fact, I'd like to point out something that hasn't been brought up, 
which is that in many cases having a closure rebind the switch cases 
defeats the purpose of the thing. For example:

    def outer():
       def inner(x):
          switch(x):
          case 1: ...
          case 2: ...
          case 3: ...

       return inner

If the switch cases are bound at the time that 'inner' is defined, it 
means that the hash table will be rebuilt each time 'outer' is called. 
But what if 'inner' is only intended to be used once? It means that the 
performance advantage of switch is completely negated. On the other 
hand, if 'inner' is intended to be used many times, then 'switch' is a 
win. But the compiler has no idea which of these two cases is true.

I want to try and think out of the box here, and ask the question what 
exactly we wanted a switch statement for, and if there are any other 
ways to approach the problem.

Switch statements are used to efficiently dispatch based on a single 
value to a large number of alternative code paths. The primary uses that 
have been put forward are in interpreting parse trees (the 'sre' example 
is a special case of this) and pickling. In fact, I would say that these 
two use cases alone justify the need for some improvement to the 
language, since both are popular design patterns, and both are somewhat 
ill-served by the current limits of the language.

Parse trees and pickling can, in fact, be considered as two examples of 
a broader category of "external interpretation of an object graph". By 
"external interpretation" I mean that the inspection and transformation 
of the data is not done via method calls on the individual objects, but 
rather by code outside of the object graph that recognizes individual 
object types or values and takes action based on that.

This type of architectural pattern manifests in nearly every sub-branch 
of software engineering, and usually appears when you have a complex 
graph of objects where it is inadvisable, for one reason or another, to 
put the behavior in the objects themselves. There are several possible 
reasons why this might be so. Perhaps the operations involve the 
relationships between the objects rather than the objects themselves 
(this is the parse tree case.) Or perhaps for reasons of modularity, it 
is desired that the objects not have built-in knowledge of the type of 
operation being performed - so that, for example, you can write several 
different kinds of serializers for an object graph without having to 
have the individual objects have special understanding of each different 
serializer type. This is the pickle case.

I'd like to propose that we consider this class of problems (external 
interpretation of an object graph) as the 'reference' use case for 
dicussions of the merits of the switch statement, and that evaluation of 
the merits of language changes be compared against this reference. Here 
are my reasons for suggesting this:

   -- The class is broad and encompasses a large set of practical, 
real-world applications.
   -- The class is not well-served by 'if-elif-else' dispatching styles.
   -- There have been few, if any, use cases in support of a switch 
statement that don't fall within this class.

So how does a switch statement help with this problem? Currently in 
Python, there are a limited number of ways to do N-way dispatching:

   -- if-elif-else chains
   -- method overloading
   -- dicts/arrays of function or method pointers
   -- exotic and weird solutions such as using try/except as a 
dispatching mechanism.

(I can't think of any others, but I am sure I missed something.)

We've already discussed the issues with if-elif-else chains, in 
particular the fact that they have O(N) performance instead of O(1).

The next two options both have in common the fact that they require the 
dispatch to go through a function call. This means that you are paying 
for the (allegedly expensive) Python function dispatch overhead, plus 
you no longer have access to any local variables which happened to be in 
scope when the dispatch occured.

It seems to me that the desire for a switch statement is a desire to get 
around those two limitations - in other words, if function calls were 
cheap, and there was an easy way to do dynamic scoping so that called 
functions could access their caller's variables, then there wouldn't be 
nearly as much of a desire for a switch statement.

For example, one might do a pickling function along these lines:

    dispatch_table = None
    def marshall( data ):
       type_code = None
       object_data = None

       def int_case():
          type_code = TYPE_INT
          object_data = str(data)

       def str_case():
          type_code = TYPE_STR
          object_data = str(data)

       # (and so on)

       # Build the dispatch table once only
       if dispatch_table is None:
          dispatch_table = dict(
             int, int_case,
             str, str_case,
             ...
          )

       dispatch_table[ type( data ) ]()

However, you probably wouldn't want to write the code like this in 
current-day Python -- even a fairly long if-elif-else chain would be 
more efficient, and the code isn't as neatly expressive of what you are 
trying to do. But the advantage is that the construction of the dispatch 
table is explicit rather than implicit, which avoids all of the 
arguments about when the dispatch should occur.

Another way to deal with the explicit construction of the switch table 
is to contsruct it outside of the function body. So for example, if the 
values to be switched on are meant to be evaluated at module load time, 
then the user can define the dispatch table outside of any function. The 
problem is, however, that the language requires any code that accesses 
the local variables of a function to be textually embedded within that 
function, and you can't build a dispatch table outside of a function 
that refers to code sections within a function.

In the interest of brevity, I'm going to cut it off here before I ramble 
on too much longer. I don't have an "answer", so much as I am trying to 
raise the right questions.

-- Talin