[Python-Dev] PEP 3103: A Switch/Case Statement

Wed Jun 28 17:00:25 CEST 2006

On 6/28/06, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Guido van Rossum wrote:
> > I think we all agree
> > that side effects of case expressions are one way we can deduce the
> > compiler's behind-the-scenes tricks (even School Ib is okay with
> > this). So I don't accept this as proof that Option 2 is better.
>
> OK, I worked out a side effect free example of why I don't like option 3:
>
>    def outer(cases=None):
>        def inner(option, force_default=False):
>            if cases is not None and not force_default:
>                switch option:
>                    case in cases[0]:
>                        # case 0 handling
>                    case in cases[1]:
>                        # case 1 handling
>                    case in cases[2]:
>                        # case 2 handling
>            # Default handling
>        return inner
>
> I believe it's reasonable to expect this to work fine - the case expressions
> don't refer to any local variables, and the subscript operations on the
> closure variable are protected by a sanity check to ensure that variable isn't
> None.

It's only reasonable if you're in School I.

As I have repeatedly said, the only use cases I care about are those
where the case expressions are constants for the lifetime of the
process. (The compiler doesn't need to know this but the programmer
does.)

> There certainly isn't anything in the code above to suggest to a reader that
> the condition attempting to guard evaluation of the switch statement might not
> do its job.
>
> With first-time-execution jump table evaluation, there's no problem - when the
> closure variable is None, there's no way to enter the body of the if
> statement, so the switch statement is never executed and the case expressions
> are never evaluated. Such functions will still be storing a cell object for
> the switch's jump table, but it will always be empty because the code to
> populate it never gets a chance to run.
>
> With the out of order execution involved in def-time evaluation, however, the
> case expressions would always be executed, even though the inner function is
> trying to protect them with a sanity check on the value of the closure variable.
>
> Using Option 3 semantics would mean that calling "outer()" given the above
> function definition will give you the rather surprising result "TypeError:
> 'NoneType' object is unsubscriptable", with a traceback pointing to the line
> "case in cases[0]:" in the body of a function that hasn't been called, and that
> includes an if statement preventing that line from being reached when 'cases'
> is None.

That's a perfectly reasonable outcome to me.
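
The difference between the two evaluation times can be approximated in
Python as it exists today, without a switch statement. The sketch below is
only an emulation under stated assumptions, not PEP 3103 syntax: a default
argument value stands in for def-time evaluation (Option 3), and a lazily
filled closure variable stands in for first-execution evaluation (Option 2).
The names build_table, outer_def_time and outer_first_use, and the returned
label strings, are made up for illustration.

```python
def build_table(cases):
    # Stand-in for evaluating the three case expressions cases[0..2].
    # Subscripting None raises TypeError, as in the example above.
    table = {}
    for index in range(3):
        for value in cases[index]:
            table[value] = "case %d" % index
    return table


def outer_def_time(cases=None):
    # Emulates Option 3: default argument values are evaluated when the
    # def statement runs, before `inner` is ever called.
    def inner(option, force_default=False, _table=build_table(cases)):
        if cases is not None and not force_default:
            if option in _table:
                return _table[option]
        return "default"
    return inner


def outer_first_use(cases=None):
    # Emulates Option 2: the table is built the first time control
    # actually reaches the "switch", so the guard protects it.
    table = None

    def inner(option, force_default=False):
        nonlocal table
        if cases is not None and not force_default:
            if table is None:
                table = build_table(cases)
            if option in table:
                return table[option]
        return "default"
    return inner
```

Under this emulation, outer_first_use() returns a working function that
always takes the default branch, while outer_def_time() raises TypeError
before any inner function has been called.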

> >> When it comes to the question of "where do we store the result?" for the
> >> first-execution calculation of the jump table, my proposal is "a
> >> hidden cell
> >> in the current namespace".
> >
> > Um, what do you mean by the current namespace? You can't mean the
> > locals of the function containing the switch. There aren't always
> > outer functions so I must conclude you mean the module globals. But
> > I've never seen those referred to as "the current namespace".
>
> By 'current namespace' I really do mean locals() - the cell objects themselves
> would be local variables from the point of view of the currently executing code.
>
> For functions, the cell objects would be created at function definition time;
> for code handled via exec-style execution, they'd be created just before
> execution of the first statement begins. In either case, the cell objects
> would already be in locals() before any bytecode gets executed.
>
> It's only the calculation of the cell *contents* that gets deferred until
> first execution of the switch statement.
>
> > So do I understand that the switch gets re-initialized whenever a new
> > function object is created? That seems a violation of the "first time
> > executed" rule, or at least a modification ("first time executed per
> > defined function"). Or am I misunderstanding?
>
> I took it as a given that 'first time execution' had to be per function
> and/or invocation of exec - tying caching of expressions that rely on module
> globals or closure variables to code objects doesn't make any sense, because
> the code object may have different globals and/or closure variables next time
> it gets executed.
>
> I may not have explained my opinion about that very well though, because the
> alternative didn't even seem to be an option.

PEP 3103 discusses several ways to implement first-time-really.

I suggest that you edit the PEP to add Option 2a, which is
first-time-per-function-definition.
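
A rough sketch of what first-time-per-function-definition would mean in
practice, again emulated in current Python rather than with the proposed
syntax. The closure variable `table` plays the role of the hidden cell; the
names make_handler and evaluations and the table contents are hypothetical.

```python
evaluations = []  # records each time the "case expressions" are evaluated


def make_handler(tag):
    table = None  # stands in for the hidden per-function cell

    def handler(option):
        nonlocal table
        if table is None:
            # First execution for this particular function object.
            evaluations.append(tag)
            table = {1: "one", 2: "two"}
        return table.get(option, "default")
    return handler


first = make_handler("first")
second = make_handler("second")

first(1)
first(2)    # reuses the table computed by the previous call
second(1)   # a new function object, so its table is computed afresh
```

Each execution of the def produces a function with its own empty cell, so
the case expressions are evaluated once per function object and not once per
code object.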

> > But if I have a code object c containing a switch statement (not
> > inside a def) with a side effect in one of its cases, the side effect
> > is activated each time through the following loop, IIUC:
> >
> >  d = {}
> >  for i in range(10):
> >    exec c in d
>
> Yep. For module and class level code, the caching really only has any
> speed benefit if the switch statement is inside a loop.
>
> The rationale for doing it that way becomes clearer if you consider what would
> happen if you created a new dictionary each time through the loop:
>
>    for i in range(10):
>        d = {}
>        exec c in d
>        print d["result"]
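
The loop above can be imitated in Python 3, where exec is a function and is
called as exec(c, d). This is a sketch under assumptions: the code object
builds an ordinary dict in place of a jump table, and make_namespace,
make_case and log are hypothetical helpers whose only purpose is to make the
evaluation of the case expression visible.

```python
# The first line stands in for evaluating the case expression and
# building the jump table; the second line stands in for the switch.
c = compile(
    "table = {make_case(): 'hit'}\n"
    "result = table.get(option, 'default')\n",
    "<switch-sketch>",
    "exec",
)


def make_namespace(i, log):
    def make_case():
        log.append(i)  # visible side effect of evaluating the case
        return i
    return {"option": i, "make_case": make_case}


log = []
results = []
for i in range(3):
    d = make_namespace(i, log)  # a new globals dict each time
    exec(c, d)
    results.append(d["result"])
```

Because the table is rebuilt on every exec, each run matches against the
case value from its own namespace. Had the table from the first run been
cached on the code object, the later runs would have been compared against a
value taken from a namespace that is no longer in use.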
>
> > I'm confused how you can first argue that tying things to the function
> > definition is one of the main drawbacks of Option 3, and then proceed
> > to tie Option 2 to the function definition as well. This sounds like
> > by far the most convoluted specification I have seen so far. I hope
> > I'm misunderstanding what you mean by namespace.
>
> It's not the link to function definitions that I object to in Option 3, it's
> the idea of evaluating the cases at function definition *time*. I believe the
> out-of-order execution involved will result in too many surprises when you
> start considering surrounding control flow statements that lead to the switch
> statement not being executed at all.
>
> If a switch statement is inside a class statement, a function definition
> statement, or an exec statement then I still expect the jump table to be
> recalculated every time the containing statement is executed, regardless of
> whether Option 2 or Option 3 is used for when the case expressions get
> evaluated (similarly, reloading a module would recalculate any module level
> jump tables).
>
> And I agree my suggestions are the most involved so far, but I think that's
> because the current description of option 3 is hand-waving away a couple of
> important issues:
>    - how does it deal with module and class level code?

Not so much hand-waving as several possibilities, each of which is
clearly defined and has some (dis)advantages.

>    - how does it deal with switch statements that are inside conditional logic

No hand-waving here -- these are still frozen.

> where that conditional logic determines whether or not the case
> expressions can be safely evaluated?

That would only matter for non-constant cases, a use case that I reject.

> (I guess the fact that I'm refining the idea while writing about it doesn't
> really help, either. . .)

We're all doing that, so no problem.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)