[Python-checkins] r47104 - peps/trunk/pep-3103.txt

guido.van.rossum python-checkins at python.org
Mon Jun 26 18:41:14 CEST 2006


Author: guido.van.rossum
Date: Mon Jun 26 18:41:13 2006
New Revision: 47104

Added:
   peps/trunk/pep-3103.txt   (contents, props changed)
Log:
Checkpoint submit so I can edit this elsewhere.


Added: peps/trunk/pep-3103.txt
==============================================================================
--- (empty file)
+++ peps/trunk/pep-3103.txt	Mon Jun 26 18:41:13 2006
@@ -0,0 +1,449 @@
+PEP: 3103
+Title: A Switch/Case Statement
+Version: $Revision$
+Last-Modified: $Date$
+Author: guido at python.org (Guido van Rossum)
+Status: Draft
+Type: Standards Track
+Python-Version: 3.0
+Content-Type: text/x-rst
+Created: 25-Jun-2006
+Post-History: never
+
+
+Abstract
+========
+
+Python-dev has recently seen a flurry of discussion on adding a switch
+statement.  In this PEP I'm trying to extract my own preferences from
+the smorgasboard of proposals, discussing alternatives and explaining
+my choices where I can.  I'll also indicate how strongly I feel about
+alternatives I discuss.
+
+This PEP should be seen as an alternative to PEP 275.  My views are
+somewhat different from that PEP's author, but I'm grateful for the
+work done in that PEP.
+
+
+Rationale
+=========
+
+A common programming idiom is to consider an expression and do
+different things depending on its value.  This is usually done with a
+chain of if/elif tests; I'll refer to this form as the "if/elif
+chain".  There are two main motivations to want to introduce new
+syntax for this idiom:
+
+- It is repetitive: the variable and the test operator, usually '=='
+  or 'in', are repeated in each if/elif branch.
+
+- It is inefficient: when an expressaion matches the last test value
+  (or no test value at all) it is compared to each of the preceding
+  test values.
+
+Both of these complaints are relatively mild; there isn't a lot of
+readability or performance to be gained by writing this differently.
+Yet, some kind of switch statement is found in many languages and it
+is not unreasonable to expect that its addition to Python will allow
+us to write up certain code more cleanly and efficiently than before.
+
+There are forms of dispatch that are not suitable for the proposed
+switch statement; for example, when the number of cases is not
+statically known, or when it is desirable to place the code for
+different cases in different classes or files.
+
+
+Basic Syntax
+============
+
+I'm considering several variants of the syntax first proposed in PEP
+275 here.  There are lots of other possibilities, but I don't see that
+they add anything.
+
+My current preference is alternative 2.
+
+I should not that all alternatives here have the "implicit break"
+property: at the end of the suite for a particular case, the control
+flow jumps to the end of the whole switch statement.  There is no way
+to pass control from one case to another.  This in contrast to C,
+where an explicit 'break' statement is required to prevent falling
+through to the next case.
+
+In all alternatives, the else-suite is optional.  It is more Pythonic
+to use 'else' here rather than introducing a new reserved word,
+'default', as in C.
+
+Semantics are discussed in the next top-level section.
+
+Alternative 1
+-------------
+
+This is the preferred form in PEP 275::
+
+    switch EXPR:
+        case EXPR:
+            SUITE
+        case EXPR:
+            SUITE
+        ...
+        else:
+            SUITE
+
+The main downside is that the suites where all the action is are
+indented two levels deep.
+
+Alternative 2
+-------------
+
+This is Fredrik Lundh's preferred form; it differs by not indenting
+the cases::
+
+    switch EXPR:
+    case EXPR:
+        SUITE
+    case EXPR:
+        SUITE
+    ....
+    else:
+        SUITE
+
+Alternative 3
+-------------
+
+This is the same as alternative 2 but leaves out the colon after the
+switch::
+
+    switch EXPR
+    case EXPR:
+        SUITE
+    case EXPR:
+        SUITE
+    ....
+    else:
+        SUITE
+
+The hope of this alternative is that is will upset the auto-indent
+logic of the average Python-aware text editor less.  But it looks
+strange to me.
+
+Alternative 4
+-------------
+
+This leaves out the 'case' keyword on the basis that it is redundant::
+
+    switch EXPR:
+        EXPR:
+            SUITE
+        EXPR:
+            SUITE
+        ...
+        else:
+            SUITE
+
+Unfortunately now we are forced to indent the case expressions,
+because otherwise (at least in the absence of an 'else' keyword) the
+parser would have a hard time distinguishing between an unindented
+case expression (which continues the switch statement) or an unrelated
+statement that starts like an expression (such as an assignment or a
+procedure call).  The parser is not smart enough to backtrack once it
+sees the colon.  This is my least favorite alternative.
+
+
+Extended Syntax
+===============
+
+There is one additional concern that needs to be addressed
+syntactically.  Often two or more values need to be treated the same.
+In C, this done by writing multiple case labels together without any
+code between them.  The "fall through" semantics then mean that these
+are all handled by the same code.  Since the Python switch will not
+have fall-through semantics (which have yet to find a champion) we
+need another solution.  Here are some alternatives.
+
+Alternative A
+-------------
+
+Use::
+
+    case EXPR:
+
+to match on a single expression; use::
+
+    case EXPR, EXPR, ...:
+
+to match on mulltiple expressions.  The is interpreted so that if EXPR
+is a parenthesized tuple or another expression whose value is a tuple,
+the switch expression must equal that tuple, not one of its elements.
+This means that we cannot use a variable to indicate multiple cases.
+While this is also true in C's switch statement, it is a relatively
+common occurrence in Python (see for example sre_compile.py).
+
+Alternative B
+-------------
+
+Use::
+
+    case EXPR:
+
+to match on a single expression; use::
+
+    case in EXPR_LIST:
+
+to match on multiple expressions.  If EXPR_LIST is a single
+expression, the 'in' forces its interpretation as an iterable (or
+something supporting __contains__, in a minority semantics
+alternative).  If it is multiple expressions, each of those is
+considered for a match.
+
+Alternative C
+-------------
+
+Use::
+
+    case EXPR:
+
+to match on a single expression; use::
+
+    case EXPR, EXPR, ...:
+
+to match on multiple expressions (as in alternative A); and use::
+
+    case *EXPR:
+
+to match on the elements of an expression whose value is an iterable.
+The latter two cases can be combined, so that the true syntax is more
+like this::
+
+    case [*]EXPR, [*]EXPR, ...:
+
+Note that the * notation is similar to the use of prefix * already in
+use for variable-length parameter lists and for passing computed
+argument lists, and often proposed for value-unpacking (e.g.  "a, b,
+*c = X" as an alternative to "(a, b), c = X[:2], X[2:]").
+
+Alternative D
+-------------
+
+This is a mixture of alternatives B and C; the syntax is like
+alternative B but instead of the 'in' keyword it uses '*'.  This is
+more limited, but still allows the same flexibility.  It uses::
+
+    case EXPR:
+
+to match on a single expression and::
+
+    case *EXPR:
+
+to match on the elements of an iterable.  If one wants to specify
+multiple matches in one case, one can write this::
+
+    case *(EXPR, EXPR, ...):
+
+or perhaps this (although it's a bit strange because the relative
+priority of '*' and ',' is different than elsewhere)::
+
+    case * EXPR, EXPR, ...:
+
+Discussion
+----------
+
+Alternatives B, C and D are motivated by the desire to specify
+multiple cases with the same treatment using a variable representing a
+set (usually a tuple) rather than spelling them out.  The motivation
+for this is usually that if one has several switches over the same set
+of cases it's a shame to have to spell out all the alternatives each
+time.  An additional motivation is to be able to specify *ranges* to
+be matched easily and efficiently, similar to Pascal's "1..1000:"
+notation.  At the same time we want to prevent the kind of mistake
+that is common in exception handling (and which will be addressed in
+Python 3000 by changing the syntax of the except clause): writing
+"case 1, 2:" where "case (1, 2):" was meant, or vice versa.
+
+The case could be made that the need is insufficient for the added
+complexity; C doesn't have a way to express ranges either, and it's
+used a lot more than Pascal these days.  Also, if a dispatch method
+based on dict lookup is chosen as the semantics, large ranges could be
+inefficient (consider range(1, sys.maxint)).
+
+All in all my preferences are (in descending preference) B, A, D', C
+where D' is D without the third possibility.
+
+
+Semantics
+=========
+
+There are several issues to review before we can choose the right
+semantics.
+
+If/Elif Chain vs. Dict-based Dispatch
+-------------------------------------
+
+There are two main schools of thought about the switch statement's
+semantics.  School I wants to define the switch statement in term of
+an equivalent if/elif chain.  School II prefers to think of it as a
+dispatch on a precomputed dictionary.
+
+The difference is mainly important when either the switch expression
+or one of the case expressions is not hashable; school I wants this to
+be handled as it would be by an if/elif chain (i.e. hashability of the
+expressions involved doesn't matter) while school II is willing to say
+that the switch expression and all the case expressions must be
+hashable if a switch is to be used; otherwise the user should have
+written an if/elif chain.
+
+There's also a difference of opinion regarding the treatment of
+duplicate cases (i.e. two or more cases with the same match
+expression).  School I wants to treat this the same is an if/elif
+chain would treat it (i.e. the first match wins and the code for the
+second match is silently unreachable); school II generally wants this
+to be an error at the time the switch is frozen.
+
+There's also a school III which states that the definition of a switch
+statement should be in terms of an equivalent if/elif chain, with the
+exception that all the expressions must be hashable.
+
+School I believes that the if/elif chain is the only reasonably,
+surprise-free of defining switch semantics, and that optimizations as
+suggested by PEP 275's Solution 1 are sufficient to make most common
+uses fast.
+
+School II sees nothing but trouble in that approach: in an if/elif
+chain, the test "x == y" might well be comparing two unhashable values
+(e.g. two lists); even "x == 1" could be comparing a user-defined
+class instance that is not hashable but happens to define equality to
+integers.  Worse, the hash function might have a bug or a side effect;
+if we generate code that believes the hash, a buggy hash might
+generate an incorrect match, and if we generate code that catches
+errors in the hash to fall back on an if/elif chain, we might hide
+genuine bugs.  In addition, school II sees little value in allowing
+cases involving unhashable values; after all if the user expects such
+values, they can just as easily write an if/elif chain.  School II
+also doesn't believe that it's fair to allow dead code due to
+overlappin cases to occur unflagged, when the dict-based dispatch
+implementation makes it so easy to trap this.
+
+School III admits the problems with making hash() optional, but still
+believes that the true semantics should be defined by an if/elif chain
+even if the implementation should be allowed to use dict-based
+dispatch as an optimization.  This means that duplicate cases must be
+resolved by always choosing the first case, making the second case
+undiagnosed dead code.
+
+Personally, I'm in school II: I believe that the dict-based dispatch
+is the one true implementation for switch statements and that we
+should face the limitiations and benefits up front.
+
+When to Freeze the Dispatch Dict
+--------------------------------
+
+For the supporters of school II (dict-based dispatch), the next big
+dividing issue is when to create the dict used for switching.  I call
+this "freezing the dict".
+
+The main problem that makes this interesting is the observation that
+Python doesn't have named compile-time constants.  What is
+conceptually a constant, such as re.IGNORECASE, is a variable to the
+compiler, and there's nothing to stop crooked code from modifying its
+value.
+
+Option 1
+''''''''
+
+The most limiting option is to freeze the dict in the compiler.  This
+would require that the case expressions are all literals or
+compile-time expressions involving only literals and operators whose
+semantics are known to the compiler, since with the current state of
+Python's dynamic semantics and single-module compilation, there is no
+hope for the compiler to know with sufficient certainty the values of
+any variables occurring in such expressions.  This is widely though
+not universally considered too restrictive.
+
+Raymond Hettinger is the main advocate of this approach.  He proposes
+a syntax where only a single literal of certain types is allowed as
+the case expression.  It has the advantage of being unambiguous and
+easy to implement.
+
+My may complaint about this is that by disallowing "named constants"
+we force programmers to give up good habits.  Named constants are
+introduced in most languages to solve the problem of "magic numbers"
+occurring in the source code.  For example, sys.maxint is a lot more
+readable than 2147483647.  Raymond proposes to use string literals
+instead of named "enums", observing that the string literal's content
+can be the name that the constant would otherwise have.  Thus, we
+could write "case 'IGNORECASE':" instead of "case re.IGNORECASE:".
+However, if there is a spelling error in the string literal, the case
+will silently be ignored, and who knows when the bug is detected. If
+there is a spelling error in a NAME, however, the error will be caught
+as soon as it is evaluated.  Also, sometimes the constants are
+externally defined (e.g. when parsing an file format like JPEG) and we
+can't easily choose appropriate string values.  Using an explicit
+mappping dict sounds like a poor hack.
+
+Option 2
+''''''''
+
+The oldest proposal to deal with this is to freeze the dispatch dict
+the first time the switch is executed.  At this point we can assume
+that all the named "constants" (constant in the programmer's mind,
+though not to the compiler) used as case expressions are defined --
+otherwise an if/elif chain would have little chance of success either.
+Assuming the switch will be executed many times, doing some extra work
+the first time pays back quickly by very quick dispatch times later.
+
+A mostly theoretical objection to this option is that there is no
+obvious object where the dispatch dict can be stored.  It can't be
+stored on the code object, which is supposed to be immutable; it can't
+be stored on the function object, since many function objects may be
+created for the same function (e.g. for nested functions).  In
+practice, I'm sure that something can be found; it could be stored in
+a section of the code object that's not considered when comparing two
+code objects or when pickling or marshalling a code object; or all
+switches could be stored in a dict indexed by weak references to code
+objects.
+
+Another objection is that the first-use rule allows obfuscated code
+like this::
+
+    def foo(x, y):
+        switch x:
+        case y: 
+            print 42
+
+To the untrained eye (not familiar with Python) this code would be
+equivalent to this::
+
+    def foo(x, y):
+        if x == y:
+            print 42
+
+but that's not what it does (unless it is always called with the same
+value as the second argument).  This has been addressed by suggesting
+that the case expressions should not be allowed to reference local
+variables.  But this is somewhat arbitrary.
+
+A final objection is that in a multi-threaded application, the
+first-use rule requires intricate locking in order to guarantee the
+correct semantics.  (The first-use rule suggests a promise that side
+effects of case expressions are incurred exactly once.)
+
+Option 3
+''''''''
+
+TBD
+
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+
+
+
+..
+   Local Variables:
+   mode: indented-text
+   indent-tabs-mode: nil
+   sentence-end-double-space: t
+   fill-column: 70
+   coding: utf-8
+   End:


More information about the Python-checkins mailing list