PEP: statements in control structures (Re: Conditional Expressions don't solve the problem)

Huaiyu Zhu huaiyu at gauss.almadan.ibm.com
Wed Oct 17 17:37:46 EDT 2001


I've been writing this for several evenings.  It's not polished, but IMHO it
solves most of the problems discussed in this thread.  So here it goes. :-)


                PEP: Statements in Control Structures


1. INTRODUCTION:
    
    One long-standing complaint about Python syntax is that it distinguishes
    expressions from statements and does not allow statements in the
    condition part of flow control structures, so one cannot write
            
            while x = next():  process(x)
            if x = next():     process(x)
            
    This is to avoid troublesome bugs like
            
            if x = 0:               ...
            
    when the programmer actually wanted 
            
            if x == 0:              ...
            
    However, this restriction of not allowing statements in conditions also
    has its own costs, mainly increased verbosity, which sometimes itself
    leads to other subtle bugs.  This issue is not an artificial one,
    because the current control-flow structures do not represent the most
    general case naturally.
    
    In this proposal we present an extended syntax for control structures
    that allows statements before conditions, without the risk of problems
    associated with mixing statements with expressions.  The new syntax is

            if ( stmt ; )* expr : suite 
            ( elif ( stmt ; )* expr : suite )* 
            [ else : suite ]

            while ( stmt ; )* expr : suite
            [ else : suite ]

    Compared with Python's current syntax

            if expr : suite 
            ( elif expr : suite )* 
            [ else : suite ]

            while expr : suite
            [ else : suite ]
    
    the new syntax allows simple statements in control structures just
    before the conditions, separated by ";".  

    Essentially the same syntax was proposed by Kevin Digweed [1] in 1999,
    which received some favorable comments but was subsequently lost in a
    discussion with over-generalization (see the relevant thread).  This
    author, unaware of the earlier discussion, proposed it independently in
    2000 [2].  Reference [1] was recently mentioned by Hamish Lawson [3].



2. PROPOSAL:

    The extension allows zero or more simple statements separated by ";" to
    be placed between the keywords "while", "if" and "elif" and their
    corresponding conditions.  These statements are executed in sequence at
    the point just before the condition expression is to be evaluated.

    In the following, 

        statements := (simple_statement ";" )* simple_statement

    In other words, they are what can be written in one line separated by
    ";" in the current syntax.


    2.1. The if-elif-else structure is extended to the following
            
            "if" [ statements ";" ] expression ":"       statements
            ( "elif" [ statements ";" ] expression ":"   statements )*
            [ "else:" statements ]
            
    
        The semantics of 
        
            if stmts1; expr1:
                stmts2
            elif stmts3; expr2:
                stmts4
            elif stmts5; expr3:
                stmts6
            else:
                stmts7
        
        is equivalent to the semantics of
        
            stmts1
            if expr1:
                stmts2
            else:
                stmts3
                if expr2:
                    stmts4
                else:
                    stmts5
                    if expr3:
                        stmts6
                    else:
                        stmts7
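
        The expansion above can be tried out in current Python.  The
        following is only a sketch: the classify() helper and the two
        regular expressions are hypothetical names chosen for illustration
        and are not part of the proposal.  Each match object is assigned
        just before the condition that tests it, which is what forces the
        nesting.

```python
import re

# Hypothetical patterns, chosen only to illustrate the shape of the code.
INT_RE = re.compile(r"[+-]?\d+$")
FLOAT_RE = re.compile(r"[+-]?\d+\.\d*$")


def classify(text):
    """Return a (kind, value) pair using the nested form from section 2.1."""
    m = INT_RE.match(text)                    # stmts1
    if m:                                     # expr1
        return "int", int(m.group())          # stmts2
    else:
        m = FLOAT_RE.match(text)              # stmts3
        if m:                                 # expr2
            return "float", float(m.group())  # stmts4
        else:
            return "other", text              # final else suite
```

        Under the proposed syntax the same logic would read as one flat
        if/elif/else chain, with each assignment written before its
        condition.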


    2.2.  The while-else structure is extended to the following

            "while" [ statements ";" ] expression ":" statements
            [ "else:" statements ]

        The semantics of

            while stmts1; expr1:
                stmts2
                if expr2: break
                stmts3
            else:
                stmts4

        is equivalent to the semantics of

            __hidden_variable = 0
            while 1:
                stmts1
                if not expr1: break
                stmts2
                if expr2:
                    __hidden_variable = 1
                    break
                stmts3
            if not __hidden_variable:
                stmts4
            del __hidden_variable

        where __hidden_variable is a variable not used in this block.
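
        A runnable sketch of this flag-based expansion in current Python
        follows.  The function find_first() and its three callable
        parameters are assumptions made for the example; the local variable
        "found" plays the role of __hidden_variable.

```python
def find_first(get_next, is_end, wanted):
    """Flag-based form of: while x = get_next(); not is_end(x): ... else: ..."""
    found = False              # plays the role of __hidden_variable
    while True:
        x = get_next()         # stmts1: runs before every test
        if is_end(x):          # "not expr1": normal exit, else part still runs
            break
        if wanted(x):          # expr2: this break must skip the else part
            found = True
            break
    if not found:              # stmts4, the original else suite
        return None
    return x
```

        For instance, with items = iter([1, 4, 9, None]), the call
        find_first(lambda: next(items), lambda v: v is None,
        lambda v: v > 3) stops at 4, while a predicate that never matches
        falls through to the else part and returns None.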



3. RATIONALE:

    3.1.  General looping structure.

        A general looping structure, commonly known as loop-and-a-half,
        looks like
        
            A
            loop:
                B
                if not C: break
                D
            E

        If B is empty, this can be represented in current Python as

            A
            while C:
                D
            E

        The new syntax allows the more general case even when B is not empty

            A
            while B; C:
                D
            E

        Putting B between "while" and the condition is not just syntactic
        sugar, as will be shown in the following.
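
        For reference, this is how the A/B/C/D/E pattern is usually spelled
        in current Python.  It is a minimal sketch: total_length() and its
        chunk_size parameter are hypothetical, and the stream is assumed to
        be any object whose read() returns an empty string at end of input.

```python
import io


def total_length(stream, chunk_size=4):
    """Sum the sizes of the chunks read from stream, loop-and-a-half style."""
    total = 0                             # A: runs once before the loop
    while True:
        chunk = stream.read(chunk_size)   # B: runs before every test
        if not chunk:                     # C: the real loop condition
            break
        total += len(chunk)               # D: the loop body
    return total                          # E: runs once after the loop
```

        The condition C is buried inside the body as an "if ... break",
        which is the clutter the "while B; C:" form is meant to remove.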


    3.2. Better break-else interaction:

        Python allows an else-clause for the while statement.  The naive way
        of writing the general loop-and-a-half in current Python interferes
        with the else clause.  For example,

            while x = next(); not x.is_end:
                y = process(x)
                if y.is_what_we_are_looking_for(): break
            else:
                raise "not found"
            
        cannot be written in this naive version:
            
            while 1:
                x = next()
                if x.is_end: break
                y = process(x)
                if y.is_what_we_are_looking_for(): break
            else:
                raise "not found"
        
        This is because the two breaks have different meanings: only the
        second one should skip the else clause.  The fully equivalent
        version in current syntax has to use one extra variable to keep
        track of the breaks that affect the else:
            
            __hidden_variable = 0
            while 1:
                x = next()
                if x.is_end: break
                y = process(x)
                if y.is_what_we_are_looking_for(): 
                    __hidden_variable = 1
                    break
            if not __hidden_variable:
                raise "not found"
            del __hidden_variable

        The improvement of the new syntax is quite obvious.         
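
        To see concretely why the naive version is not equivalent, here is
        a small runnable sketch in current Python.  The name naive_search()
        and the use of None as an end-of-input marker are assumptions made
        for the example (so the items themselves must not contain None).

```python
def naive_search(items, wanted):
    """Naive 'while True' form: both breaks skip the else clause."""
    it = iter(items)
    result = "unset"
    while True:
        x = next(it, None)       # None marks the end of input here
        if x is None:
            break                # end of input: the else clause is skipped too
        if wanted(x):
            result = "found"
            break
    else:
        result = "not found"     # never runs: 'while True' only ends via break
    return result
```

        When nothing matches, the function returns "unset" rather than
        "not found", because the end-of-input break bypasses the else
        clause exactly like the success break does.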


    3.3. Flatter conditional structures:
            
        A general nested "if-else-and-a-half" structure is like
            
            A
            if B:
                C
            else:
                D
                if E:
                    F
                else:
                    G
                    if H:
                        I
                    else:
                        ...
        
        which can be written in the new syntax as
        
            if A; B:
                C
            elif D; E:
                F
            elif G; H:
                I
            else:
                ...
            
        The advantage of this syntax pattern is similar to that of "elif"
        itself, namely to transform a nested branching structure into a flat
        branching structure.  Using a flat structure in place of a nested
        one is in keeping with one of the good traditions of Python.
        


4. EXAMPLES:

    4.1. Action needed before condition:
        
            while x = get_next(); x:
               whatever(x)

        
    4.2. Condition does not need to be a method of an object in assignment:
        
            while line = readline(); 'o' in line:
                line = process(line)
                if 'e' in line: break
            else:
                print "never met break"


    4.3. Has similar power to C's for statement
        
            for (start; action, end; incr) {
                do_something;
                if (cond) break;
                do_other;
            }
        
        can be written as
        
            start   
            while action; not end:
                do_something
                if cond: break
                do_other
                incr


    4.4. More complex example:
            
            if x = dict[a]; x:              proc1(x)
            elif x = next(x); x.ok():       proc2(x)
            elif x.change(); property(x):   proc3(x)
            ...

        The equivalent in the current syntax is not flat:
            
            x = dict[a]
            if x:               proc1(x)
            else:
                x = next(x)
                if x.ok():          proc2(x)
                else:
                    x.change()
                    if property(x):     proc3(x)
                    else:
                        ...
            
        Alternatively, it requires at least a two level hack with "while":

            while 1:
                x = dict[a]
                if x:
                    proc1(x)
                    break
                x = next(x)
                if x.ok():
                    proc2(x)
                    break
                x.change()
                if property(x):
                    proc3(x)
                    break
                ...
                break

    It is seen that the new syntax removes a substantial amount of clutter,
    thereby increasing readability and expressiveness.



5. ISSUES:

    5.1. Syntax errors:

        This structure is safe against single typing errors:

            - missing colon would be detected at newline because of keywords

            - missing last expression will be detected at the colon, because
              the condition must be an expression

            - mistype = for == in expression will be detected

            - mistype == for = in statement will be detected

            - mistype : for ; will be detected as missing :

            - mistype ; for : will be detected as multiple :


    5.2. Obfuscation:

        The new syntax does not diminish the distinction between statements
        and expressions.
        
        Specifically, the structure

            if S1; S2; E: S

        is built upon statements and expressions according to the syntax of
        "if".  It does not mean that (S1; S2; E) itself becomes a magical
        super-expression.  The same is true for "elif" and "while".

        Consequently, without changing the syntax of "for", the following is
        meaningless and not allowed

            for S1; S2; a in B: C

        Since the change is only in the syntax of "if", "elif" and "while",
        not in the fundamentals of expressions and statements, there is not
        much more chance of obfuscation than existing syntax.


    5.3. Compatibility:

        This extension is fully backward compatible, because the extended
        syntax is currently invalid syntax.


    5.4. Generality:

        Guido, commenting on Kevin's proposal, mentioned [4] that it is
        not general enough to allow short-circuit conditions:

            while (x = f(); x) and (x.y = g(); x.y):
                 "whatever"

        which is equivalent to the following (assuming no trailing "else"):

            while 1:
                x = f()
                if not x: break
                else:
                    x.y = g()
                    if not x.y: break
                "whatever"

        With the extension of "if" and "elif" this could be written in
        quite readable form:

            while 1:
                if x = f(); not x: break
                elif x.y = g(); not x.y: break
                "whatever"

        However, there is indeed a complication when there is both a "break"
        and an "else".  A general solution would either require allowing
        super expressions like (S1; S2; E) or a new keyword, such as
        "until", or two flavors of "break".  It is unclear whether such
        situations are important enough for such more radical changes.


        Note that the similar problem with "if" is already solved:
            
            if ((x = f.readline(); x) and
                (y = f.readline(); y)):
                print x, y

        can be written as 
            
            if x = f.readline(); not x: pass
            elif y = f.readline(); not y: pass
            else:
                print x, y
            


6. ALTERNATIVES:

    We show that the following alternatives have more problems than the
    proposed extension.

    6.1. Allowing special assignment in conditions.  For example:

                while a:=next(): process(a)

        This has many problems:

        6.1.1. It does not allow other actions before conditions, such as

                while x.get_next(); x: process(x)

            So it does not really solve the problem.  On the other hand, if
            arbitrary statements were allowed in an expression with some
            syntactic trick, it would completely blur the distinction
            between statement and expression.

        6.1.2. It does not allow for arbitrary expressions in condition, like

                while x=next(); some_property_of(x):
                    ...

            The proposal 

                if some_property_of(x:=next()):
                    ...

            is ugly, and if it were to become a general rule, it would allow
            much obfuscation, such as

                a = f(b:=3+g(not c:=4) - c) * b(c)


    6.2. Regarding (S1; S2; S3; E) as an expression that can appear in other
        places.  This is more general than the current proposal and solves
        one more problem: short-circuit conditions in a "while" loop with a
        "break" statement and an "else" clause (see 5.4 above).

        However, it appears to be more general than necessary, and easily
        leads to obfuscations such as

                x = f((y = (x = a; x = {x.next(): x.next()}; x); y[0].next()))

        The problems associated with statements in expressions appear to far
        outweigh the benefit in the one particular example of 5.4.


    6.3. Using iterators:  In Python 2.2, it is possible to write

                for line in file: 
                    if line=='end': break
                    process_line(line)

        in place of 

                while line=file.readline(); line != 'end':
                    process_line(line)
        
        However, this does not solve the problem we are considering
        completely. It is more suitable for objects that are naturally used
        (and reused) in iterations so that it is profitable to create
        iterators for them.  It is not practical to define an iterator for
        every action that might go before a condition:

                while char=file.readline()[3]; char != 'x':
                    process(char)
               
                while string = raw_input(prompt); not string.startswith("n"):
                    process(string)

        It does not solve the nested else-if problem, either.
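
        As a side note, the built-in iter() also accepts a callable and a
        sentinel value, which covers the narrow case where the loop should
        stop as soon as the call returns one fixed value.  The sketch below
        illustrates that form; lines_until() is a hypothetical helper, and
        it assumes the sentinel is eventually returned, since otherwise the
        loop would not terminate.

```python
import io


def lines_until(stream, sentinel):
    """Collect lines from stream until readline() returns exactly sentinel."""
    collected = []
    # iter(callable, sentinel) calls stream.readline() repeatedly and stops
    # as soon as the returned value equals sentinel.
    for line in iter(stream.readline, sentinel):
        collected.append(line)
    return collected
```

        This only expresses "stop when the result equals a given value".
        A condition of another shape, such as the startswith() test above,
        still needs the "while 1 ... break" form in current Python.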


    6.4. Conditional expression, like (if a then b else c).  This solves a
         completely different problem.   It is not related to this proposal.


    6.5. Repeat action before and after loop:

                line = file.readline()
                while line:
                    do_something()
                    line = file.readline()

        This is a maintenance liability: the two copies of the action can
        easily go out of sync.  Nor does this technique help with the "if"
        structure.


    6.6. An alternative syntax might be
            
            val = dict1[key1]; if val:        process1(val)
            else: val = dict2[key2]; if val:  process2(val)
            else: val = dict3[key3]; if val:  process3(val)
            ...
            
        This looks more consistent and logical.  One problem of putting
        keyword "if" in the middle of a line is that it is less prominent,
        although syntax-colored editors do help.

        However, similar syntax is not right for while-loops:
            
            A; while B: C

        gives the impression that A is only done once.  So this alternative
        is not tenable.



7. IMPLEMENTATION:

    I do not know enough about the possible implementation.  It does not
    appear to have fundamental difficulties.  The changes are likely to
    occur at places where statements are parsed.



8. SUMMARY
    
    The extension allowing a sequence of simple statements between the
    keywords "while", "if", "elif" and their corresponding condition
    expressions solves several problems.  It makes code more readable in
    many situations, without risk of obfuscation in other situations.

    It compares favorably to several alternatives, both existing and
    proposed, because, essentially, the original problem is about statements
    before condition expressions in control structures.  It is not a demand
    to blur the distinction between statements and expressions, and should
    not be solved in such a fashion.



9. REFERENCES 

[1] http://groups.google.com/groups?hl=en&selm=78naok%242ed%241%40nnrp1.
dejanews.com
[2] http://www.geocities.com/huaiyu_zhu/python/ififif.txt
[3] http://mail.python.org/pipermail/python-list/2001-October/068332.html
[4] http://groups.google.com/groups?q=g:thl2213523209d&hl=en&selm=199901271725.
MAA14695%40eric.cnri.reston.va.us

(The google references are longer than my news client allows, so they are
split across multiple lines.)


