statements in control structures (Re: Conditional Expressions don't solve the problem)

Thu Oct 18 19:31:48 EDT 2001

Huaiyu Zhu:
>I picked a few files that have more than a few break statements, and
>searched for break.  Almost all of them are in this style:
>
>while 1:
>    x = something
>    if not x: break
>    ...

which in Python 2.2 can in many cases be rewritten

for x in iter(something, None):
    ...

>Following are a few examples of while-loops.  I don't think I'll spend more
>time on this.

I'm hoping you'll write this up as a PEP because:
  1) I don't channel Guido
  2) even if he says no, it will be a reference for future discussions
      on this topic.

>Anybody can do their own search if they want real life
>examples.

But then you have nothing to counter someone saying "I scanned
the standard library but couldn't find any place where this proposal
is helpful in real life."  :)

>  Also I'm not sure how to search for possible places where nested
>if-else ccould be changed to flat elif.

Look for "elif" followed by an "elif" which is indented by
another level.  The code I posted earlier should be a good base
for this search.

Also, have you looked for the many places that would be *worse* by the
flexibility to make this choice?  I point out one below.

>Doing these examples, I got a clearer view on why the change would be good:
>The new syntax allows one to say
>        " do ... under this condition ... "

There are times when it's shorter.  But you skip the possibility of
using an iterator.

>instead of the more twisted logic of
>        " do ..., but if not this, change ... "
>It also displays the loop condition prominently, instead of displaying a
>dummy "while 1" and bury the substance inside.

"while 1" is considered ugly by quite a few people.  But I argue
that; multiple; statements; also; buries; the; substance; inside.

>Here are 6 examples [Disclaimer: It's about half an hour quick hack
>including the searching.  Almost mechanical change.  No guarantee I really
>understand the original code before changing them. No checking, debugging
or
>whatsoever, etc.]

Understood.  And there are bugs.  :)

>./Demo/pdist/makechangelog.py:
>Change to:
>
>    while file = getnextfile(f); file:
>        revs = []
>        while rev = getnextrev(f, file); rev:
>            revs.append(rev)
>        if revs:
>            allrevs[len(allrevs):] = revs
   ...

>changed to
>    while line = f.readline(); line:
>        if startprog.match(line) >= 0:
>            file = startprog.group(1)
>            while line = f.readline(); line[:10] != '-'*10:
>                if not line: return None
>                if line[:10] == '='*10: return None
>            return file
>    return None

This module is really old.  It even uses regex.  I would rewrite
much of the code, to make it look more iterator based:

for file in getnextfile(f):
    revs = [x for x in getnextrev(f, file)]
    if revs:
        allrevs.extend(revs)

def getnextfile(f):
    while 1:
        line = f.readline()
        if not line: break
        if startprog.match(line) >= 0:
            file = startprog.group(1)
            # Skip until first revision
            while 1:
                line = f.readline()
                if not line: return
                if line[:10] == '='*10: return
                if line[:10] == '-'*10: break
            yield file

I would also change getnextrev so it's an iterator.

>Lib/ftplib.py:
>--------------------------------------------(3)
>    def getmultiline(self):
>        line = self.getline()
>        if line[3:4] == '-':
>            code = line[:3]
>            while 1:
>                nextline = self.getline()
>                line = line + ('\n' + nextline)
>                if nextline[:3] == code and \
>                        nextline[3:4] != '-':
>                    break
>        return line
>
>changed to
>
>    def getmultiline(self):
>        code = None
>        lines = []
>        while line = self.getline(); lines.append(line); line[3:4] == '-':
>            if code != line[:3]: break
>            else: code = line[:3]
>        return '\n'.join(lines)

I don't understand your conversion.  I did a mechanical translation
of your code to existing Python and I get a different result.

class Test:
   def __init__(self):
      self.data = ["012- Some sort of message",
                   "012- Would go here",
                   "012  End of multiline"]
   def getline(self):
      x = self.data[0]
      del self.data[0]
      return x

class TestOld(Test):
   def getmultiline(self):
      line = self.getline()
      if line[3:4] == '-':
         code = line[:3]
         while 1:
            nextline = self.getline()
            line = line + ('\n' + nextline)
            if nextline[:3] == code and \
               nextline[3:4] != '-':
                  break
      return line

class TestNew(Test):
   def getmultiline(self):
      code = None
      lines = []
      while 1:
         line = self.getline()
         lines.append(line)
         if not(line[3:4] == '-'):
            break
         if code != line[:3]:
            break
         else:
            code = line[:3]
      return '\n'.join(lines)

print "old"
print TestOld().getmultiline()
print "new"
print TestNew().getmultiline()

[dalke at pw600a src]$ ./python spam.py
old
012- Some sort of message
012- Would go here
012  End of multiline
new
012- Some sort of message
[dalke at pw600a src]$

>--------------------------------------------(4)
>        while 1:
>            data = conn.recv(blocksize)
>            if not data:
>                break
>            callback(data)
>
>changed to:
>
>        while data = conn.recv(blocksize); data:
>            callback(data)

Yes, this is shorter.  It's hard for me to judge if it's better
because I'm so used to the first form.  I have troubles parsing
the two different statement blocks - the first with semicolons
and the second with newlines.

I still don't like multiple items on a given line so if using
this form would be prone to write it like this.

while data = conn.recv(blocksize); \
      data:
    callback(data)

Which is ugly and little shorter than the existing code.

>./Lib/mimify.py:

Writing the full function in your form

def mime_decode(line):
  newline = ""
  pos = 0
>    while res = mime_code.search(line, pos); res is not None:
>        newline = newline + line[pos:res.start(0)] + \
>                  chr(int(res.group(1), 16))
>        pos = res.end(0)
  return newline

Using the (experimental) scanner in sre yields one fewer line:

>>> import mimify
>>> def my_decode(s):
...   t = ""
...   p = 0
...   for m in iter(mimify.mime_code.scanner(s).search, None):
...     t = t + s[p:m.start(0)] + chr(int(m.group(1), 16))
...     p = m.end(0)
...   return t + s[p:]
...
>>> my_decode("=41ndrew =44alke")
'Andrew Dalke'
>>> mimify.mime_decode("=41ndrew =44alke")
'Andrew Dalke'
>>>

Yes, I'm not playing fair here by using an undocumented feature.

/F?  When's this going to be non-experimental?

>    # read header
>    hfile = HeaderFile(ifile)
>    while 1:
>        line = hfile.readline()
>        if not line:
>            return

>Changed to
>
>    # read header
>    hfile = HeaderFile(ifile)
>    while line = hfile.readline(); line:
>        if prefix and line.startswith(prefix):
>            line = line[len(prefix):]
>            pref = prefix
>        else:
>            pref = ''

Your translation is incorrect.  The original is
    if not line:
      return

while yours translates to
    if not line:
      break

so to use your construct you'll need to stick an

  else:
    return

at the end.  This makes it longer.

>        line = mime_decode_header(line)
>        if qp.match(line):
>            quoted_printable = 1
>        elif decode_base64 and base64_re.match(line):
>            is_base64 = 1
>        else

You skipped converting the first two lines into

        if line = mime_decode_header(line); qp.match(line):

Why?  What's the guidance of when to use one form over another?
I prefer the Python philosophy where there should be one preferred
way to do something.  This makes the code easier to understand
and maintain.  Your proposal would shatter that, because there
would be no prefered way.

My prefered way is no semicolon separated statements.

>        if mp_res = mp.match(line); mp_res:
>            multipart = '--' + mp_res.group(1)
>        if he.match(line):
>            break

Yes, this is shorter.  I still argue it's more confusing because
it looses the guidance of the vertical arrangement of code.

It's also possible to solve this specific case with

import operator
class check:
  def __init__(self, result):
    self.result = result
  def __nonzero__(self):
    return operator.truth(result)
check = check()

if check(mp_match(line)):
  multipart = '--' + check.result.group(1)

Indeed, this is closer to how the regex module works.  But I
don't like this style.

So in a couple cases your proposal reduces the number of lines
of code by a smidgeon, with the tradeoff of doing several things
on one line.  In other cases it doesn't help, and hinders things
by making it harder to figure out where the bug is.

Compare this to my experience with list comprehensions.  I
didn't like it because that added a new way to do things I
had already been doing, and I was worried that it would lead
to increased confusing.  It hasn't, because list comprehensions
ended up replacing a very common phrase

x = []
for a in data:
  x.append(f(a))

I now prefer list comprehensions.  I can't imagine ever
prefering semicolon separated statements on the same line.

                    Andrew
                    dalke at dalkescientific.com