for / while else doesn't make sense

Steven D'Aprano steve at pearwood.info
Sun Jun 5 02:35:39 EDT 2016


On Sun, 5 Jun 2016 01:29 pm, Lawrence D’Oliveiro wrote:

> On Saturday, June 4, 2016 at 11:37:18 PM UTC+12, Ned Batchelder wrote:
>> On Friday, June 3, 2016 at 11:43:33 PM UTC-4, Lawrence D’Oliveiro wrote:
>> > On Saturday, June 4, 2016 at 3:00:36 PM UTC+12, Steven D'Aprano wrote:
>> > > You can exit a loop because you have run out of items to process, or
>> > > you can exit the loop because a certain condition has been met.
>> > 
>> > But why should they be expressed differently?
>> > 
>> >     item_iter = iter(items)
>> >     while True :
>> >         item = next(item_iter, None)
>> >         if item == None :
>> >             break
>> >         if is_what_i_want(item) :
>> >             break
>> >     #end while
>> 
>> Do you actually write loops like this?
> 
> Is that a non-trolling question? Yes. All the time.

Really? Well, you'd fail my code review, because that code is broken. If
items contains None, your loop will silently end early. That's a bug.


>> If this appeared in a code review, first we'd have a conversation about
>> what this code was meant to do ...
> 
> I would hope not.

Clearly. Nevertheless, it's a conversation that needs to be had.


>> ...and then I would ask, "Why aren't you using a for loop?"
> 
> ... and then I would ask, “Didn’t you read my previous postings where I
> pointed out the issues with them?”

I don't think that very many people would agree with you or consider them
problems at all. They're more like features than problems. Your objections
to for-loops feel kind of like "I don't like bread knives because they make
it too easy to slice bread".

Okay, you don't like for-loops, because they make looping a fixed number of
times with an optional early exit too much of a "cognitive burden" for you.
You have my sympathy, but nobody else I've come across in nearly two
decades of Python programming finds them a cognitive burden.


> Here <https://en.wikibooks.org/wiki/Python_Programming/Databases> is
> another example: see the section “Looping on Field Breaks”. 

That section was written by you and is not independent confirmation that
others agree with your issues with for-loops.


> A while-True scales gracefully to complex situations like that.

Graceful like a hippopotamus.

I don't know that the situation is complex; your description is pretty clear
and to the point:

    Consider the following scenario: your sales company database has
    a table of employees, and also a table of sales made by each
    employee. You want to loop over these sale entries, and produce
    some per-employee statistics.

but the while loop you have certainly is complex. If I understand your
intent correctly, then I think this is both more elegant and likely faster
than the while loop you use:


# Beware of bugs in the following code: 
# I have only proven it is correct, I haven't tested it.
rows = db_iter(
    db = db,
    cmd =
        "select employees.name, sales.amount, sales.date from"
        " employees left join sales on employees.id = sales.employee_id"
        " order by employees.name, sales.date"
    )
default = {'total sales': 0.0,
           'number of sales': 0,
           'earliest date': None,
           'latest date': None}
prev_employee_name = None
stats = default.copy()
for (employee_name, amount, date) in rows:
    if employee_name != prev_employee_name:
        if prev_employee_name is not None:
            # Print the previous employee's stats
            report(prev_employee_name, stats)
        # and prepare for the next employee.
        prev_employee_name = employee_name
        stats = default.copy()
    if amount is None:
        # The left join gives an employee with no sales one row of NULLs.
        continue
    stats['total sales'] += amount
    stats['number of sales'] += 1
    if stats['earliest date'] is None:
        stats['earliest date'] = date
    stats['latest date'] = date

if prev_employee_name is not None:
    report(prev_employee_name, stats)


No breaks needed at all, which makes it much more understandable: you know
instantly from looking at the code that it processes every record exactly
once, then exits.
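
For anyone who wants to try the field-break loop without a database, here is
a minimal sketch. The summarise function, the sample rows and the list-based
report callback are all invented for the example; the only assumption carried
over from the query is that rows arrive sorted by employee name and then by
date.

```python
def summarise(rows, report):
    """Field-break loop: call report() once per run of equal employee names."""
    prev_employee_name = None
    stats = None
    for employee_name, amount, date in rows:
        if employee_name != prev_employee_name:
            if prev_employee_name is not None:
                # A new name means the previous employee is finished.
                report(prev_employee_name, stats)
            prev_employee_name = employee_name
            stats = {'total sales': 0.0, 'number of sales': 0,
                     'earliest date': None, 'latest date': None}
        stats['total sales'] += amount
        stats['number of sales'] += 1
        if stats['earliest date'] is None:
            stats['earliest date'] = date
        stats['latest date'] = date
    if prev_employee_name is not None:
        # The last employee has no following row to trigger the report.
        report(prev_employee_name, stats)


# Hypothetical rows, already ordered by name and then date.
rows = [
    ('alice', 10.0, '2016-01-03'),
    ('alice', 5.0, '2016-02-01'),
    ('bob', 7.5, '2016-01-15'),
]
reports = []
summarise(rows, lambda name, stats: reports.append((name, stats)))
```

Collecting the reports in a list rather than printing them makes it easy to
check that each employee is reported exactly once.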

But it is a *tiny* bit ugly, due to the need to print the last employee's
statistics after the loop is completed. We can fix that in two ways:

(1) Give up the requirement to print each employee's stats as they are
completed, and print them all at the end; or

(2) Put a sentinel at the end of rows.


The first may not be suitable for extremely large data sets, but it is
especially elegant:


rows = db_iter(...)  # arguments as above
default = {'total sales': 0.0, 
           'number of sales': 0,
           'earliest date': None,
           'latest date': None}
stats = {}
for (employee_name, amount, date) in rows:
    record = stats.setdefault(employee_name, default.copy())
    if amount is None:
        # Left join: this employee has no sales, so keep the empty record.
        continue
    record['total sales'] += amount
    record['number of sales'] += 1
    if record['earliest date'] is None:
        record['earliest date'] = date
    record['latest date'] = date
for employee_name in stats:
    report(employee_name, stats[employee_name])


As you now have all the statistics available, you can look for
under-performing or over-performing sales people, run comparisons between
staff, etc.
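
As a rough illustration of that kind of comparison, here is a sketch that
assumes a stats dict shaped like the one built above. The employee names and
figures are invented, and "below average total sales" is only one possible
definition of under-performing.

```python
# Hypothetical per-employee statistics, keyed by employee name.
stats = {
    'alice': {'total sales': 15.0, 'number of sales': 2},
    'bob': {'total sales': 7.5, 'number of sales': 1},
    'carol': {'total sales': 30.0, 'number of sales': 3},
}


def total(name):
    """Total sales for one employee, used as a sort key."""
    return stats[name]['total sales']


best = max(stats, key=total)
worst = min(stats, key=total)
ranking = sorted(stats, key=total, reverse=True)  # highest total first
average = sum(total(name) for name in stats) / len(stats)
below_average = [name for name in ranking if total(name) < average]
```

None of this is possible in the streaming version, where each employee's
stats are thrown away as soon as they have been reported.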


Solution (2) using a sentinel gets rid of the need to print anything outside
of the loop by simply ensuring that the very last record is a meaningless
sentinel that can be ignored:

from itertools import chain
rows = db_iter(...)  # arguments as above
default = {'total sales': 0.0, 
           'number of sales': 0,
           'earliest date': None,
           'latest date': None}
prev_employee_name = None
stats = default.copy()
# chain() flattens its arguments, so the sentinel record goes inside a list.
for (employee_name, amount, date) in chain(rows, [('', 0, None)]):
    if employee_name != prev_employee_name:
        if prev_employee_name is not None:
            # Print the previous employee's stats
            report(prev_employee_name, stats)
        # and prepare for the next employee.
        prev_employee_name = employee_name
        stats = default.copy()
    if amount is None:
        # The left join gives an employee with no sales one row of NULLs.
        continue
    stats['total sales'] += amount
    stats['number of sales'] += 1
    if stats['earliest date'] is None:
        stats['earliest date'] = date
    stats['latest date'] = date


Again, there are no breaks needed, so you know that every record is
processed exactly once, and all but the last (the sentinel) is printed.
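
One detail worth spelling out: itertools.chain iterates over each of its
arguments in turn, so the sentinel record has to be wrapped in a one-element
list (or tuple) if it is to arrive as a single row. A quick sketch, with
made-up rows standing in for the database results:

```python
from itertools import chain

# Hypothetical rows standing in for the database results.
rows = [('alice', 10.0, '2016-01-03'), ('bob', 7.5, '2016-01-15')]
sentinel = ('', 0, None)

# Passing the bare tuple makes chain() yield its three fields one by one.
flattened = list(chain(rows, sentinel))

# Wrapping it in a list makes chain() yield the sentinel as one record.
wrapped = list(chain(rows, [sentinel]))
```

With the bare tuple, the loop would try to unpack the empty string into three
names and fail with a ValueError; with the wrapped version, the final item is
the complete (name, amount, date) sentinel.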


-- 
Steven



