Software bugs aren't inevitable

Terry Reedy tjreedy at udel.edu
Wed Sep 14 23:44:18 EDT 2005


"Steven D'Aprano" <steve at REMOVETHIScyber.com.au> wrote in message 
news:pan.2005.09.14.18.06.03.522096 at REMOVETHIScyber.com.au...
> On Wed, 14 Sep 2005 11:28:02 -0400, Terry Reedy wrote:

> But in the context of my response, I was replying to a paraphrased quote
> from somebody who apparently believes that recursion is *always* better
> than iteration. That is clearly not the case.

We agree here.  In fact, I suggested that for a given algorithm implemented 
in Python, iteration should always be faster by a small factor.

> It is a "mere implementation detail" that (for most computer systems, and
> most programming languages) stack space is at a premium and a deeply
> recursive function can run out of stack space while the heap still has
> lots of free memory. But those implementation details in the real world
> make the difference between an elegant solution that runs like a lame 
> duck
> and a practical solution that has nothing going for it except the fact
> that it works.

Which is why, in the part you snipped, I said that recursion (unlike 
iteration) piles up stack frames unless optimized not to, and that Python 
is *not* so optimized.  I will add that when a function or algorithm does 
require or use a stack, the iterative form will use heap memory instead of 
call stack memory and will put less on the stack with each repetition.  
But your example was about time, and my point then and now is about time, 
not space.

>> Recursion and iteration are two syntaxes for the same operation: 
>> repetition
>> with variation.
>
> Yes, in general anything that can be solved recursively can be solved
> iteratively. Some classes of problems lend themselves naturally to one or
> the other solution, but it is always possible (in principle at least) to
> use either.
>
>> Abstractly, these are two algorithms for the same function.  One runs in
>> exponential time because it wastefully calculates and tosses away an
>> exponential number of subvalues.  The other runs in linear time because 
>> it
>> calculates each subvalue once.  When one only wants Fib(n), and not the
>> sequence leading up to it, even this is wasteful, for large enough n, 
>> since
>> there is a third algorithm that calculates Fib(n) directly by a simple
>> formula (something like the integer part of the golden ratio to the nth
>> power).

To expand on this a bit: seeing the problem as 'recalculating tossed-away 
subvalues' suggests the fruitful question "how do I not do that?".  The 
generic answer is memoization.  A specific answer applicable to Fib() is 
linearization or even formulaization.  This is an extreme example of 
wasteful calculation, but squeezing out waste is a common theme in 
algorithm improvement.  See below for another example.
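
To make 'memoization' concrete, here is one minimal sketch (the 
mutable-default cache is just one way to spell it):

def fib(n, _cache={0: 0, 1: 1}):
    # Memoized recursion: each subvalue is computed exactly once, so
    # the exponential algorithm becomes linear in n.
    if n not in _cache:
        _cache[n] = fib(n - 1) + fib(n - 2)
    return _cache[n]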

> Yes. There are lots of algorithms that could be done, and they all have
> their pros and cons. Binet's formula, for example, is mathematically
> correct, but for large enough n, the "mere implementation detail" that
> floats have a finite precision will cause that algorithm to give 
> incorrect
> answers. For "large enough", on my system I mean n=71.

I am assuming you mean Binet's Fib formula?  No matter.  With finite 
precision ints, typically 32 bits, fib(n) by addition overflows sooner 
than that (at n = 47 for signed 32-bit ints).  If you allow extended 
precision ints, then allow extended precision floats also.
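
To see that precision limit directly, one can compare Binet's formula in 
floats against exact addition (this rounding form is my own choice):

def fib_binet(n):
    # Binet's formula in IEEE doubles; exact only while 53 mantissa
    # bits suffice for phi**n / sqrt(5).
    phi = (1 + 5 ** 0.5) / 2
    return int(round(phi ** n / 5 ** 0.5))

def fib_add(n):
    # Exact Fibonacci by integer addition.
    a, b = 0, 1
    for i in range(n):
        a, b = b, a + b
    return a

for n in range(100):
    if fib_binet(n) != fib_add(n):
        print(n)  # first mismatch; about n = 71 with IEEE doubles
        break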

>> Now: I could (and probably someday will) write an iterative version of 
>> the
>> exponential algorithm (using an explicit stack) that calculates each
>> subvalue exactly as many times as the recursive version of that same
>> algorithm.  And I could compare it to a recursive version of the more
>> efficient linear algorithm (such as posted by Jerzy Karczmarczuk).  And 
>> I
>> could claim that this shows hows iteration can waste time compared to
>> recursion.
>
> Of course it can. But you have to really *work* at getting the iterative
> version to be as wasteful as the obvious recursive version.

For someone who does not know how to write the iterative version, the 
'extravagant' recursive version would be infinitely faster.  But, as I 
said, I do know how, and I would *not* have to work very hard.  I have a 10-15 line 
template (in C) sitting in a book 10 feet away.  Filling it in would be 
easy and straightforward.  If I had the template in Python on my machine, 
maybe 5 minutes.  I hope someday to write a set of such templates, with 
explanations, so that you could potentially do the same ;-).
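
I do not have the Python version here, but a rough sketch of the idea, 
with an explicit list standing in for the call stack, might look like 
this (fib_exp_iter is my name for it):

def fib_exp_iter(n):
    # Iterative form of the *exponential* algorithm: the explicit
    # stack visits the same subvalues, the same number of times, as
    # the naive recursion.  Each base case contributes 0 or 1.
    total, stack = 0, [n]
    while stack:
        m = stack.pop()
        if m < 2:
            total += m
        else:
            stack.append(m - 1)
            stack.append(m - 2)
    return total

Same exponential waste, merely re-spelled with iteration.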

>> But that, I admit, would be an invalid conclusion.  And that, I claim,
>> is also invalid when 'iteration' and 'recursion' are reversed, no matter
>> how often repeated in texts and articles.  The difference is between the
>> algorithms, not the differing syntactic expressions thereof.
>
> Now you go too far.

Not at all.  In my first response, and here, I am talking specifically 
about exponential time wastage and the bogus claim that it is due to 
recursion or, in my reversal, iteration.  It is due to the algorithm 
difference.  The problem with this example, which I know you have read 
elsewhere, perhaps more than once, is that two factors are changed 
together, making an incomplete experimental design (see below), so the 
large observable difference is usually ascribed to the wrong factor.

Space is a different issue, which we seem to agree on.

Now for the other time wastage example promised above.  In math, a standard 
set definition method is filtration of a master set by an element predicate 
to produce a subset: a set comprehension.  In Python, we have the list 
comprehension and generator expression forms.  One can do the same with 
the powerset of a set: filter the set of all subsets by a subset 
predicate to get just the subsets that satisfy it.

Assume that powset(someset) produces an iterative powerset generator.  Then

    sops = (s for s in powset(someset) if setpred(s))

is an iterative restricted-subset generator, easily passed to set() or 
list().  A specific example is

    ksubsets = (s for s in powset(someset) if len(s) == k)

As n = len(someset) increases, this (naive) algorithm gets increasingly 
inefficient: it generates and tests all 2**n subsets just to keep the 
n-choose-k subsets of size k.
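
Since powset() is only assumed above, here is one way it might be 
written iteratively (the bit-mask counting is my own sketch):

def powset(someset):
    # Iterative powerset generator: each integer 0 .. 2**n - 1
    # encodes one subset by its bits.
    elems = list(someset)
    for bits in range(2 ** len(elems)):
        yield set([e for i, e in enumerate(elems) if bits >> i & 1])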

Alternatively, we can generate ksubsets with the standard double recursion 
that generates exactly and only the size k subsets.  So what can we 
conclude from this example about the relative time usage of 'extravagant 
naive iteration' versus 'elegant recursion'?  My main point here and 
previously is just this: nothing (or certainly not much).
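
A sketch of that double recursion (ksubsets_rec is my name; take the 
details as illustrative):

def ksubsets_rec(elems, k):
    # For the first element, generate the size-k subsets that include
    # it, then those that exclude it.  Only size-k subsets are built.
    if k == 0:
        yield set()
    elif k <= len(elems):
        first, rest = elems[0], elems[1:]
        for s in ksubsets_rec(rest, k - 1):
            yield set([first]) | s
        for s in ksubsets_rec(rest, k):
            yield s

This touches only the n-choose-k winners instead of filtering all 2**n 
candidates.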

To compare iteration versus recursion, one should use the same algorithm or 
same set of algorithms, with an iterative and recursive version of each. 
To compare algorithms, one should use the same implementation form or forms 
on each.  Varying one factor at a time, when possible, is basic good 
science.

To compare two factors simultaneously, one should, if possible (and here 
it is), vary each independently in a balanced design with a minimum of 
2x2 = 4 experimental setups.
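
As an illustration of such a balanced design, here is a sketch with all 
four Fib versions (the names and details are mine; the stack version 
from above is repeated so the block stands alone):

import timeit

def fib_naive_rec(n):                  # naive algorithm, recursive form
    if n < 2:
        return n
    return fib_naive_rec(n - 1) + fib_naive_rec(n - 2)

def fib_naive_iter(n):                 # naive algorithm, iterative form
    total, stack = 0, [n]              # explicit stack replays the tree
    while stack:
        m = stack.pop()
        if m < 2:
            total += m
        else:
            stack.extend([m - 1, m - 2])
    return total

def fib_linear_rec(n, a=0, b=1):       # linear algorithm, recursive form
    if n == 0:
        return a
    return fib_linear_rec(n - 1, b, a + b)

def fib_linear_iter(n):                # linear algorithm, iterative form
    a, b = 0, 1
    for i in range(n):
        a, b = b, a + b
    return a

# Time each cell of the 2x2 design (run as a script for __main__).
for name in ('fib_naive_rec', 'fib_naive_iter',
             'fib_linear_rec', 'fib_linear_iter'):
    t = timeit.Timer('%s(20)' % name, 'from __main__ import %s' % name)
    print('%s: %.4f' % (name, min(t.repeat(3, 100))))

With n = 20, the gap between the naive and linear pairs should dwarf any 
gap between the recursive and iterative members of each pair, which is 
the point of the balanced design.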

Terry J. Reedy
