[Tutor] Text processing and functional programming?

Tue Aug 12 17:48:55 EDT 2003

On Tue, 12 Aug 2003, Kirk Bailey wrote:

> > My one, major beef was the author's insistance on functional
> > programming.  Don't get me wrong I enjoy the style, but I dislike
> > authors who push a programming style when the book is not about style.
>
> I can see your point, and certainly not the place for it.

Hi everyone,

But is there a particular example we can talk about where the author's
"functional" approach is not appropriate?

I thought that David Mertz's articles have been pretty good.  Functional
programming isn't just about style -- in some cases, the functional
approach can avoid silly problems that plague a traditional approach.

But let's take one of his examples from:

    http://gnosis.cx/TPiP/chap3.txt

though and talk about it.  The following code is part of the problem of
making a text block "flush left".  One part of the problem tries to find
the minimum indent level of all the lines in a piece of text.

###
from re import findall,sub
indent = lambda s: reduce(min,map(len,findall('(?m)^ *(?=\S)',s)))
flush_left = lambda s: sub('(?m)^ {%d}' % indent(s),'',s)

if __name__ == '__main__':
    import sys
    print flush_left(sys.stdin.read())
###

Hmmm.  Yikes.  Ok, I do agree with you here.  The statement:

    indent = lambda s: reduce(min,map(len,findall('(?m)^ *(?=\S)',s)))

is overloaded.  *grin*

For those who haven't seen it before: 'lambda' is an operator that creates
simple functions.  Let's rewrite the code above slightly so it's slightly
more familiar looking:

###
def indent(s):
    return reduce(min,map(len,findall('(?m)^ *(?=\S)',s)))
###

This should have the exact same meaning as the lambda statement.  But the
'lambda' here doesn't necessarily make a function more "functional":  we
really do need to look within the heart of the function, and that's where
all that 'reduce' and 'map' stuff comes into play.

Now those things are all functional, and people might not like them too
much unless they've been exposed to functional concepts.  For reference, a
traditional approach to the above might use an explicit for loop:

###
def indent(s):
    results = findall('(?m)^ *(?=\S)',s)
    minimum = len(results[0])
    for r in results:
        if len(r) < minimum:
            minimum = len(r)
    return minimum
###

This explicit looping approach works, but it mixes together applying a
function across a sequence with finding the minimum of the sequence.
There's more potential for programmer error with this approach.

A "functional" approach concentrates on applying functions on our
sequences.  With it, we can often avoid using explicit loops or any
assignments.

They're often easier to debug: if something is going wrong in:

    reduce(min, map(len, findall('(?m)^ *(?=\S)', s)))

then there are at least two ways we can debug this: we can look at the
results of:

    findall('(?m)^ *(?=\S)', s)
    map(len, findall('(?m)^ *(?=\S)', s))

That is, we can take a look at the subexpressions.  If we're getting
unexpected results, then we know exactly where the problem starts.  In
contrast, debugging the traditional approach might involve a more laborous
'tracing' process where we step, statement-by-statement, through the
program.

Furthermore, the functional approach doesn't necessarily have to look
evil; I think we can improve on it quite a bit!  The expression:

    reduce(min,
           map(len, findall('(?m)^ *(?=\S)', s)))

is really non-optimal, because min() knows how to deal with lists.  So we
can simplify the expression down to:

    min(map(len, findall('(?m)^ *(?=\S)', s)))

which actually reads a bit better: "Give us the minimum out of the lengths
of all the indents".  It's just as "functional" as the original code.

Now that we've simplified it a little more, let's test it:

###
>>> def indent(s):
...     return min(map(len,findall('(?m)^ *(?=\S)',s)))
...
>>> indent(' hello\n   this is a test')
1
>>> indent('    hello\n    this is a test\n        ')
4
>>> indent('    hello\n    this is a test\n  \n')
4
###

Hey, look there!  Is that last case a problem or a bug?  Shouldn't it
return 2, since the last line has an indent of two spaces?

Well, how can we debug this?  We can look back at the definition of
indent():

    min(map(len,findall('(?m)^ *(?=\S)',s)))

and assuming that the min() and map()ping steps are working ok, we can
take a look at the findall() subexpression:

###
>>> s = '    hello\n    this is a test\n  \n'
>>> findall('(?m)^ *(?=\S)',s)
['    ', '    ']
###

And we can verify that the problem isn't really a problem: the regular
expression only pays attention to lines that aren't empty, so the indent()
function is doing the right job.

We were able to check this casually, without too much pain, precisely
because the function was written in a functional style: it let us pinpoint
our search down to a particular subexpression.

Going back to the other, non-functional approach to indent():

###
def indent(s):
    results = findall('(?m)^ *(?=\S)',s)
    minimum = len(results[0])
    for r in results:
        if len(r) < minimum:
            minimum = len(r)
    return minimum
###

debugging something like this might involve more work.  Did we mess up our
loop, for example?  Or is the assignment not working the way we expect?
There are simply more factors to consider here, and pinpointing the
problem now involves stepping through each statement to find out at what
point things start turning sour.

Of course, it's not difficult in this example to pinpoint our confusion
down to the findall() regular expression statement (since it's the very
first statement... *grin*), but I hope the point about the advantage of
debugging a functional program is more clear.

Please feel free to ask more questions about this.  Good luck to you!