[Tutor] decomposing a problem

Avi Gross avigross at verizon.net
Tue Dec 25 19:00:40 EST 2018


[Long enough that some should neither read nor comment on.]

Mats raised an issue that I think does relate to how to tutor people in
python.

The issue is learning how to take a PROBLEM to solve that looks massive and
find ways to see it as a series of steps, where each step can be easily
solved using available tools and techniques OR can recursively be decomposed
into smaller parts that can. Many people learn to program without first
learning how to write down several levels of requirements that spell out how
each part of the overall result needs to look and, finally, how each part
will be developed and tested. I worked in organizations with a division of
labor meant to put this waterfall method in place. At times I would write
higher-level architecture documents followed by Systems Engineering
documents and Developer documents and Unit Test and System Test and even
Field Support. The goal was to move from abstract to concrete so that the
actual development was mainly writing fairly small functions, often used
multiple times, and gluing them together.

I looked back at the kinds of tools used in UNIX and realized how limited
they were relative to what is easily done in languages like python,
especially given the huge tool set you can import. The support for passing
the output of one program to another made it easy to build pipelines. You
can do that in python too, but rarely need to.

And I claim there are many easy ways to do things even better in python.

Many UNIX tools were simple filters. One would read a file or two and pass
some of the lines, perhaps altered, through to the standard output. The next
process in the pipeline would often do the same, with a twist, and sometimes
new lines might even be added. The simple tools like cat and grep and sed
and so on loosely fit the filter analogy. They worked one line at a time,
mostly. The more flexible tools like AWK and PERL are frankly more like
Python than the simple tools.

So if you had a similar task to do in python, is there really much
difference? I claim not so much.

Python has quite a few ways to do a filter. One simple one is a list
comprehension and its relatives. Other variations are the map and filter
functions, and even reduce. Among other things, they can accept a list of
lines of text and apply changes to them, keep just a subset, or calculate a
single result from them.
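
For instance, here is a rough sketch of the same ideas using map, filter
and reduce (reduce lives in functools these days); the sample lines and
the "keep" test are made up for illustration:

from functools import reduce

lines = ["keep this one", "drop that one", "keep me too"]

kept = filter(lambda line: "keep" in line, lines)   # pass through matches, like grep
shouted = map(str.upper, kept)                      # transform each survivor
total = reduce(lambda n, line: n + len(line), shouted, 0)   # fold to one number
print(total)                                        # prints 24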

Let me be concrete. You have a set of lines to process and you want to find
all the lines that make it through a gauntlet of tests, perhaps with changes
along the way.

So assume you read an entire file (all at once at THIS point) into a list of
lines.

stuff = open(...).readlines()
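
As an aside, a slightly more careful sketch would use a context manager so
the file is closed for us; the filename here is only a placeholder:

with open("myfile.txt") as infile:
    stuff = infile.readlines()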

Condition 1 might be to keep only lines that had some word or pattern in
them. You might have used sed or grep in the UNIX shell to specify a fixed
string or pattern to search for.

So in python, what might you do? Since stuff is a list, something like a
list comprehension can handle many such needs. For a fixed string like
"this" you can do something like this.

stuff2 = [some_function(line) for line in stuff if some_condition(line)]

The condition might be: "this" in line
Or it might be a test that the line ends with some phrase.
Or it might be a regular expression search.
Or it might be that the length is long enough or the number of words short
enough. Every such condition can be some of the same things used in a UNIX
pipeline, or brand new ideas not available there, like: does a line
translate into a set of numbers that are all prime!
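
To make a few of those concrete (the re module supplies the regular
expression search; the patterns themselves are just examples):

import re

kept = [line for line in stuff if "this" in line]               # fixed string, like grep
kept = [line for line in stuff if line.endswith("done\n")]      # ends with a phrase
kept = [line for line in stuff if re.search(r"^[0-9]+", line)]  # regular expression
kept = [line for line in stuff if len(line) > 40]               # long enough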

And the function applied to what is kept can transform it to uppercase, or
replace it with something else looked up in a dictionary, and so on. You can
even apply multiple transformations in a single step. Python allows phrases
like line.strip().upper() and conditions like: this or
(that and not something_else)

The point is a single line like the list comprehension above may already do
what a pipeline of 8 simple commands in UNIX did, and more.
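
As a made-up illustration, a shell pipeline in the spirit of
grep this file | tr a-z A-Z | sed s/,/_/g collapses into one comprehension:

stuff2 = [line.strip().upper().replace(",", "_")
          for line in stuff
          if "this" in line or ("that" in line and "junk" not in line)]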

Some of the other things UNIX tools did might involve taking a line and
breaking it into chunks, such as at a comma or tab or space, and then
keeping just the third and fifth and eighth, but in reverse order. We
sometimes used commands like cut or very brief AWK scripts to do that.
Again, this can be trivial to do in python. Built in to character strings
are functions that let you split a line into a list of fields on a separator
and perhaps rearrange and even rejoin them. In the list comprehension style
above, if you are expecting eight regions that are comma separated:

>>> line1 = "f1,f2,f3,f4,f5,f6,f7,f8"
>>> line2 = "g1,g2,g3,g4,g5,g6,g7,g8"
>>> lines=[line1, line2]
>>> splitsville = [line.split(',') for line in lines]
>>> splitsville
[['f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8'], ['g1', 'g2', 'g3', 'g4', 'g5', 'g6', 'g7', 'g8']]
>>> items8_5_3 = [(h8, h5, h3) for (h1,h2,h3,h4,h5,h6,h7,h8) in splitsville]
>>> items8_5_3
[('f8', 'f5', 'f3'), ('g8', 'g5', 'g3')]

Or if you want them back as single strings with an underscore between the fields:

>>> items8_5_3 = ['_'.join([h8, h5, h3]) for (h1,h2,h3,h4,h5,h6,h7,h8) in splitsville]
>>> items8_5_3
['f8_f5_f3', 'g8_g5_g3']
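
If naming all eight pieces feels heavy, plain indexing into each split list
does the same job (remembering that positions count from zero):

>>> ['_'.join([f[7], f[4], f[2]]) for f in splitsville]
['f8_f5_f3', 'g8_g5_g3']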

The point is that we have oodles of little tools we can combine to solve
bigger problems, sometimes in a big complicated mess and sometimes one
simple step at a time. Not everything can be easily chained the same way,
but then we have a somewhat more complex topic, generators and queues, which
can be chained together in ways even more flexible than UNIX pipelines. Each
generator would only be called on to produce one result when another needs
it.
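
Here is a tiny sketch of that generator idea, with each stage pulling one
line at a time from the stage before it and nothing running until something
downstream asks for results (the two stages and the sample data are
invented):

def grep(lines, text):          # pass through only the matching lines
    for line in lines:
        if text in line:
            yield line

def shout(lines):               # transform each line as it flows past
    for line in lines:
        yield line.upper()

for line in shout(grep(["keep this", "drop that"], "keep")):
    print(line)                 # prints: KEEP THIS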

And I assume it might be possible to write a series of methods, placed in a
class that extends a type like "list", that maintain an internal
representation such as a list of strings and change it in place just like
.sort() does.

So to apply a UNIX-style pipeline may be as simple as:

mystdout = mystdin("LIST OF STRINGS TO INITIALIZE").method1(args).method2(args)....methodn(args)

The initializer will set the current "lines" and each method will loop on
the lines and replace them with the output it wants. Perhaps the initializer
or first method may actually read all lines from stdin. Perhaps the last
method will write to stdout. All methods will effectively follow in sequence
as they massage the data but will not actually run in parallel.

And you can even write a generic method that accepts any external function
designed to accept such a list of lines and return another to replace it.
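
Here is a minimal sketch of that idea, assuming a list subclass is good
enough; the class name Lines and its methods are invented for illustration.
Each method loops over the current lines, replaces them, and returns self so
the calls chain; apply is the generic method that accepts any list-in,
list-out function:

import sys

class Lines(list):
    # a hypothetical chainable container for lines of text
    def grep(self, text):       # keep only the matching lines
        self[:] = [ln for ln in self if text in ln]
        return self

    def upper(self):            # transform every line in place
        self[:] = [ln.upper() for ln in self]
        return self

    def apply(self, func):      # generic: func maps a list to a new list
        self[:] = func(list(self))
        return self

    def to_stdout(self):        # a final stage that writes the result out
        sys.stdout.writelines(ln + "\n" for ln in self)
        return self

mystdout = Lines(["keep this", "drop that"]).grep("keep").upper().to_stdout()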

My point to Mats is that the goal is to learn to divide and conquer a
problem. Using small and well-defined methods that can fit together is
great. Many things in python can be made to fit, and some need work. A dumb
example is that sorting a list in place returns None and not the object
itself, so a chain that ends in .sort() cannot be continued any further. You
may be able to chain this:

>>> list(reversed(sorted(lines))).pop()
'f1,f2,f3,f4,f5,f6,f7,f8'

Why the odd syntax? Because the developers of python in their wisdom may not
have chosen to enhance some methods to do things another way.
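
(In this particular case there is a shortcut, since sorted() accepts a
reverse flag: sorted(lines, reverse=True).pop() gives the same result. But
the general point stands.)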

list.sort() and list.reverse() will change the internals and return None.
They are not designed to be piped. If there were a method that performed a
sort AND returned the object, or performed a reverse and returned the
object, then we might see:

lines.sort(show=True).reverse(show=True).pop()

Or some other similar stratagem. Then we could write a fairly complex
sequence in a pipelined mode.
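
Nothing stops us from building that today. Here is a toy subclass (the name
ChainList is invented) whose sort and reverse hand the object back:

>>> class ChainList(list):
...     def sort(self, **kw):
...         super().sort(**kw)
...         return self
...     def reverse(self):
...         super().reverse()
...         return self
...
>>> ChainList(lines).sort().reverse().pop()
'f1,f2,f3,f4,f5,f6,f7,f8'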

-----Original Message-----
From: Tutor <tutor-bounces+avigross=verizon.net at python.org> On Behalf Of
Mats Wichmann
Sent: Tuesday, December 25, 2018 11:04 AM
To: tutor at python.org
Subject: Re: [Tutor] look back comprehensively

On 12/24/18 5:45 PM, Avi Gross wrote:


> As for the UNIX tools, one nice thing about them was using them in a 
> pipeline where each step made some modification and often that merely 
> allowed the next step to modify that. The solution did not depend on 
> one tool doing everything.

I know we're wandering off topic here, but I miss the days when this
philosophy was more prevalent - "do one thing well" and be prepared to pass
your results on in a way that a different tool could potentially consume,
doing its one thing well, and so on if needed.  Of course equivalents of
those old UNIX tools are still with us, mostly thanks to the GNU umbrella of
projects, but so many current tools have grown so many capabilities they no
longer can interact with other tools in any sane way.  "pipes and
filters" seems destined to be constrained to the dustbin of tech history.

I'll shut up now...



