for -- else: what was the motivation?

avi.e.gross at gmail.com
Sun Oct 9 22:38:28 EDT 2022


[This is an answer for Peter and can easily be skipped by those who know or
have no wish to.]

Strictly speaking, Peter, the word "pipe" may not mean quite the same thing
in Python, but related concepts like chaining come close.

The original use of the word I am used to was at the UNIX shell level, where
an in-core data structure called a pipe was used to connect the output of
one process to the input of another, sometimes in long chains like:

 cat file1 file2 file3 | grep pattern | ... | lp

Various utilities were thus linked to dynamically create all kinds of
solutions. Note that the programs ran in parallel: if a program wrote too
much into a pipe, it was suspended until the program reading from the pipe
caught up a bit, and vice versa. So it often ran a lot faster than earlier
approaches in which the programs ran sequentially and each one wrote its
output (unbuffered) into a file that the next program then read. And, of
course, there were no temporary files to clean up, and it skipped expensive
I/O to and from slow hard disks.
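The same producer/consumer connection can be set up from Python with the
standard subprocess module. This is just a sketch of the idea: two child
Python processes stand in for the shell utilities, and the second child's
stdin is wired to the first child's stdout, exactly as "producer | sort"
would do at the shell.

```python
import subprocess
import sys

# Producer: a child process that writes a few lines to its stdout.
producer = subprocess.Popen(
    [sys.executable, "-c", "print('banana'); print('apple'); print('cherry')"],
    stdout=subprocess.PIPE,
)

# Consumer: a second child whose stdin is the producer's stdout,
# just like "producer | sort" at the shell.
consumer = subprocess.Popen(
    [sys.executable, "-c",
     "import sys; sys.stdout.write(''.join(sorted(sys.stdin)))"],
    stdin=producer.stdout,
    stdout=subprocess.PIPE,
)
producer.stdout.close()  # let the consumer see EOF when the producer exits

output, _ = consumer.communicate()
print(output.decode())  # apple / banana / cherry, one per line
```

The operating system handles the blocking described above: if the producer
gets too far ahead of the consumer, its writes into the pipe stall until the
consumer catches up.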

Languages like R had extensions added, in various ways, within a single
process that were very different. The parts ran sequentially, but instead
of writing:

Name1 <- func1(args)
Name2 <- func2(Name1, args)
rm(Name1)
Name3 <- func3(Name2, args)
rm(Name2)
...

You could use an add-on library called dplyr (or others) to do something
like this where to some extent it is syntactic sugar:

Result <- Mydata %>% func1(remaining_args) %>% func2(remaining_args) %>%
    func3(remaining_args)

A practical example would often be written like this, using a data.frame
called Mydata that has rows and columns:

Mydata <-
    Mydata %>%
    select(columns-to-keep) %>%
    rename(columns-to-change) %>%
    filter(conditions-on-which-rows-to-keep) %>%
    mutate(newcol=calculation, newcol=calculation, ...) %>%
    group_by(column, column, ...) %>%
    summarize(...) %>%
    arrange(col, desc(col), ...) %>%
    ggplot(...) + ...

There are many verbs that take your data one step at a time and keep
transforming it. The output of each step becomes the hidden first argument
of the next step. It is not the same kind of pipeline as the UNIX one. R
recently added a native pipe operator, so you might use the |> symbol now.
It is not quite the same but often faster.
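You can emulate this threading-a-value-through-functions style in plain
Python with a small helper. The name pipe below is hypothetical, not
anything built into Python; it is just a sketch of what %>% does.

```python
from functools import reduce

def pipe(value, *funcs):
    """Thread value through each function in turn, roughly like R's %>%.
    (A hypothetical helper, not part of the standard library.)"""
    return reduce(lambda acc, f: f(acc), funcs, value)

# Equivalent to sum(sorted(set([3, 1, 2, 3]))), written as a left-to-right chain.
result = pipe(
    [3, 1, 2, 3],
    set,     # drop duplicates
    sorted,  # back to an ordered list
    sum,     # reduce to one number
)
print(result)  # 1 + 2 + 3 = 6
```

Each function receives the previous function's output as its (only)
argument, which mirrors the "hidden first argument" behavior of %>%.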

So finally we get to how Python (and JavaScript and many others) do
something vaguely similar, in the sense of chaining.

If you have an object with some method you can call on it (or in some cases
a normal function call) that returns another (or the same) object, then you
can write code like:

This.that.other


One obvious example that is trivial is an object that contains another
object which contains yet another object with attributes. Saying
a.b.c.attribute is not really a pipeline but simply multiple lines of code
collapsed into one fairly logical bundle. Similarly you can do
a.method().method().method(), where each method is called on whatever object
the preceding method returned, something like "this" (or "self" in Python)
in whatever language you are using.

The pandas module is designed to make such pipelines doable as many methods
return the object being worked on.
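A tiny pure-Python class can show the pattern such APIs rely on. The class
Steps below is hypothetical, invented just for illustration: each method
returns a new Steps object, so calls can be strung together with periods.

```python
class Steps:
    """A minimal illustration of method chaining: each method returns a
    new Steps object, so calls chain with periods. (Hypothetical class,
    just to show the pattern pandas-style APIs use.)"""

    def __init__(self, items):
        self.items = list(items)

    def keep(self, predicate):
        # Keep only items for which predicate is true.
        return Steps(x for x in self.items if predicate(x))

    def apply(self, func):
        # Transform every item with func.
        return Steps(func(x) for x in self.items)

    def sort(self):
        # Return a sorted copy.
        return Steps(sorted(self.items))

chained = Steps([4, 1, 3, 2]).keep(lambda x: x != 3).apply(lambda x: x * 10).sort()
print(chained.items)  # [10, 20, 40]
```

Because every method hands back an object of the same kind, the next period
always has something sensible to operate on, and the intermediate objects
never need names.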

But consider an example in base Python, like a string object that has a
format method, as in:

"My name is {fname}, I'm {age}".format(fname = "John", age = 36)

The result is another string, which can then be trimmed or split into parts
placed in a list, and the list can be asked to do something like throw away
the 4th item, remove duplicates, or be sorted, with several more steps like
that, each using a period to apply some functionality to the current state
of the current object.
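Those steps, written out concretely with only built-in behavior, might look
like this:

```python
# Start with the format example, then keep applying methods with periods:
text = "  My name is {fname}, I'm {age}  ".format(fname="John", age=36)

words = text.strip().lower().replace(",", "").split()
print(words)   # ['my', 'name', 'is', 'john', "i'm", '36']

# List methods mostly mutate in place rather than chain, but the idea
# of repeatedly transforming the current object is similar:
del words[3]   # throw away the 4th item ('john')
words.sort()
print(words)   # ['36', "i'm", 'is', 'my', 'name']
```

Each period on the first chain applies a string method to whatever string
the previous method returned, so no intermediate variables are needed.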

For those who use the Python sklearn module, it has a somewhat different
idea of a pipeline: you specify a set of transformations to be done so the
result of each step is passed as the input of the next step. You don't do
the steps yourself so much as pass a set of functions to an object that
stores them until you ask it to perform an evaluation, and then it processes
your data using those chained functions, which can change dynamically. Lots
of machine learning algorithms use similar ideas, such as neural networks
that propagate data/impulses along a chain of filters and so on.

For anyone still reading, the original point must be restated. My point was
that some people like to program in small byte-size units, as if still
writing their first BASIC program. When I use a Pythonic idiom (or, in R or
other languages, whatever they consider a good way to use the language),
they want it dumbed down into simple single lines, even for code like:

 a, b = b, a

They would rather use:

 temp = a
 a = b
 b = temp

You get the idea. So I have had some nice compact code rearranged to be
much larger and sometimes harder to read and follow. So advice on a goal to
make the early blocks of code smaller than later ones may not work when
someone changes your code to their wishes!

End of digression. Again, not all meanings of pipeline are even close to
being the same.


-----Original Message-----
From: Python-list <python-list-bounces+avi.e.gross=gmail.com at python.org> On
Behalf Of Peter J. Holzer
Sent: Sunday, October 9, 2022 4:02 PM
To: python-list at python.org
Subject: Re: for -- else: what was the motivation?

On 2022-10-09 15:32:13 -0400, Avi Gross wrote:
> and of course no pipelines.

Since you've now used that term repeatedly: What is a pipeline in Python?

        hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp at hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"


