for -- else: what was the motivation?

avi.e.gross at gmail.com
Tue Oct 11 13:58:24 EDT 2022


Antoon,

Your example overlaps with the use of generators in Python to do variants of
the same pipeline ideas. 

But is that native Python, or some extension where "|" has been modified to
mean something other than a form of OR in some places? What module do you
need to load to make that happen?
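For reference, plain Python can support this without an extension. A class can define the `__ror__` special method, and Python calls it when the left operand of `|` (a list, a range, a generator) does not know how to combine itself with the right one. This is a minimal sketch that assumes a hypothetical `Stage` wrapper and made-up stage names. It is not the API of Antoon's recipe or module.

```python
class Stage:
    """Hypothetical wrapper that lets a function sit on the right of `|`."""

    def __init__(self, func):
        self.func = func

    def __ror__(self, data):
        # `data | stage` falls back to stage.__ror__(data) because lists,
        # ranges and generators do not define `|` for a Stage themselves.
        return self.func(data)


def grep(pattern):
    """Hypothetical stage: lazily keep only lines containing pattern."""
    return Stage(lambda lines: (line for line in lines if pattern in line))


upper = Stage(lambda lines: (line.upper() for line in lines))
collect = Stage(list)

lines = ["spam and eggs", "just eggs", "spam spam spam"]
result = lines | grep("spam") | upper | collect
print(result)  # ['SPAM AND EGGS', 'SPAM SPAM SPAM']
```

Because `|` stays an ordinary binary operator, each stage just receives whatever the previous expression produced. Here those are lazy generators until `collect` turns them into a list.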

I think there is a long history in computing of languages that are easier to
implement using prefix (Polish) or postfix (Reverse Polish) notation.
Languages of the LISP variety, as one example, suggest prefix code like
(MULT (ADD 1 2) (SUB 5 3)), which gets really annoying for longer
calculations like, say, the quadratic formula. People (and many programmers
are still people) often prefer some form of infix notation, even if the
machine ends up getting rearranged code that does something like the above.
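Here is the contrast in Python. The standard `operator` module exposes the arithmetic operators as ordinary functions, so the prefix style can be imitated directly. The sketch below compares the two styles, including one root of the quadratic formula. `root_prefix` and `root_infix` are illustrative names only.

```python
import math
import operator as op

# Prefix style: every operation is a call wrapping its operands,
# mirroring (MULT (ADD 1 2) (SUB 5 3)).
prefix = op.mul(op.add(1, 2), op.sub(5, 3))

# Infix style: the same calculation as most people would write it.
infix = (1 + 2) * (5 - 3)


def root_prefix(a, b, c):
    """One root of a*x**2 + b*x + c, written entirely as nested calls."""
    return op.truediv(
        op.add(op.neg(b), math.sqrt(op.sub(op.mul(b, b), op.mul(4, op.mul(a, c))))),
        op.mul(2, a),
    )


def root_infix(a, b, c):
    """The same root in ordinary infix notation."""
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)


print(prefix, infix)                               # 6 6
print(root_prefix(1, -3, 2), root_infix(1, -3, 2))  # 2.0 2.0
```

Both versions give the same answers, but only the infix one is easy to check by eye.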

Pipes in UNIX had a real asynchronous but coordinated meaning: each stage was
its own process, and the chain often ran quite a bit faster because some
stages could keep working while others were doing I/O or otherwise waiting.
BUT it also led to fairly quick and sloppy solutions that probably could have
run faster within one application, without the overhead of many processes.
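A real UNIX-style pipeline between separate processes can be built from Python with `subprocess`. This follows the pattern the `subprocess` documentation gives for replacing a shell pipeline. In this sketch, two small Python child processes stand in for commands like `cat` and `grep`, and the OS runs them concurrently with a pipe between them.

```python
import subprocess
import sys

# First process: writes the numbers 0..4, one per line, to its stdout.
producer = subprocess.Popen(
    [sys.executable, "-c", "for i in range(5): print(i)"],
    stdout=subprocess.PIPE,
)
# Second process: reads the first one's output through an OS pipe and sums it.
consumer = subprocess.Popen(
    [sys.executable, "-c", "import sys; print(sum(int(line) for line in sys.stdin))"],
    stdin=producer.stdout,
    stdout=subprocess.PIPE,
    text=True,
)
producer.stdout.close()  # let the producer see a broken pipe if the consumer exits early
output, _ = consumer.communicate()
producer.wait()
print(output.strip())  # 10
```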

The kind of pipelines we have been talking about, mainly in Python although I
mentioned the versions in R, are largely syntactic sugar, and often there is
really a single thread of execution. At one extreme the pipelining is almost
meaningless: each stage runs to completion and generates lots of data that
the next stage then uses, and the intermediates are eventually garbage
collected. At the other extreme there is a sort of zig-zag approach in which
data is made in smaller chunks: each function yields a partial result and
goes to sleep while the function that called it does something similar.
There may be little or no speedup of the code, or even a slowdown.
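Logging when each stage does its work makes the two extremes visible. This sketch uses two hypothetical generator stages, `numbers` and `squares`. It runs them once eagerly, with a full list between the stages, and once lazily, as a chain of generators.

```python
log = []


def numbers(n):
    """Producer stage: yields 0..n-1, noting each item it makes."""
    for i in range(n):
        log.append(f"produce {i}")
        yield i


def squares(items):
    """Consumer stage: squares whatever it is handed, noting each step."""
    for x in items:
        log.append(f"square {x}")
        yield x * x


# Eager: list() forces the first stage to finish before the second starts.
log.clear()
eager = list(squares(list(numbers(3))))
eager_log = log.copy()

# Lazy: items pass through one at a time, zig-zagging between the stages.
log.clear()
lazy = list(squares(numbers(3)))
lazy_log = log.copy()

print(eager_log)  # all the "produce" entries, then all the "square" entries
print(lazy_log)   # "produce" and "square" entries alternate
```

Both runs compute the same `[0, 1, 4]` in a single thread. Only the order of the work, and how much intermediate data sits around, differs.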

What would be needed for a better emulation of the UNIX (and later Linux,
etc.) pipeline is something closer to asynchronous processes running on
multiple cores and yet communicating efficiently. And it need not be linear.
Consider a merge sort as an example, where each level subdivides the data
and calls multiple others to work on it recursively, and the parent waits for
them all to complete and then merges the results. Some algorithms along
those lines can be spread across several machines, or even across machines
all over the internet. You can imagine a chess program that ships all
reasonable next moves to other machines and gets back an analysis of how
well each move rates, based on several levels of look-ahead from each child,
which in turn farmed out the possible moves to the next level. This kind of
approach can form a tree, a more general graph, or something with even more
dimensions.
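One level of that fan-out-and-merge shape can be sketched with `concurrent.futures` and `heapq.merge`. Workers sort the pieces while the parent waits, and then the parent merges them. The sketch uses a thread pool so it runs anywhere. Threads will not spread CPU-bound work across cores, though; that would need a `ProcessPoolExecutor`, which requires picklable callables (the built-in `sorted` is one). `parallel_merge_sort` is a hypothetical name, and a full version would recurse rather than split only once.

```python
import heapq
from concurrent.futures import Executor, ThreadPoolExecutor


def parallel_merge_sort(data, executor: Executor, chunks: int = 4):
    """Split data, sort the pieces concurrently, then merge them."""
    data = list(data)
    if not data:
        return []
    size = -(-len(data) // chunks)  # ceiling division so every item lands in a piece
    pieces = [data[i:i + size] for i in range(0, len(data), size)]
    # The parent hands each piece to a worker and waits for all of them.
    sorted_pieces = list(executor.map(sorted, pieces))
    # heapq.merge lazily combines already-sorted inputs into one sorted stream.
    return list(heapq.merge(*sorted_pieces))


with ThreadPoolExecutor(max_workers=4) as pool:
    result = parallel_merge_sort([5, 3, 9, 1, 7, 2, 8], pool)
print(result)  # [1, 2, 3, 5, 7, 8, 9]
```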

An example of what I mean by dimensions is tools that work on XML or HTML,
like XPath or jQuery, which can move in various "dimensions": choosing by ID,
by CLASS, by ancestry, by siblings, by ordinal position and by many other
such attributes. But enough of that.

Back to your point: the map/filter/reduce kinds of functionality have long
been used in functional programming in many languages and are a way to do
pipelining, albeit usually in a prefix style, nesting function calls within
function calls in a way that is harder to read. If a syntax change allows a
pipe symbol so the stages can be written infix, that helps programmers who
prefer to think more linearly.
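Here is the same computation written both ways in Python. The nested version reads inside out. The `pipe` helper is a hypothetical name, not anything in the standard library, and it reads top to bottom in the order the stages actually run.

```python
import operator
from functools import partial, reduce

data = range(10)

# Nested, prefix-style: the first step (filter) is buried innermost.
nested = reduce(operator.add, map(lambda x: x * x, filter(lambda x: x % 2 == 0, data)))


def pipe(value, *stages):
    """Hypothetical helper: feed value through the stages left to right."""
    return reduce(lambda acc, stage: stage(acc), stages, value)


linear = pipe(
    data,
    partial(filter, lambda x: x % 2 == 0),  # keep even numbers
    partial(map, lambda x: x * x),          # square them
    partial(reduce, operator.add),          # add them up
)
print(nested, linear)  # 120 120
```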

But since the original topic here was loosely about loops and the silly
choice (there was no great choice) of re-using the ELSE keyword, I note how
often we nest loops, as in:

for first in outer_iterable:
    for second in inner_iterable:
        ...

The inner loop(s) cycle faster than the outer ones, since the innermost body
runs once for every combination of values. But compare that to how you might
write a comprehension like:

[x*y*z for x in range(5) for y in range(5) for z in range(5)]

The above acts sort of like a pipeline: the first "for" sends 0 to 4 to the
second, which passes each value along, plus its own 0..4, to the next one,
which passes both, plus its own contribution, to the main body, which
multiplies them and appends the product to a growing list. Of course, the
oddity in this example is that the main body comes first even though it is,
in a sense, done last, and the entire construct collects the results
invisibly into a list. For some people I know, the nested loop version makes
more sense; for others the comprehension seems more inline, or at least more
condensed.
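The comprehension above expands into nested loops like these, with the collecting made explicit:

```python
products = []
for x in range(5):          # outermost loop changes slowest
    for y in range(5):
        for z in range(5):  # innermost loop changes fastest
            products.append(x * y * z)  # the "main body", written last here

print(len(products))  # 125
```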

My point is perhaps both subtle and blatant. There are tons of IDEAS out
there that look superficially similar but ultimately differ in many ways,
are not the same thing, and are often not interchangeable. A pipeline built
from some kinds of generators can in theory produce an "infinite" amount of
data, but in practice only an indefinite amount, such as the first N primes
where N is not a constant. Another pipeline that calculates the first
million primes and feeds them to a stage that perhaps quits after seeing the
first few dozen may look similar, but it wastes lots of resources computing
and storing too much. And it fails if the program turns out to need more
than a million.
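The lazy version can be sketched with a prime generator that never stops on its own. `itertools.islice` or `next` then decides at run time how much to take. This is a simple trial-division sketch, not an efficient sieve.

```python
from itertools import count, islice, takewhile


def primes():
    """Yield primes forever; each is computed only when someone asks for it."""
    found = []
    for n in count(2):
        # n is prime if no earlier prime up to sqrt(n) divides it.
        if all(n % p for p in takewhile(lambda p: p * p <= n, found)):
            found.append(n)
            yield n


n = 10  # could just as well be decided at run time
first_n = list(islice(primes(), n))
first_over_100 = next(p for p in primes() if p > 100)
print(first_n)         # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
print(first_over_100)  # 101
```

Nothing beyond the requested primes is ever computed or stored, and there is no fixed upper limit to run into.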

The map, filter and reduce functions call a function repeatedly on data and
combine the results. But if you use a generator, asking it for more is like
resuming an existing instance that has been hibernating, and each request
can give a very different result. Similarly, in languages that are
vectorized, functions may do better not to be called through map and friends,
since they can perform the many operations on their own on known data of a
known length. Such vectorized functions are often easy to pipeline more
directly with each other.
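The difference between calling a function again and resuming a generator shows up in a few lines. `square` and `counter` are illustrative names.

```python
def square(x):
    """Ordinary function: no memory between calls."""
    return x * x


def counter():
    """Generator: keeps its local state while paused between values."""
    n = 0
    while True:
        n += 1
        yield n


# map calls the function afresh each time: same input, same answer.
repeated = list(map(square, [3, 3, 3]))

# Asking the same generator again resumes where it left off.
gen = counter()
first_three = [next(gen) for _ in range(3)]
fourth = next(gen)
print(repeated, first_three, fourth)  # [9, 9, 9] [1, 2, 3] 4
```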

This forum often talks about which methods seem more "pythonic" than others.
Arguably generators are now part of the pythonic way for many. I am not sure
the various ideas of "pipelines" are equally pythonic, and some may be rather
foreign to many programmers, at least until they catch on.

Avi


-----Original Message-----
From: Python-list <python-list-bounces+avi.e.gross=gmail.com at python.org> On Behalf Of Antoon Pardon
Sent: Tuesday, October 11, 2022 10:30 AM
To: python-list at python.org
Subject: Re: for -- else: what was the motivation?



On 10/10/2022 at 04:38, avi.e.gross at gmail.com wrote:
> [This is an answer for Peter and can easily be skipped by those who 
> know or have no wish to.]
>
> Strictly speaking Peter, the word "pipe" may not mean quite something 
> in Python but other concepts like chaining may be better.
>
> The original use of the word I am used to was at the UNIX shell level 
> where an in-core data structure called a pipe was used to connect the 
> output of one process to the input of another, sometimes in long chains
> like:
>
>   cat file1 file2 file3 | grep pattern | ... | lp

Something like that can be done in python with the same kind of syntax:

https://code.activestate.com/recipes/580625-collection-pipeline-in-python/

I have my own Python 3 module with stuff like that and I find it very
useful.

--
Antoon Pardon
--
https://mail.python.org/mailman/listinfo/python-list


