Stream programming

Sat Mar 24 07:05:28 EDT 2012

On 3/24/2012 4:23, Steven D'Aprano wrote:
> On Fri, 23 Mar 2012 17:00:23 +0100, Kiuhnm wrote:
>
>> I've been writing a little library for handling streams as an excuse for
>> doing a little OOP with Python.
>>
>> I don't share some of the views on readability expressed on this ng.
>> Indeed, I believe that a piece of code may very well start as complete
>> gibberish and become a pleasure to read after some additional
>> information is provided.
> [...]
>> numbers - push - avrg - 'med' - pop - filter(lt('med'), ge('med'))\
>>       - ['same', 'same'] - streams(cat) - 'same'
>>
>> Ok, we're at the "complete gibberish" phase.
>>
>> Time to give you the "additional information".
>
> There are multiple problems with your DSL. Having read your explanation,
> and subsequent posts, I think I understand the data model, but the syntax
> itself is not very good and far from readable. It is just too hard to
> reason about the code.
>
> Your syntax conflicts with established, far more common, use of the same
> syntax: you use - to mean "call a function" and | to join two or more
> streams into a flow.
>
> You also use () for calling functions, and the difference between - and
> () isn't clear. So a mystery there -- your DSL seems to have different
> function syntax, depending on... what?
>
> The semantics are unclear even after your examples. To understand your
> syntax, you give examples, but to understand the examples, the reader
> needs to understand the syntax. That suggests that the semantics are
> unclear even in your own mind, or at least too difficult to explain in
> simple examples.
>
> Take this example:
>
>> Flows can be saved (push) and restored (pop) :
>>     [1,2,3,4] - push - by(2) - 'double' - pop | val('double')
>>         <=>  [1,2,3,4] | [2,4,6,8]
>
> What the hell does that mean? The reader initially doesn't know what
> *any* of push, by(2), pop or val('double') means. All they see is an
> obfuscated series of calls that starts with a stream as input, makes a
> copy of it, and doubles the entries in the copy: you make FIVE function
> calls to perform TWO conceptual operations. So the reader can't even map
> a function call to a result.
>
> With careful thought and further explanations from you, the reader (me)
> eventually gets a mental model here. Your DSL has a single input which is
> pipelined through a series of function calls by the - operator, plus a
> separate stack. (I initially thought that, like Forth, your DSL was stack
> based. But it isn't, is it?)
>
> It seems to me that the - operator is only needed as syntactic sugar to
> avoid using reverse Polish notation and an implicit stack. Instead of the
> Forth-like:
>
> [1,2,3,4] dup 2 *
>
> your DSL has an explicit stack, and an explicit - operator to call a
> function. Presumably "[1,2] push" would be a syntax error.
>
> I think this is a good example of an inferior syntax. Contrast your:
>
> [1,2,3,4] - push - by(2) - 'double' - pop | val('double')
>
> with the equivalent RPL:
>
> [1,2,3,4] dup 2 *

I was just explaining how push and pop work.
I also said that
   [1,2,3,4] - [id,by(2)]
would be the recommended way to do it.

> Now *that* is a pleasure to read, once you wrap your head around reverse
> Polish notation and the concept of a stack. Which you need in your DSL
> anyway, to understand push and pop.

I don't see why. Push and pop are not needed. They're just handy,
mainly to modify a flow, collect a result, and go back to how the flow
was before the push.
It has nothing to do with RPN (which RPL is based on).
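To make the save/restore semantics concrete, here is a minimal Python sketch that models the state as a current flow plus a stack. The class FlowState and its methods map, name and join_val are hypothetical names chosen for illustration. They are not the library's API, which spells the same steps with the - and | operators. In this sketch push stores a copy and leaves the current flow untouched, and pop restores the most recently saved flow, so pushes can nest.

```python
import copy


class FlowState:
    """Hypothetical model: a current flow plus a stack of saved flows.

    A flow is a list of streams; each stream is a list.
    """

    def __init__(self, *streams):
        self.flow = [list(s) for s in streams]
        self.stack = []   # saved flows, most recent last
        self.names = {}   # named flows, e.g. 'double'

    def push(self):
        # Save a copy; the current flow itself is not consumed.
        self.stack.append(copy.deepcopy(self.flow))
        return self

    def pop(self):
        # Restore the most recently pushed flow (LIFO, so pushes nest).
        self.flow = self.stack.pop()
        return self

    def map(self, f):
        # Apply f to every element of every stream.
        self.flow = [[f(x) for x in s] for s in self.flow]
        return self

    def name(self, key):
        # Remember the current flow under a name, like 'double' in the DSL.
        self.names[key] = copy.deepcopy(self.flow)
        return self

    def join_val(self, key):
        # Join a named flow onto the current one, like "| val(key)".
        self.flow = self.flow + copy.deepcopy(self.names[key])
        return self


# [1,2,3,4] - push - by(2) - 'double' - pop | val('double')
s = FlowState([1, 2, 3, 4])
s.push().map(lambda x: 2 * x).name('double').pop().join_val('double')
print(s.flow)  # [[1, 2, 3, 4], [2, 4, 6, 8]]

# Nested pushes unwind in reverse order.
t = FlowState([1, 2])
t.push().map(lambda x: x + 10)  # flow is [[11, 12]]
t.push().map(lambda x: -x)      # flow is [[-11, -12]]
t.pop()                         # back to [[11, 12]]
t.pop()                         # back to [[1, 2]]
print(t.flow)  # [[1, 2]]
```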

> You say that this is an "easier way to get the same result":
>
> [1,2,3,4] - [id, by(2)]
>
> but it isn't, is it? The more complex example above ends up with two
> streams joined in a single flow:
>
> [1,2,3,4]|[2,4,6,8]
>
> whereas the shorter version using the magic "id" gives you a single
> stream containing nested streams:
>
> [[1,2,3,4], [2,4,6,8]]

Says who?

Here are the rules again:
A flow can be transformed:
   [1,2] - f <=> [f(1),f(2)]
   ([1,2] | [3,4]) - f <=> [f(1,3),f(2,4)]
   ([1,2] | [3,4]) - [f] <=> [f(1),f(2)] | [f(3),f(4)]
   ([1,2] | [3,4]) - [f,g] <=> [f(1),f(2)] | [g(3),g(4)]
   [1,2] - [f,g] <=> [f(1),f(2)] | [g(1),g(2)]

Read the last line.
What's very interesting is that [f,g] is an iterable as well, so your
functions can be generated as needed.
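The five rules can be sketched as a single Python function. The sketch assumes a flow is represented as a list of streams, so [[1,2,3,4],[2,4,6,8]] stands for [1,2,3,4] | [2,4,6,8]. The names transform, ident and by are illustrative assumptions. ident stands in for id because Python's builtin id() returns an object's identity, not its argument. The rules as stated do not cover a function count that matches neither 1 nor the number of streams, so this sketch simply raises an error in that case.

```python
def transform(flow, fs):
    """Hypothetical sketch of `flow - fs` following the five rules.

    flow: a list of streams; the single stream [1, 2] is the flow [[1, 2]].
    fs:   one callable, or a finite iterable of callables.
    """
    if callable(fs):
        # One function takes one argument per stream:
        # [1,2] - f -> [f(1), f(2)];  [1,2]|[3,4] - f -> [f(1,3), f(2,4)]
        return [[fs(*xs) for xs in zip(*flow)]]
    fs = list(fs)  # [f, g] or any generated iterable of functions
    if len(fs) == 1:
        # [f]: apply f to each stream separately.
        return [[fs[0](x) for x in s] for s in flow]
    if len(flow) == 1:
        # One stream, several functions: one output stream per function.
        return [[f(x) for x in flow[0]] for f in fs]
    if len(fs) == len(flow):
        # Pair the i-th function with the i-th stream.
        return [[f(x) for x in s] for f, s in zip(fs, flow)]
    # Not covered by the rules as stated; this sketch just rejects it.
    raise ValueError("number of functions does not match number of streams")


def ident(x):
    # Identity function (Python's builtin id() is something else).
    return x


def by(n):
    # Hypothetical helper matching by(2) in the thread: multiply by n.
    return lambda x: x * n


print(transform([[1, 2, 3, 4]], [ident, by(2)]))          # [[1, 2, 3, 4], [2, 4, 6, 8]]
print(transform([[1, 2], [3, 4]], lambda a, b: a + b))    # [[4, 6]]
print(transform([[1, 2]], (by(k) for k in range(1, 4))))  # [[1, 2], [2, 4], [3, 6]]
```

Under these rules, [1,2,3,4] - [id, by(2)] yields two joined streams rather than one stream of nested lists. The last call shows the list of functions being produced by a generator.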

> So, how could you make this more readable?
>
> * Don't fight the reader's expectations. If they've programmed in Unix
> shells, they expect | as the pipelining operator. If they haven't, they
> probably will find >> easy to read as a dataflow operator. Either way,
> they're probably used to seeing a|b as meaning "or" (as in "this stream,
> or this stream") rather than the way you seem to be using it ("this
> stream, and this stream").
>
> Here's my first attempt at improved syntax that doesn't fight the user:
>
> [1,2,3,4] >> push >> by(2) >> 'double' >> pop & val('double')

There are problems with your syntax.
Mine:
[...]+[...] - f + [...] - g - h + [...] - i + [...]
Yours:
((([...]+[...] >> f) + [...] >> g >> h) + [...] >> i) + [...]
I first tried to use '<<' and '>>' but '+' and '-' are much better.
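The precedence argument can be checked with Python's own parser. This sketch uses the standard ast module to print every grouping explicitly. The single letters are placeholders for the [...] literals and functions above. Because + and - share one precedence level and associate left to right, the first expression groups in reading order. In the second, >> binds more loosely than +, so f + c is combined before any >> is applied. That is why the >> version needs the extra parentheses.

```python
import ast

# Operators that appear in the two expressions below.
OPS = {ast.Add: '+', ast.Sub: '-', ast.RShift: '>>'}


def explicit(node):
    """Render a BinOp tree with every grouping made explicit."""
    if isinstance(node, ast.BinOp):
        op = OPS[type(node.op)]
        return f"({explicit(node.left)} {op} {explicit(node.right)})"
    if isinstance(node, ast.Name):
        return node.id
    raise TypeError(f"unsupported node: {ast.dump(node)}")


def show(expr):
    # Parse as a single expression and render its body.
    return explicit(ast.parse(expr, mode="eval").body)


# a, b, c, d, e stand for the [...] literals; f, g, h, i for the functions.
print(show("a + b - f + c - g - h + d - i + e"))
# ((((((((a + b) - f) + c) - g) - h) + d) - i) + e)
print(show("a + b >> f + c >> g >> h + d >> i + e"))
# (((((a + b) >> (f + c)) >> g) >> (h + d)) >> (i + e))
```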

> "push" and "pop" are poor choices of words. Push does not actually push
> its input onto the stack, which would leave the input stream empty. It
> makes a copy. You explain what they do:

Why should push move and not copy? In asm and OpenGL they copy, for
instance.

> "Flows can be saved (push) and restored (pop)"
>
> so why not just use SAVE and RESTORE as your functions? Or if they're too
> verbose, STO and RCL, or my preference, store and recall.

Because that's not what they do.
push and pop actually push and pop, i.e. they can be nested and work as 
expected.

> [1,2,3,4] >> store >> by(2) >> 'double' >> recall & val('double')
>
> I'm still not happy with & for the join operator. I think that the use of
> + for concatenate and & for join is just one of those arbitrary choices
> that the user will have to learn. Although I'm tempted to try using a
> colon instead.
>
> [1,2,3]:[4,5,6]
>
> would be a flow with two streams.

I can't see a way to overload ':' in Python. There are also technical 
limitations.
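Python has no overloadable ':' operator. Inside a subscript, however, a colon is turned into a slice object and passed to __getitem__. Below is a sketch of a partial workaround built on that, using a hypothetical helper object S. It is an assumption for illustration, not something the library does. It only works inside brackets, and a slice holds at most three parts (start, stop, step), which is one of the technical limitations.

```python
class Streams:
    """Hypothetical helper: S[a:b] builds a flow of two streams.

    A colon is only accepted here because it sits inside a subscript;
    Python packs it into slice(start, stop, step) for __getitem__.
    """

    def __getitem__(self, key):
        if isinstance(key, slice):
            # At most three parts are available: start, stop and step.
            parts = [key.start, key.stop, key.step]
            return [list(p) for p in parts if p is not None]
        # A plain subscript is a flow with a single stream.
        return [list(key)]


S = Streams()
print(S[[1, 2, 3]:[4, 5, 6]])  # [[1, 2, 3], [4, 5, 6]]

# Outside a subscript the same colon does not even parse.
try:
    compile("[1, 2, 3]:[4, 5, 6]", "<dsl>", "eval")
except SyntaxError:
    print("bare ':' is a syntax error")
```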

> I don't like the syntax for defining and using names. Here's a random
> thought:
>
> [1,2,3,4] >> store >> by(2) >> @double >> recall & double
>
> Use @name to store to a name, and the name alone to retrieve from it. But
> I haven't given this too much thought, so it too might suck.

The problem, again, is Python's limited support for defining DSLs.
At this point, one would have to interpret command strings. I was trying
to avoid an interpreter on top of an interpreter.

> Some other problems with your DSL:
>
>> A flow can be transformed:
>>     [1,2] - f <=> [f(1),f(2)]
>
> but that's not consistently true. For instance:
>
> [1,2] - push <=/=> [push(1), push(2)]

push is a special function (a keyword). It's clear what it does. It's 
just an exception to the general rule.

> So the reader needs to know all the semantics of the particular function
> f before being able to reason about the flow.

No, he only has to know special functions. Those are practically keywords.

>> Some functions are special and almost any function can be made special:
>>     [1,2,3,4,5] - filter(isprime) <=> [2,3,5]
>>     [[],(1,2),[3,4,5]] - flatten <=> [1,2,3,4,5]
>
> You say that as if it were a good thing.

It is, because it's never implicit. For instance, isprime is a filter. 
flatten is a special builtin function (a keyword).
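Here is a minimal sketch of the distinction, assuming a stream is a plain Python list. An ordinary function goes through the element-wise rule, while filter and flatten act on the whole stream. apply_elementwise, dsl_filter and flatten are hypothetical stand-ins, not the library's implementations. Applying isprime element-wise yields booleans, which is why it has to be wrapped in filter explicitly.

```python
from itertools import chain


def isprime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))


def apply_elementwise(stream, f):
    # The general rule: [1,2] - f <=> [f(1), f(2)]
    return [f(x) for x in stream]


def dsl_filter(pred):
    # Hypothetical stand-in for filter(isprime): a stream-level function
    # that drops elements, so it cannot be expressed element-wise.
    return lambda stream: [x for x in stream if pred(x)]


def flatten(stream):
    # Hypothetical stand-in for the flatten keyword: concatenates one
    # level of nested iterables.
    return list(chain.from_iterable(stream))


print(apply_elementwise([1, 2, 3, 4, 5], isprime))  # [False, True, True, False, True]
print(dsl_filter(isprime)([1, 2, 3, 4, 5]))         # [2, 3, 5]
print(flatten([[], (1, 2), [3, 4, 5]]))             # [1, 2, 3, 4, 5]
```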

Kiuhnm