General Purpose Pipeline library?

Friedrich Rentsch anthra.norell at bluewin.ch
Wed Nov 22 05:38:41 EST 2017



On 11/21/2017 03:26 PM, Jason wrote:
> On Monday, November 20, 2017 at 10:49:01 AM UTC-5, Jason wrote:
>> a pipeline can be described as a sequence of functions applied to an input, with each subsequent function receiving the output of the preceding one:
>>
>> out = f6(f5(f4(f3(f2(f1(in))))))
>>
>> However, this isn't very readable, and it does not support conditionals.
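The nested-call style above can be made readable with a small fold-based helper. This is a minimal sketch; the `pipeline` name and signature here are my own illustration, not from any existing library:

```python
from functools import reduce

def pipeline(*funcs):
    """Return a function that applies funcs left to right."""
    def run(value):
        # Fold the value through each function in order.
        return reduce(lambda acc, f: f(acc), funcs, value)
    return run

# Equivalent to out = len(str.lower(str.strip("  Hello "))),
# but reads in application order:
out = pipeline(str.strip, str.lower, len)("  Hello ")
# out == 5
```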
>>
>> Tensorflow has tensor-focused pipelines:
>>      fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
>>      fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
>>      out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')
>>
>> I have some code which allows me to mimic this, but with an implied parameter.
>>
>> import functools
>>
>> def executePipeline(steps, collection_funcs=(map, filter, functools.reduce)):
>> 	results = None
>> 	for func, params in steps:
>> 		if func in collection_funcs:
>> 			print(func, params[0])
>> 			results = func(functools.partial(params[0], *params[1:]), results)
>> 		else:
>> 			print(func)
>> 			if results is None:
>> 				results = func(*params)
>> 			else:
>> 				results = func(*(params + (results,)))
>> 	return results
>>
>> executePipeline( [
>> 				(read_rows, (in_file,)),
>> 				(map, (lower_row, field)),
>> 				(stash_rows, ('stashed_file', )),
>> 				(map, (lemmatize_row, field)),
>> 				(vectorize_rows, (field, min_count,)),
>> 				(evaluate_rows, (weights, None)),
>> 				(recombine_rows, ('stashed_file', )),
>> 				(write_rows, (out_file,))
>> 			]
>> )
>>
>> Which gets me close, but I can't control where the rows get passed in; in the code above, they are always the last parameter.
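One way to control where the intermediate result lands is a sentinel placeholder. This is a hedged sketch of that idea, not a feature of any library mentioned here; `PIPE` and `execute_pipeline` are hypothetical names:

```python
# Hypothetical sentinel marking where the previous step's output goes.
PIPE = object()

def execute_pipeline(steps):
    results = None
    for func, params in steps:
        if any(p is PIPE for p in params):
            # Substitute the running result at the marked position(s).
            args = tuple(results if p is PIPE else p for p in params)
        elif results is None:
            args = params
        else:
            # Default behavior: append the running result last.
            args = params + (results,)
        results = func(*args)
    return results

# The running value can now appear anywhere in the argument list:
total = execute_pipeline([
    (range, (5,)),                    # 0..4
    (map, (lambda x: x * x, PIPE)),   # squares
    (sum, (PIPE,)),                   # 0 + 1 + 4 + 9 + 16
])
# total == 30
```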
>>
>> I feel like I'm reinventing the wheel here. I was wondering if there's already something that exists?
> Why do I want this? Because I'm tired of writing code that is locked away in a bespoke function. I'd end up with an army of functions, all slightly different in functionality. I require flexibility in defining pipelines, and I don't want a custom pipeline to require any low-level coding. I just want to feed a sequence of functions to a script and have it process them: a middle ground between the shell | operator and bespoke Python code. Sure, I could write many binaries bound by the shell, but some things are done far more easily in Python because of its extensive libraries, and state can persist throughout the execution of the pipeline, whereas any temporary persistence in the shell has to go through environment variables or files.
>
> Well, after examining your feedback, it looks like Grapevine has 99% of the concepts that I wanted to invent, even if the | operator seems a bit clunky. I personally prefer the fluent interface convention. But this should work.
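For comparison, a fluent-style wrapper can be sketched in a few lines. This `Pipe` class is my own illustration of the convention being discussed; it is not Grapevine's actual API:

```python
class Pipe:
    """Minimal fluent pipeline: each step returns a new Pipe for chaining."""
    def __init__(self, value):
        self.value = value

    def then(self, func, *args):
        # Apply func with the current value as the first argument.
        return Pipe(func(self.value, *args))

    def result(self):
        return self.value

out = (Pipe("  Some Text  ")
       .then(str.strip)
       .then(str.lower)
       .then(str.split)
       .result())
# out == ['some', 'text']
```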
>
> Kamaelia could also work, but it seems a little bit more grandiose.
>
>
> Thanks everyone who chimed in!




More information about the Python-list mailing list