General Purpose Pipeline library?

duncan smith duncan at invalid.invalid
Mon Nov 20 18:22:08 EST 2017


On 20/11/17 15:48, Jason wrote:
> A pipeline can be described as a sequence of functions applied to an input, each subsequent function receiving the output of the preceding one:
> 
> out = f6(f5(f4(f3(f2(f1(in))))))
> 
> However, this isn't very readable and does not support conditionals.
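> 
> For example, the same chain could be written with functools.reduce (a rough sketch; f1..f6 are just the placeholder names from the formula above, and in_ stands in for "in", which is a keyword):
> 
>     from functools import reduce
> 
>     funcs = [f1, f2, f3, f4, f5, f6]
>     out = reduce(lambda acc, f: f(acc), funcs, in_)
> 
> but that is still a flat chain with no room for branching.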
> 
> TensorFlow has tensor-focused pipelines:
>     fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
>     fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
>     out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')
> 
> I have some code which allows me to mimic this, but with an implied parameter.
> 
> import functools
> from functools import reduce
> 
> def executePipeline(steps, collection_funcs=(map, filter, reduce)):
> 	results = None
> 	for step in steps:
> 		func, params = step[0], step[1]
> 		if func in collection_funcs:
> 			# map/filter/reduce: bind the extra arguments, then apply to the running results
> 			print(func, params[0])
> 			results = func(functools.partial(params[0], *params[1:]), results)
> 		else:
> 			print(func)
> 			if results is None:
> 				results = func(*params)
> 			else:
> 				# the running results always go in as the last positional argument
> 				results = func(*(params + (results,)))
> 	return results
> 
> executePipeline( [
> 				(read_rows, (in_file,)),
> 				(map, (lower_row, field)),
> 				(stash_rows, ('stashed_file', )),
> 				(map, (lemmatize_row, field)),
> 				(vectorize_rows, (field, min_count,)),
> 				(evaluate_rows, (weights, None)),
> 				(recombine_rows, ('stashed_file', )),
> 				(write_rows, (out_file,))
> 			]
> )
> 
> This gets me close, but I can't control where the rows get passed in; in the code above they are always the last parameter.
> 
> I feel like I'm reinventing the wheel here. Is there something that already exists?
> 

Maybe Kamaelia?

http://www.kamaelia.org/Home.html
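
Or, if you only need something lightweight, one way to get control over where
the running result goes is to mark its position with a placeholder object in
the parameter tuple. An untested sketch (RESULT and run_pipeline are made-up
names, and read_rows/evaluate_rows/write_rows below stand in for your own step
functions):

    import functools

    RESULT = object()   # marks where the previous step's output should be inserted

    def run_pipeline(steps, collection_funcs=(map, filter)):
        results = None
        for func, params in steps:
            if func in collection_funcs:
                # map/filter: bind the extra arguments, apply to the running results
                results = func(functools.partial(params[0], *params[1:]), results)
            else:
                if RESULT in params:
                    # substitute the running results wherever the placeholder appears
                    args = tuple(results if p is RESULT else p for p in params)
                elif results is not None:
                    # no placeholder given: fall back to appending, as in your version
                    args = params + (results,)
                else:
                    args = params
                results = func(*args)
        return results

    run_pipeline([
        (read_rows, (in_file,)),
        (evaluate_rows, (weights, RESULT, None)),   # results injected in the middle
        (write_rows, (out_file, RESULT)),
    ])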

Duncan


