[scikit-learn] How to not recalculate transformer in a Pipeline?

Andreas Mueller t3kcit at gmail.com
Mon Nov 28 11:39:59 EST 2016


Hey Anton.
Yes, that would be great to have.
There is no solution implemented in scikit-learn right now, but there 
are at least two ways that I know of.
This (ancient and probably now defunct) PR:
https://github.com/scikit-learn/scikit-learn/pull/3951

And using dask:
http://matthewrocklin.com/blog/work/2016/07/12/dask-learn-part-1
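
A third, purely manual workaround (just a sketch of the general idea, 
not anything built into scikit-learn, and the names here are made up) 
is to memoize the expensive step's fit_transform yourself with 
joblib.Memory inside a small wrapper transformer, so that repeated 
fits on the same data and parameters hit a disk cache instead of 
being recomputed:

import joblib  # in 2016 this also shipped bundled as sklearn.externals.joblib
from sklearn.base import BaseEstimator, TransformerMixin

# Any writable directory works as the cache location.
memory = joblib.Memory('/tmp/skl_cache', verbose=0)

@memory.cache
def _fit_transform_once(transformer, X, y):
    # joblib hashes the arguments, so calling this again with an
    # identically-parametrized (unfitted) transformer and the same data
    # is served from the disk cache instead of recomputed.
    Xt = transformer.fit_transform(X, y)
    return transformer, Xt

class CachedTransformer(BaseEstimator, TransformerMixin):
    # Illustrative wrapper: memoizes the wrapped transformer's
    # fit_transform so that a grid search over *later* pipeline steps
    # does not redo the expensive work.
    def __init__(self, transformer):
        self.transformer = transformer

    def fit(self, X, y=None):
        self.fit_transform(X, y)
        return self

    def fit_transform(self, X, y=None):
        self.transformer_, Xt = _fit_transform_once(self.transformer, X, y)
        return Xt

    def transform(self, X):
        return self.transformer_.transform(X)

You would then use CachedTransformer(YourExpensiveTransformer()) as the 
first pipeline step. The caveat is that this only pays off as long as 
the grid search keeps handing the wrapper the same transformer 
parameters and the same training folds; changing the transformer's 
parameters invalidates the cache entry, which is what you want.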

Andy


On 11/28/2016 10:24 AM, Anton Suchaneck wrote:
> Hello!
>
> I use a 2-step Pipeline with an expensive transformer followed by a 
> classifier. On this I run GridSearchCV over the classifier's 
> parameters.
>
> Now, in theory GridSearchCV could know that I'm not touching any 
> parameters of the transformer and avoid redoing that work by keeping 
> the transformed X, right?!
> Currently, GridSearchCV does a clean re-run of all Pipeline steps for 
> every candidate, doesn't it?
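>
> For concreteness, a minimal version of the setup I mean looks roughly 
> like this (PCA is just a stand-in for the expensive transformer):
>
> from sklearn.datasets import make_classification
> from sklearn.decomposition import PCA
> from sklearn.model_selection import GridSearchCV
> from sklearn.pipeline import Pipeline
> from sklearn.svm import SVC
>
> X, y = make_classification(n_samples=500, n_features=50, random_state=0)
>
> pipe = Pipeline([("transform", PCA(n_components=10)),
>                  ("clf", SVC())])
>
> # Only the classifier's C is in the grid, yet GridSearchCV re-fits the
> # "transform" step from scratch for every parameter setting and fold.
> grid = GridSearchCV(pipe, param_grid={"clf__C": [0.1, 1, 10]}, cv=5)
> grid.fit(X, y)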
>
> Can you recommend the easiest way to use GridSearchCV+Pipeline while 
> avoiding recomputation of the transformer steps whose parameters are 
> not part of the grid search? I realize this may be tricky, but any 
> pointers on doing this conveniently and in a way that stays compatible 
> with sklearn would be highly appreciated!
>
> (The scoring has to be done on the original data, so I cannot simply 
> transform everything by hand beforehand.)
>
> Regards,
> Anton
>
> PS: If that all makes sense, is that a useful feature to include in 
> sklearn?
