[scikit-learn] How to not recalculate transformer in a Pipeline?
Gael Varoquaux
gael.varoquaux at normalesup.org
Mon Nov 28 12:15:26 EST 2016
> Or would you cache the return of "fit" as well as "transform"?
Caching fit rather than transform. Fit is usually the costly step.
> Caching "fit" with joblib seems non-trivial.
Why? Caching a function that takes the estimator and X and y should do
it. The transformer would clone the estimator on fit, to avoid
side-effects that would trigger recomputes.
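A minimal sketch of the pattern described above, assuming joblib's Memory for the on-disk cache. The names CachedTransformer, _fit_clone and cachedir are hypothetical and introduced only for illustration; this is not an existing scikit-learn class. The cached function receives the unfitted estimator together with X and y, and fits a clone so that the estimator passed in is never modified and keeps the same hash on the next call.

```python
import tempfile

import numpy as np
from joblib import Memory
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.decomposition import PCA


def _fit_clone(estimator, X, y=None):
    # Fit a fresh clone: the estimator passed in stays unfitted, so its
    # hash (and therefore the cache key) is unchanged on the next call.
    return clone(estimator).fit(X, y)


class CachedTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical wrapper that caches the fit of `estimator` on disk."""

    def __init__(self, estimator, cachedir):
        self.estimator = estimator
        self.cachedir = cachedir

    def fit(self, X, y=None):
        memory = Memory(self.cachedir, verbose=0)
        # The cache key is derived from the estimator's parameters, X and y.
        self.estimator_ = memory.cache(_fit_clone)(self.estimator, X, y)
        return self

    def transform(self, X):
        return self.estimator_.transform(X)


rng = np.random.RandomState(0)
X = rng.rand(50, 5)
cachedir = tempfile.mkdtemp()  # a fixed directory would persist across sessions
pca = PCA(n_components=2, random_state=0)

first = CachedTransformer(pca, cachedir).fit(X).transform(X)
# Same estimator parameters and same data: the fitted clone is loaded
# from the cache instead of being refit.
second = CachedTransformer(pca, cachedir).fit(X).transform(X)
```

Only fit goes through the cache here; transform is recomputed each time, on the assumption that fit is the costly step.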
It's a pattern that I use often; I've just never coded a good transformer
for it.
In my use cases, it works very well, provided that everything is nicely
seeded. Also, the persistence across sessions is a real time saver.
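One possible reading of the seeding remark, sketched with joblib.hash, which is assumed here to be the basis of the cache key. An estimator with a fixed integer random_state hashes identically every time, so the cache can be hit. A RandomState instance changes its internal state when it is used, so passing one as an argument would change the hash and trigger a recompute.

```python
import numpy as np
from joblib import hash as joblib_hash
from sklearn.decomposition import PCA

# Two estimators built with the same integer seed hash identically.
seeded = PCA(n_components=2, random_state=0)
same_seed = PCA(n_components=2, random_state=0)
stable = joblib_hash(seeded) == joblib_hash(same_seed)

# A different seed is a different parameter, hence a different cache key.
other_seed = PCA(n_components=2, random_state=1)
changed = joblib_hash(seeded) != joblib_hash(other_seed)

# A RandomState object mutates when it draws numbers, so its hash drifts.
rng = np.random.RandomState(0)
before = joblib_hash(rng)
rng.rand()
after = joblib_hash(rng)
drifted = before != after
```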