[scikit-learn] scikit-learn Digest, Vol 30, Issue 25

Andreas Mueller t3kcit at gmail.com
Tue Oct 2 11:56:06 EDT 2018


Thank you for your feedback!

On 10/01/2018 09:11 PM, Jason Sanchez wrote:
> The current roadmap is amazing. One feature that would be exciting is 
> better support for multilayer stacking with caching and the ability to 
> add models to already trained layers.
>
> I saw this history: https://github.com/scikit-learn/scikit-learn/pull/8960
>
I think we still want to include something like this; I guess it just 
wasn't considered major enough to make the roadmap. The roadmap mostly 
covers API changes and things that impact more than one estimator, and 
this is "just" adding an estimator for the most part.
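To make the idea concrete, here is a minimal two-layer stacking sketch 
built only from existing scikit-learn pieces (cross_val_predict for the 
out-of-fold predictions); the particular estimators are just placeholders:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict

    X, y = make_classification(n_samples=500, random_state=0)

    # Layer 1: out-of-fold predictions, so the meta-estimator never
    # sees predictions a model made on its own training data.
    layer1 = [RandomForestClassifier(n_estimators=100, random_state=0),
              LogisticRegression(max_iter=1000)]
    meta_features = np.column_stack([
        cross_val_predict(est, X, y, cv=5, method="predict_proba")[:, 1]
        for est in layer1
    ])

    # Layer 2: the meta-estimator is fit on the stacked predictions.
    meta = LogisticRegression(max_iter=1000).fit(meta_features, y)

    # For prediction time, each layer-1 model is refit on all the data.
    for est in layer1:
        est.fit(X, y)

Deeper stacks just repeat the out-of-fold step, feeding each layer's 
predictions to the next.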

> This library is very close:
> * API is somewhat awkward, but otherwise good. Does not cache 
> intermediate steps. https://wolpert.readthedocs.io/en/latest/index.html
If we reuse pipelines, we might get this "for free" to some degree.
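As a rough sketch of the mechanism: Pipeline already takes a memory 
argument that caches fitted transformers on disk, so a stacking layer 
expressed as a transformer would inherit that caching; PCA here is only 
a stand-in for such a layer:

    from tempfile import mkdtemp
    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline

    X, y = make_classification(n_samples=300, random_state=0)

    # `memory` caches fitted transformers; refitting with unchanged
    # upstream steps (e.g. after swapping the final estimator) reuses
    # the cached results instead of recomputing them.
    pipe = Pipeline(
        [("reduce", PCA(n_components=10)),
         ("clf", LogisticRegression(max_iter=1000))],
        memory=mkdtemp(),
    )
    pipe.fit(X, y)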
>
>
> As another data point, I attached a simple implementation I put 
> together to illustrate what I think are core needs of this feature. 
> Feel free to browse the code. Here is the short list:
> * Infinite layers (or at least 3 ;) )
Pretty sure that'll happen
> * Choice of CV or OOB for each model
This is less likely to happen in an initial version, I think. These two 
things have traditionally been kept very separate. We could add an item 
to the roadmap to make combining them easier? (Actually, I just did.)
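To spell out the difference: cross-validated predictions work for any 
estimator at the cost of extra fits, while out-of-bag predictions come 
from a single fit but only exist for bagging-style ensembles. A sketch 
of both:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict

    X, y = make_classification(n_samples=500, random_state=0)

    # CV route: works for any estimator, costs `cv` extra fits.
    cv_preds = cross_val_predict(
        LogisticRegression(max_iter=1000), X, y, cv=5, method="predict_proba"
    )

    # OOB route: one fit, but limited to bagged ensembles.
    rf = RandomForestClassifier(n_estimators=200, oob_score=True,
                                random_state=0).fit(X, y)
    oob_preds = rf.oob_decision_function_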
> * Ability to add a new model to a layer after the stacked ensemble has 
> been trained and refit the pipeline such that only models that must be 
> retrained are retrained (i.e. train the added model and retrain all 
> models in higher layers)
This is the "freezing estimators" that's on the roadmap.
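Roughly, a frozen estimator is one whose fit is a no-op, so refitting 
the whole stack retrains only the unfrozen parts. A minimal sketch; the 
Frozen wrapper is hypothetical, not an existing scikit-learn API:

    from sklearn.base import BaseEstimator

    class Frozen(BaseEstimator):
        """Hypothetical wrapper: fit() is a no-op, so a pre-trained
        estimator survives a pipeline-wide refit untouched."""

        def __init__(self, estimator):
            self.estimator = estimator  # assumed already fitted

        def fit(self, X, y=None):
            return self  # deliberately skip retraining

        def predict(self, X):
            return self.estimator.predict(X)

        def predict_proba(self, X):
            return self.estimator.predict_proba(X)

Wrapping the already-trained layer-1 models this way, a pipeline-wide 
refit would only train the newly added model and the layers above it.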
> * All standard scikit-learn pipeline goodness (introspection, grid 
> search, serializability, etc)
>
That's a given for anything in sklearn ;)
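For the record, that mostly falls out of the usual step-name__parameter 
convention; the step names in this sketch are placeholders, with PCA 
again standing in for a stacking layer:

    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline

    X, y = make_classification(n_samples=300, random_state=0)

    pipe = Pipeline([("reduce", PCA()),
                     ("clf", LogisticRegression(max_iter=1000))])

    # Nested parameters use the step__param convention; a stacking
    # step's inner estimators would be addressed the same way.
    grid = GridSearchCV(pipe,
                        {"reduce__n_components": [5, 10],
                         "clf__C": [0.1, 1.0]},
                        cv=3)
    grid.fit(X, y)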

