[scikit-learn] caching transformers during hyper parameter optimization

Joel Nothman joel.nothman at gmail.com
Wed Aug 16 21:15:03 EDT 2017


Now this isn't the best example, because joblib.Memory isn't going to be
very fast at dumping a list of strings, but I hope you can get the idea
from https://gist.github.com/jnothman/019d594d197c98a3d6192fa0cb19c850
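
For readers who cannot open the gist, here is a minimal sketch of the mixin idea. It is not taken from the gist. CachedTransformMixin, SlowScaler, the cache_dir parameter and the _transform method are all hypothetical names, and the sketch assumes that the transformer's parameters and fitted attributes can be hashed by joblib. Only joblib.Memory and the scikit-learn base classes are real library APIs.

```python
import tempfile

import numpy as np
from joblib import Memory
from sklearn.base import BaseEstimator, TransformerMixin


def _run_transform(transformer, cache_key, X):
    # `transformer` is excluded from the hash (see `ignore` below), so the
    # cache entry is keyed on `cache_key` and `X` only.
    return transformer._transform(X)


class CachedTransformMixin:
    """Hypothetical mixin: route transform() through a joblib.Memory cache.

    Assumes the host class has a `cache_dir` attribute and does the real
    work in `_transform(X)`.
    """

    def transform(self, X):
        memory = Memory(self.cache_dir, verbose=0)
        cached = memory.cache(_run_transform, ignore=["transformer"])
        # Key on the class name plus all parameters and fitted attributes.
        cache_key = (type(self).__name__, vars(self))
        return cached(self, cache_key, X)


class SlowScaler(CachedTransformMixin, TransformerMixin, BaseEstimator):
    """Toy transformer standing in for an expensive cleaning step."""

    n_calls = 0  # counts how often the real transform actually runs

    def __init__(self, cache_dir=None):
        self.cache_dir = cache_dir

    def fit(self, X, y=None):
        self.mean_ = np.asarray(X).mean(axis=0)
        return self

    def _transform(self, X):
        SlowScaler.n_calls += 1
        return np.asarray(X) - self.mean_


X = np.arange(6.0).reshape(3, 2)
with tempfile.TemporaryDirectory() as cache_dir:
    scaler = SlowScaler(cache_dir=cache_dir).fit(X)
    first = scaler.transform(X)   # computed, then written to disk
    second = scaler.transform(X)  # same key and same X: read from the cache
```

Because the cache is keyed on the fitted state and on X, each cross-validation fold gets its own entry, and a repeated search over downstream parameters can reuse the stored output instead of calling _transform again.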


On 17 August 2017 at 02:53, Georg Heiler <georg.kf.heiler at gmail.com> wrote:

> Data cleaning & enrichment
>
> Could you link an example of a mixin?
>
> Currently this is a bit of a mess, with custom pickle persistence in a big
> for loop and custom transformers.
>
> Thanks.
> Georg
> Joel Nothman <joel.nothman at gmail.com> wrote on Wed, 16 Aug 2017 at
> 13:51:
>
>> We certainly considered this over the many years that Pipeline caching
>> has been in the pipeline. Storing the fitted model means we can do both a
>> fit_transform and a transform on new data, and in many cases takes away the
>> pain point of CV over pipelines where downstream steps are varied.
>>
>> What transformer are you using where the transform is costly? Or is it
>> more a matter of you wanting to store the transformed data at each step?
>>
>> There are custom ways to do this sort of thing generically with a mixin
>> if you really want.
>>
>> On 16 August 2017 at 21:28, Georg Heiler <georg.kf.heiler at gmail.com>
>> wrote:
>>
>>> There is a new option in the pipeline:
>>> http://scikit-learn.org/stable/modules/pipeline.html#pipeline-cache
>>> How can I use this to also store the transformed data? During hyperparameter
>>> tuning I only want to recompute the last step (the estimator), not the
>>> transform methods of the cleaning steps.
>>>
>>> Is it possible to apply this to cross-validation? I would like all the
>>> folds to be precomputed and stored to disk in a folder.
>>>
>>> Regards,
>>> Georg
>>>
>>> _______________________________________________
>>> scikit-learn mailing list
>>> scikit-learn at python.org
>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>
>>>
>