[scikit-learn] Model checksums

Stuart Reynolds stuart at stuartreynolds.net
Thu Dec 15 18:00:12 EST 2016


I don't mean that scikit-learn's modeling is non-deterministic -- I mean
the pickle library: same input, different serialized bytes. My
recollection was that dictionaries were inconsistently ordered when
serialized, or that some object ID was included in the serialization --
anyhow, I can't seem to reproduce it now that I've fixed a bug and am
actually providing identical input to serialize.
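
A minimal illustration, assuming Python 3.7+ (where dicts preserve insertion order and pickle writes their items in that order): two dicts that compare equal but were built in different orders pickle to different bytes, so a checksum over `pickle.dumps` output differs even though the inputs are "the same". Before Python 3.6, the iteration order of str-keyed dicts could also vary between interpreter runs because of hash randomization, which may be what the recollection above refers to. `pickle_digest` is a hypothetical helper name.

```python
import hashlib
import pickle


def pickle_digest(obj):
    """SHA-256 of the pickled bytes, with a fixed protocol so results are comparable."""
    return hashlib.sha256(pickle.dumps(obj, protocol=4)).hexdigest()


a = {"alpha": 0.1, "max_iter": 100}
b = {"max_iter": 100, "alpha": 0.1}  # same contents, different insertion order

print(a == b)                                # True: equal as Python values
print(pickle_digest(a) == pickle_digest(b))  # False: items are pickled in insertion order

# Normalizing the order first (here: sorting by key) makes the bytes agree.
print(pickle_digest(dict(sorted(a.items()))) == pickle_digest(dict(sorted(b.items()))))  # True
```

Sorting only fixes the top level; nested dicts (and sets, whose order still depends on per-run string hashing) would need the same treatment.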

Thanks for the joblib serialization link. The memory serializer is buried
in the docs (it is not mentioned in the docs on persistence).

On Tue, Dec 13, 2016 at 12:10 PM, Gael Varoquaux <
gael.varoquaux at normalesup.org> wrote:

> What do you mean by non-deterministic? If you set the random_state of
> models, we try to make them deterministic. Most often, any residual
> variability is numerical noise that reveals statistical error bars.
>
> G
>
> On Dec 13, 2016, at 19:29, Stuart Reynolds <stuart at stuartreynolds.net>
> wrote:
>
>> I'd like to cache some functions to avoid rebuilding models, like so:
>>
>>     @cached
>>     def train(model, dataparams): ...
>>
>>
>> model is an (untrained) scikit-learn object and dataparams is a dict.
>> The @cached decorator computes a SHA checksum of the arguments passed to
>> the function it wraps and returns the previously calculated result if
>> the arguments match.
>>
>> The tricky part here is reliably generating a checksum from the
>> parameters. scikit-learn uses Python's pickle for persistence
>> (http://scikit-learn.org/stable/modules/model_persistence.html), but the
>> pickle library is non-deterministic (the same input to pickle.dumps
>> yields differing output! -- *I know*).
>>
>> So... any suggestions on how to generate checksums from models in Python?
>>
>> Thanks.
>> - Stuart
>>
>>
>> ------------------------------
>>
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
>>
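
A minimal sketch of the `@cached` decorator described in the quoted message, assuming that fingerprinting an untrained estimator by its class and its `get_params()` output is enough for this use case. Instead of hashing pickle bytes, it hashes a canonical JSON encoding (sorted keys, explicit type tags) with SHA-256. `_canonical`, `checksum`, `cached`, and `ToyRidge` are hypothetical names; `ToyRidge` stands in for a scikit-learn estimator so the example runs without scikit-learn installed. The cache is a plain in-memory dict, so results do not survive the process; the joblib memory caching mentioned in the reply above is the on-disk alternative.

```python
import functools
import hashlib
import json


def _canonical(obj):
    """Reduce obj to a JSON-serializable structure with explicit type tags."""
    if obj is None or isinstance(obj, (bool, int, float, str)):
        return obj
    if isinstance(obj, (list, tuple)):
        # Tag containers so a list and a tuple with the same items hash differently.
        return [type(obj).__name__, [_canonical(x) for x in obj]]
    if isinstance(obj, dict):
        if not all(isinstance(k, str) for k in obj):
            raise TypeError("only dicts with str keys are supported")
        # json.dumps(sort_keys=True) in checksum() removes insertion-order effects.
        return ["dict", {k: _canonical(v) for k, v in obj.items()}]
    if hasattr(obj, "get_params"):
        # Duck-typed scikit-learn-style estimator (assumption): identify it by its
        # class and constructor parameters; nested estimators recurse naturally.
        cls = type(obj)
        return ["estimator", f"{cls.__module__}.{cls.__qualname__}",
                _canonical(obj.get_params(deep=False))]
    raise TypeError(f"cannot checksum objects of type {type(obj).__name__}")


def checksum(obj):
    """SHA-256 hex digest of a canonical JSON encoding of obj."""
    payload = json.dumps(_canonical(obj), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached(func):
    """Memoize func in memory, keyed by checksum() of its arguments."""
    results = {}  # checksum -> result; lost when the process exits

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        key = checksum([func.__qualname__, list(args), kwargs])
        if key not in results:
            results[key] = func(*args, **kwargs)
        return results[key]

    return wrapper


# Usage with a hypothetical stand-in for an untrained scikit-learn estimator.
class ToyRidge:
    def __init__(self, alpha=1.0, fit_intercept=True):
        self.alpha = alpha
        self.fit_intercept = fit_intercept

    def get_params(self, deep=True):
        return {"alpha": self.alpha, "fit_intercept": self.fit_intercept}


calls = []


@cached
def train(model, dataparams):
    calls.append(model)  # record real (uncached) invocations
    return ("fitted", model.alpha, dataparams["n_samples"])


train(ToyRidge(alpha=0.5), {"n_samples": 100, "seed": 0})
train(ToyRidge(alpha=0.5), {"seed": 0, "n_samples": 100})  # equal arguments: cache hit
train(ToyRidge(alpha=1.0), {"n_samples": 100, "seed": 0})  # new alpha: recomputed
print(len(calls))  # 2
```

Only JSON-friendly values and objects with a `get_params` method are handled. NumPy arrays, NumPy integer scalars, callables, or a fitted model's learned state raise `TypeError` here rather than being hashed via `repr()` or object IDs, which is the failure mode the thread is trying to avoid.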