[scikit-learn] [GridSearchCV] Reduction of elapsed time at the second iteration
Pedro Cardoso
pedro.cardoso.code at gmail.com
Sun Mar 29 13:21:21 EDT 2020
Hello fellows,
I am new to sklearn and I have a question about GridSearchCV:
I am running the following code in a Jupyter notebook:
----------------------*code*-------------------------------
opt_models = dict()
for feature in [features1, features2, features3, features4]:
    cmb = CMB(x_train, y_train, x_test, y_test, feature)
    cmb.fit()
    cmb.predict()
    opt_models[str(feature)] = cmb.get_best_model()
-------------------------------------------------------
The CMB class is just a class that contains different classification models
(SVC, decision tree, etc.). When cmb.fit() runs, a GridSearchCV is
performed on the SVC model (which is inside the cmb instance) in order to
tune the hyperparameters C, gamma, and kernel. The SVC model is implemented
using the sklearn.svm.SVC class. Here is the output of the first and second
iteration of the for loop:
---------------------*output*-------------------------------------
-> 1st iteration
Fitting 5 folds for each of 12 candidates, totalling 60 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done 1 tasks | elapsed: 6.1s
[Parallel(n_jobs=-1)]: Done 2 tasks | elapsed: 6.1s
[Parallel(n_jobs=-1)]: Done 3 tasks | elapsed: 6.1s
[Parallel(n_jobs=-1)]: Done 4 tasks | elapsed: 6.2s
[Parallel(n_jobs=-1)]: Done 5 tasks | elapsed: 6.2s
[Parallel(n_jobs=-1)]: Done 6 tasks | elapsed: 6.2s
[Parallel(n_jobs=-1)]: Done 7 tasks | elapsed: 6.2s
[Parallel(n_jobs=-1)]: Done 8 tasks | elapsed: 6.2s
[Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 6.2s
[Parallel(n_jobs=-1)]: Done 10 tasks | elapsed: 6.2s
[Parallel(n_jobs=-1)]: Done 11 tasks | elapsed: 6.2s
[Parallel(n_jobs=-1)]: Done 12 tasks | elapsed: 6.3s
[Parallel(n_jobs=-1)]: Done 13 tasks | elapsed: 6.3s
[Parallel(n_jobs=-1)]: Done 14 tasks | elapsed: 6.3s
[Parallel(n_jobs=-1)]: Done 15 tasks | elapsed: 6.4s
[Parallel(n_jobs=-1)]: Done 16 tasks | elapsed: 6.4s
[Parallel(n_jobs=-1)]: Done 17 tasks | elapsed: 6.4s
[Parallel(n_jobs=-1)]: Done 18 tasks | elapsed: 6.4s
[Parallel(n_jobs=-1)]: Done 19 tasks | elapsed: 6.5s
[Parallel(n_jobs=-1)]: Done 20 tasks | elapsed: 6.5s
[Parallel(n_jobs=-1)]: Done 21 tasks | elapsed: 6.5s
[Parallel(n_jobs=-1)]: Done 22 tasks | elapsed: 6.6s
[Parallel(n_jobs=-1)]: Done 23 tasks | elapsed: 6.7s
[Parallel(n_jobs=-1)]: Done 24 tasks | elapsed: 6.7s
[Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 6.7s
[Parallel(n_jobs=-1)]: Done 26 tasks | elapsed: 6.8s
[Parallel(n_jobs=-1)]: Done 27 tasks | elapsed: 6.8s
[Parallel(n_jobs=-1)]: Done 28 tasks | elapsed: 6.9s
[Parallel(n_jobs=-1)]: Done 29 tasks | elapsed: 6.9s
[Parallel(n_jobs=-1)]: Done 30 tasks | elapsed: 6.9s
[Parallel(n_jobs=-1)]: Done 31 tasks | elapsed: 7.0s
[Parallel(n_jobs=-1)]: Done 32 tasks | elapsed: 7.0s
[Parallel(n_jobs=-1)]: Done 33 tasks | elapsed: 7.0s
[Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 7.0s
[Parallel(n_jobs=-1)]: Done 35 tasks | elapsed: 7.1s
[Parallel(n_jobs=-1)]: Done 36 tasks | elapsed: 7.1s
[Parallel(n_jobs=-1)]: Done 37 tasks | elapsed: 7.2s
[Parallel(n_jobs=-1)]: Done 38 tasks | elapsed: 7.2s
[Parallel(n_jobs=-1)]: Done 39 tasks | elapsed: 7.2s
[Parallel(n_jobs=-1)]: Done 40 tasks | elapsed: 7.2s
[Parallel(n_jobs=-1)]: Done 41 tasks | elapsed: 7.3s
[Parallel(n_jobs=-1)]: Done 42 tasks | elapsed: 7.3s
[Parallel(n_jobs=-1)]: Done 43 tasks | elapsed: 7.3s
[Parallel(n_jobs=-1)]: Done 44 tasks | elapsed: 7.4s
[Parallel(n_jobs=-1)]: Done 45 tasks | elapsed: 7.4s
[Parallel(n_jobs=-1)]: Done 46 tasks | elapsed: 7.5s
-> 2nd iteration
Fitting 5 folds for each of 12 candidates, totalling 60 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done 1 tasks | elapsed: 0.0s
[Parallel(n_jobs=-1)]: Batch computation too fast (0.0260s.) Setting batch_size=14.
[Parallel(n_jobs=-1)]: Done 2 tasks | elapsed: 0.0s
[Parallel(n_jobs=-1)]: Done 3 tasks | elapsed: 0.0s
[Parallel(n_jobs=-1)]: Done 4 tasks | elapsed: 0.0s
[Parallel(n_jobs=-1)]: Done 5 tasks | elapsed: 0.0s
[Parallel(n_jobs=-1)]: Done 60 out of 60 | elapsed: 0.7s finished
---------------------------------------------------------------------------------------------------------------------
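For reference, the grid search inside cmb.fit() is along these lines (a
minimal, self-contained sketch; the CMB internals, the exact parameter grid
values, and the synthetic data here are my assumptions, chosen only so that
the grid yields the 12 candidates and 60 fits seen in the log above):

```python
# Minimal sketch of the SVC grid search performed inside cmb.fit().
# The parameter values and the toy data are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy data standing in for x_train / y_train.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# 3 * 2 * 2 = 12 candidates, matching "12 candidates" in the log.
param_grid = {
    "C": [0.1, 1, 10],
    "gamma": ["scale", "auto"],
    "kernel": ["rbf", "linear"],
}

# cv=5 gives 12 * 5 = 60 fits ("totalling 60 fits" in the log);
# n_jobs=-1 uses all cores, as in the [Parallel(n_jobs=-1)] lines.
search = GridSearchCV(SVC(), param_grid, cv=5, n_jobs=-1, verbose=1)
search.fit(X, y)
best_model = search.best_estimator_
```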
As you can see, the first iteration shows a much larger elapsed time than
the 2nd iteration. Does that make sense? I am afraid that the model is doing
some kind of caching or taking a shortcut from the 1st iteration, which
could consequently degrade the model training/performance. I already read
the sklearn documentation and I didn't see any warning/note about this kind
of behaviour.
Thank you very much for your time :)