[scikit-learn] SGD Early Stopping

Robert Slater rdslater at gmail.com
Tue Mar 8 20:02:21 EST 2022


We have something we are not understanding.

from sklearn.linear_model import SGDClassifier

clf2 = SGDClassifier(loss='log', penalty='l2', shuffle=True,
                     max_iter=10, tol=1e-5,
                     early_stopping=True, validation_fraction=0.2,
                     n_iter_no_change=2, verbose=0, random_state=1)

clf2.fit(X_train,y_train)
clf2.n_iter_

The result of the last line is ALWAYS n_iter_no_change + 1 (in this case 3;
if we set n_iter_no_change=10, it ends at 11). No matter how we try to slow
things down, early stopping appears to kick in at epoch 1. We've played with
the learning rate, tolerance, etc., to make sure our problem isn't being
solved in a single epoch (which does seem dubious).

I even ran this manually and scored the accuracy, after enabling
warm_start=True and setting max_iter=1:

from sklearn.metrics import accuracy_score

for i in range(5):
    clf2.fit(X_train, y_train)
    p = clf2.predict(X_test)
    print(accuracy_score(y_test, p))

0.9748226138704509
0.987182421606775
0.9881742580300603
0.9879453727016099
0.991760128175784

So it seems there is some accuracy improvement to be had, however small.
We're stumped as to what is going on and could use some wiser minds to
explain this behavior.