[scikit-learn] logistic regression results are not stable between solvers

Guillaume Lemaître g.lemaitre58 at gmail.com
Wed Oct 9 14:25:11 EDT 2019


Could you generate more samples, set the penalty to none, reduce the tolerance, and check the coefficients instead of the predictions? That would make sure the difference is not just numerical error.




Sent from my phone - sorry for being brief and for any misspellings.



Original Message



From: benoit.presles at u-bourgogne.fr
Sent: 8 October 2019 20:27
To: scikit-learn at python.org
Reply to: scikit-learn at python.org
Subject: [scikit-learn] logistic regression results are not stable between solvers


Dear scikit-learn users,

I am using logistic regression to make some predictions. On my own data,
I do not get the same results between solvers. I managed to reproduce
this issue on synthetic data (see the code below).
All solvers seem to converge (n_iter_ < max_iter), so why do I get
different results?
If the results are not stable between solvers, which one should I choose?


Best regards,
Ben

------------------------------------------

Here is the code I used to reproduce the issue on synthetic data:

from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

RANDOM_SEED = 2

# Synthetic binary classification problem: 200 samples, 45 features,
# 10 of them informative
X_sim, y_sim = make_classification(n_samples=200,
                                   n_features=45,
                                   n_informative=10,
                                   n_redundant=0,
                                   n_repeated=0,
                                   n_classes=2,
                                   n_clusters_per_class=1,
                                   random_state=RANDOM_SEED,
                                   shuffle=False)

sss = StratifiedShuffleSplit(n_splits=10, test_size=0.2,
                             random_state=RANDOM_SEED)
for train_index_split, test_index_split in sss.split(X_sim, y_sim):
    X_split_train, X_split_test = X_sim[train_index_split], X_sim[test_index_split]
    y_split_train, y_split_test = y_sim[train_index_split], y_sim[test_index_split]

    # Standardize using statistics from the training split only
    ss = StandardScaler()
    X_split_train = ss.fit_transform(X_split_train)
    X_split_test = ss.transform(X_split_test)

    # Effectively unpenalized logistic regression (C=1e9)
    classifier_lbfgs = LogisticRegression(fit_intercept=True, max_iter=20000000,
                                          verbose=1, random_state=RANDOM_SEED,
                                          C=1e9, solver='lbfgs')
    classifier_lbfgs.fit(X_split_train, y_split_train)
    print('classifier lbfgs iter:', classifier_lbfgs.n_iter_)

    classifier_saga = LogisticRegression(fit_intercept=True, max_iter=20000000,
                                         verbose=1, random_state=RANDOM_SEED,
                                         C=1e9, solver='saga')
    classifier_saga.fit(X_split_train, y_split_train)
    print('classifier saga iter:', classifier_saga.n_iter_)

    y_pred_lbfgs = classifier_lbfgs.predict(X_split_test)
    y_pred_saga = classifier_saga.predict(X_split_test)

    if not (y_pred_lbfgs == y_pred_saga).all():
        print('lbfgs does not give the same results as saga :-( !')
        break

_______________________________________________
scikit-learn mailing list
scikit-learn at python.org
https://mail.python.org/mailman/listinfo/scikit-learn
