[scikit-learn] Trying to get learning curves with custom scorer and leave one group out

Andy t3kcit at gmail.com
Sat Dec 3 13:13:50 EST 2016


That indeed looks odd.
Can you reproduce with synthetic data?
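
Something along these lines might do as a starting point (a rough sketch:
the data, group sizes, and SVC parameters are all made up, so adjust them
to match your setup):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import LeaveOneGroupOut, validation_curve
from sklearn.svm import SVC

# synthetic 3-class problem with five arbitrary groups of 60 samples each
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
groups = np.repeat(np.arange(5), 60)

logo = LeaveOneGroupOut()
param_range = np.logspace(-2, 6, 9)

# same validation curve computed twice, once per scorer
train_f1, test_f1 = validation_curve(
    SVC(kernel='rbf', gamma=0.01), X, y, param_name="C",
    param_range=param_range, cv=logo.split(X, y, groups=groups),
    scoring=make_scorer(f1_score, average='micro'))
train_acc, test_acc = validation_curve(
    SVC(kernel='rbf', gamma=0.01), X, y, param_name="C",
    param_range=param_range, cv=logo.split(X, y, groups=groups),
    scoring='accuracy')

print(np.mean(test_f1, axis=1))
print(np.mean(test_acc, axis=1))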


On 12/02/2016 10:40 PM, Matteo Niccoli wrote:
> My apologies, there was a typo in the code below; the second example should
> read:
>
> train_scores1, test_scores1 = validation_curve(SVC_classifier_LOWO_VC1, X, y,
>     "C", parm_range1, cv=logo.split(X, y, groups=groups), scoring='accuracy')
>
> Everything else is correct.
>
>
> On Fri, December 2, 2016 10:28 pm, Matteo Niccoli wrote:
>> Hi all,
>>
>>
>> I want to plot learning curves for a trained SVM classifier, using a
>> custom scorer and Leave One Group Out as the cross-validation method.
>> I thought I had it figured out, but two different scorers - 'f1_micro'
>> and 'accuracy' - yield identical values. I am confused; is that
>> supposed to be the case?
>>
>> Here's my code (unfortunately I cannot share the data as it is not open):
>>
>>
>> import numpy as np
>> import pandas as pd
>> from sklearn import preprocessing, svm
>> from sklearn.metrics import f1_score, make_scorer
>> from sklearn.model_selection import LeaveOneGroupOut, validation_curve
>>
>> SVC_classifier_LOWO_VC0 = svm.SVC(cache_size=800, class_weight=None,
>>     coef0=0.0, decision_function_shape=None, degree=3, gamma=0.01,
>>     kernel='rbf', max_iter=-1, probability=False, random_state=1,
>>     shrinking=True, tol=0.001, verbose=False)
>>
>> training_data = pd.read_csv('training_data.csv')
>> # X is the feature matrix; its construction from training_data is not shown
>> scaler = preprocessing.StandardScaler().fit(X)
>> X = scaler.transform(X)
>> y = training_data['Targets'].values
>> groups = training_data["Groups"].values
>>
>> Fscorer = make_scorer(f1_score, average='micro')
>> logo = LeaveOneGroupOut()
>> parm_range0 = np.logspace(-2, 6, 9)
>> train_scores0, test_scores0 = validation_curve(SVC_classifier_LOWO_VC0, X, y,
>>     "C", parm_range0, cv=logo.split(X, y, groups=groups), scoring=Fscorer)
>>
>>
>> Now, from:
>> train_scores_mean0 = np.mean(train_scores0, axis=1)
>> train_scores_std0 = np.std(train_scores0, axis=1)
>> test_scores_mean0 = np.mean(test_scores0, axis=1)
>> test_scores_std0 = np.std(test_scores0, axis=1)
>> print test_scores_mean0
>> print np.amax(test_scores_mean0)
>> print np.logspace(-2, 6, 9)[test_scores_mean0.argmax(axis=0)]
>>
>>
>> I get:
>> [ 0.20257407  0.35551122  0.40791047  0.49887676  0.5021742   0.50030438
>> 0.49426622  0.48066419  0.4868987 ]
>> 0.502174200206
>> 100.0
>>
>>
>> If I create a new classifier with the same parameters and run everything
>> exactly as before, except for the scoring, e.g.:
>>
>> parm_range1 = np.logspace(-2, 6, 9)
>> train_scores1, test_scores1 = validation_curve(SVC_classifier_LOWO_VC1, X, y,
>>     "C", parm_range1, cv=logo.split(X, y, groups=wells), scoring='accuracy')
>> train_scores_mean1 = np.mean(train_scores1, axis=1)
>> train_scores_std1 = np.std(train_scores1, axis=1)
>> test_scores_mean1 = np.mean(test_scores1, axis=1)
>> test_scores_std1 = np.std(test_scores1, axis=1)
>> print test_scores_mean1
>> print np.amax(test_scores_mean1)
>> print np.logspace(-2, 6, 9)[test_scores_mean1.argmax(axis=0)]
>>
>>
>> I get exactly the same answer:
>> [ 0.20257407  0.35551122  0.40791047  0.49887676  0.5021742   0.50030438
>> 0.49426622  0.48066419  0.4868987 ]
>> 0.502174200206
>> 100.0
>>
>>
>> How is that possible? Am I doing something wrong, or missing something?
>>
>>
>> Thanks
>>
>>
>>
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
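
P.S. One quick direct check of the two scorers, outside of validation_curve
(a sketch; y_true and y_pred stand in for the labels and predictions of any
one CV fold): for single-label multiclass targets, micro-averaged F1 is
mathematically identical to accuracy, so matching numbers are not by
themselves a bug.

from sklearn.metrics import accuracy_score, f1_score

y_true = [0, 1, 2, 2, 1, 0]   # placeholder labels
y_pred = [0, 2, 2, 2, 0, 0]   # placeholder predictions

# every misclassification counts as one false positive (for the predicted
# class) and one false negative (for the true class), so micro-averaged
# precision, recall, and F1 all reduce to accuracy
print(f1_score(y_true, y_pred, average='micro'))   # 0.666...
print(accuracy_score(y_true, y_pred))              # 0.666...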


