[scikit-learn] Retracting model from the 'blackbox' SVM

Wouter Verduin wouterverduin at gmail.com
Fri May 4 05:12:40 EDT 2018


Dear developers of scikit-learn,

I am working on a scientific paper on a prediction model for
complications in major abdominal resections. I have been using
scikit-learn to build that model and got good results (a score of 0.94).
This makes us want to see what the model scikit-learn produced actually
looks like.

At the moment we have 100 input variables, but naturally they are not
all equally useful, so we want to reduce that number to about 20 and see
what the effect on the score is.
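On reducing the variables: one possible approach is recursive feature
elimination (RFE) with the same linear SVM, which repeatedly drops the
variables with the smallest weights. A minimal sketch; the synthetic
data here is only a stand-in for the real CSV, and the shapes are
assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# Synthetic stand-in for the 100-variable database.
X, y = make_classification(n_samples=300, n_features=100,
                           n_informative=10, random_state=0)

# RFE ranks variables by |weight| of the linear SVM and keeps
# the 20 strongest ones.
selector = RFE(SVC(kernel='linear'), n_features_to_select=20).fit(X, y)

# Boolean mask of the surviving columns, and the reduced matrix.
X_reduced = X[:, selector.support_]
print(X_reduced.shape)
```

`selector.ranking_` additionally gives the elimination order, which can
help decide whether 20 is the right cut-off.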

*My question*: Is there a way to get the underlying formula of the model
out of scikit-learn, instead of having it as a 'black box' inside my SVM
function? At the moment I am predicting a dichotomous variable from 100
variables (continuous, ordinal and binary).
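For context on the question: with kernel='linear' (which the code below
uses), the fitted SVC does expose the underlying formula, the decision
function f(x) = w·x + b, through its coef_ and intercept_ attributes. A
minimal sketch on toy data (the data, shapes and variable names here are
assumptions, not the real database):

```python
import numpy as np
from sklearn import svm

# Toy stand-in for the real data: 200 patients, 5 numeric variables,
# a dichotomous outcome.
rng = np.random.RandomState(0)
X = rng.randn(200, 5)
y = (X[:, 0] + 2 * X[:, 1] > 0).astype(int)

svc = svm.SVC(kernel='linear').fit(X, y)

# For a linear kernel the model is the explicit formula
#   f(x) = w . x + b,  with class 1 predicted when f(x) > 0.
w = svc.coef_[0]        # one weight per input variable
b = svc.intercept_[0]

# Applying the formula by hand reproduces the classifier's output.
manual = (X.dot(w) + b > 0).astype(int)
```

The magnitude of each entry of `w` also gives a first impression of
which variables carry the most weight, provided the inputs are on
comparable scales.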

My code:

import numpy as np
from numpy import *
import pandas as pd
from sklearn import tree, svm, linear_model, metrics, preprocessing
import datetime
from sklearn.model_selection import KFold, cross_val_score, ShuffleSplit, GridSearchCV
from time import gmtime, strftime

# Open and prepare the database
file = "/home/wouter/scikit/DB_SCIKIT.csv"
DB = pd.read_csv(file, sep=";", header=0, decimal=',').as_matrix()
DBT = DB
print "Vorm van de DB: ", DB.shape  # shape of the database

# The last column is the outcome; split it off as the target
target = []
for i in range(len(DB[:, -1])):
    target.append(DB[i, -1])
DB = delete(DB, s_[-1], 1)  # remove the last column

AantalOutcome = target.count(1)
print "Aantal outcome:", AantalOutcome   # number of positive outcomes
print "Aantal patienten:", len(target)   # number of patients

A = DB
b = target
print len(DBT)

svc = svm.SVC(kernel='linear', cache_size=500, probability=True)
indices = np.random.permutation(len(DBT))

rs = ShuffleSplit(n_splits=5, test_size=.15, random_state=None)
scores = cross_val_score(svc, A, b, cv=rs)
A = ("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
print A

X_train = DBT[indices[:-302]]
y_train = []
for i in range(len(X_train[:, -1])):
    y_train.append(X_train[i, -1])
X_train = delete(X_train, s_[-1], 1)  # remove the last column

X_test = DBT[indices[-302:]]
y_test = []
for i in range(len(X_test[:, -1])):
    y_test.append(X_test[i, -1])
X_test = delete(X_test, s_[-1], 1)  # remove the last column

model = svc.fit(X_train, y_train)
print model

uitkomst = model.score(X_test, y_test)
print uitkomst

voorspel = model.predict(X_test)
print voorspel

And output:

Vorm van de DB:  (2011, 101)
Aantal outcome: 128
Aantal patienten: 2011
2011
Accuracy: 0.94 (+/- 0.01)
SVC(C=1.0, cache_size=500, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear',
  max_iter=-1, probability=True, random_state=None, shrinking=True,
  tol=0.001, verbose=False)
0.927152317881
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ... 0. 0. 0. 0.]

Thanks in advance!

with kind regards,

Wouter Verduin

