[scikit-learn] Label encoding for classifiers and soft targets

Javier López Peña jlopez at ende.cc
Sat Mar 11 08:04:57 EST 2017


Hi there!

I have recently been experimenting with model regularization through soft targets,
and I’d like to be able to try this from sklearn.

The main idea is as follows: imagine I want to fit a (probabilistic) classifier with three possible
targets: 0, 1, 2.

If I pass my training set (X, y) to a sklearn classifier, the target vector y gets one-hot encoded, so that
each target becomes an array: [1, 0, 0], [0, 1, 0], or [0, 0, 1].
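
To make that encoding concrete, here is a minimal NumPy sketch of what I mean. The helper name `one_hot` is mine, for illustration only; this is not sklearn's internal code, just the same kind of mapping from integer labels to indicator rows.

```python
import numpy as np


def one_hot(y, n_classes):
    """Map integer labels 0..n_classes-1 to one-hot rows (illustrative helper)."""
    y = np.asarray(y, dtype=int)
    # Row i of the identity matrix is the one-hot vector for class i,
    # so indexing the identity matrix by y picks one row per sample.
    return np.eye(n_classes)[y]


encoded_y = one_hot([0, 1, 2], n_classes=3)
```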

What I would like is to be able to pass the targets directly in the encoded form and skip
any further encoding. This would let me pass, for instance, a target such as [0.9, 0.05, 0.05] if I want to prevent
my classifier from becoming too opinionated about its predicted probabilities.
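
For illustration, soft targets of this kind can be built from one-hot rows by uniform label smoothing. This is only a sketch under my own assumptions: the function name `smooth_targets` and the parameter `eps` are hypothetical, and uniform smoothing is just one of several ways to soften targets. Each row still sums to 1, with 0.9 on the labelled class and 0.05 on each of the other two when `eps=0.15` and there are three classes.

```python
import numpy as np


def smooth_targets(one_hot_y, eps=0.15):
    """Uniform label smoothing (illustrative helper, not part of sklearn).

    Moves a fraction eps of each row's probability mass to a uniform
    distribution over the classes, so every row still sums to 1.
    """
    one_hot_y = np.asarray(one_hot_y, dtype=float)
    n_classes = one_hot_y.shape[1]
    return (1.0 - eps) * one_hot_y + eps / n_classes


soft_y = smooth_targets([[1, 0, 0], [0, 1, 0], [0, 0, 1]], eps=0.15)
```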

Ideally I would like to do something like this:
```
clf = SomeClassifier(*parameters, encode_targets=False)
```

and then call
```
clf.fit(X, encoded_y)
```

Would it be simple to modify the sklearn code to do this, or would it require a lot of tinkering,
such as modifying every single classifier under the sun?

Cheers,
J

