[scikit-learn] One-hot encoding

Joel Nothman joel.nothman at gmail.com
Mon Feb 5 00:02:38 EST 2018


If each input column is encoded as integers from 0 to (the number of
possible values for that column) - 1, then n_values for that column should
be the highest value + 1, which is also the number of levels in that column.
Does that make sense?
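
To make that concrete, here is a minimal pure-Python sketch, not the scikit-learn implementation. The helper names infer_n_values and one_hot are made up for illustration, and it assumes every column really is coded 0..k-1:

```python
def infer_n_values(X):
    """Per column: highest code + 1 (assumes codes run from 0 to k-1)."""
    return [max(col) + 1 for col in zip(*X)]


def one_hot(X, n_values):
    """Dense one-hot encoding; column j gets n_values[j] output slots."""
    # offsets[j] is the first output slot belonging to input column j
    offsets = [0]
    for n in n_values:
        offsets.append(offsets[-1] + n)
    encoded = []
    for sample in X:
        row = [0] * offsets[-1]
        for j, code in enumerate(sample):
            row[offsets[j] + code] = 1
        encoded.append(row)
    return encoded


X = [[0, 2],
     [1, 0],
     [1, 1]]
n_values = infer_n_values(X)   # [2, 3]: column 0 has 2 levels, column 1 has 3
encoded = one_hot(X, n_values)
```

Note that if the highest level of a column never appears in the data, max + 1 will undercount, which is the case where you would want to pass n_values explicitly.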

Actually, I've realised there's a somewhat slow and unnecessary bit of code
in the one-hot encoder: the step where the COO matrix is converted to CSR.
I suspect this was done because most of our ML algorithms perform better on
CSR, or else to maintain backwards compatibility with an earlier
implementation.
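
For anyone unfamiliar with the two formats, here is a rough pure-Python sketch of what a COO-to-CSR conversion involves. It is only an illustration of the general technique, not the scipy or scikit-learn code; coo_to_csr is a hypothetical helper, and it assumes there are no duplicate (row, col) entries:

```python
def coo_to_csr(n_rows, rows, cols, data):
    """Convert COO triplets to CSR arrays (values, indices, indptr).

    Assumes no duplicate (row, col) pairs; duplicates are not summed here.
    """
    # Count the stored entries in each row, then take a cumulative sum
    # so that indptr[r]:indptr[r + 1] is the slice belonging to row r.
    indptr = [0] * (n_rows + 1)
    for r in rows:
        indptr[r + 1] += 1
    for i in range(n_rows):
        indptr[i + 1] += indptr[i]

    # Scatter each triplet into the next free slot of its row.
    next_slot = list(indptr[:-1])
    indices = [0] * len(data)
    values = [0] * len(data)
    for r, c, v in zip(rows, cols, data):
        k = next_slot[r]
        indices[k] = c
        values[k] = v
        next_slot[r] += 1
    return values, indices, indptr


# One-hot output for 3 samples and 2 features, listed feature by feature.
rows = [0, 1, 2, 0, 1, 2]
cols = [0, 1, 1, 4, 2, 3]
data = [1, 1, 1, 1, 1, 1]
values, indices, indptr = coo_to_csr(3, rows, cols, data)
```

In this example every row holds exactly one entry per input feature, so indptr is just an evenly spaced sequence.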