[scikit-learn] Why does sklearn require one-hot-encoding for categorical features? Can we have a "factor" data type?

Joel Nothman joel.nothman at gmail.com
Wed May 6 09:43:41 EDT 2020


When it comes to trees, the API for handling categoricals is simpler than
the implementation. Traditionally, tree-based models handle categorical
variables in a way that differs from both ordinal and one-hot encoding,
although both of those encodings work reasonably well for many problems.
We are working on implementing categorical handling in trees (
https://github.com/scikit-learn/scikit-learn/issues/15550,
https://github.com/scikit-learn/scikit-learn/pull/12866)...
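
To make the two workarounds concrete, here is a minimal sketch of fitting a
tree on ordinal-encoded and on one-hot-encoded input. The toy data and the
helper name fit_tree are made up for illustration; this is not the native
categorical handling discussed in the linked issue and PR, only the two
encodings mentioned above.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

# Toy data (made up): a single categorical feature; the target is 1 only
# for "green".
X = np.array([["red"], ["green"], ["blue"], ["green"], ["red"], ["blue"]])
y = np.array([0, 1, 0, 1, 0, 0])


def fit_tree(encoder):
    """Hypothetical helper: encode the categorical column, then fit a tree."""
    return make_pipeline(encoder, DecisionTreeClassifier(random_state=0)).fit(X, y)


ordinal_tree = fit_tree(OrdinalEncoder())
onehot_tree = fit_tree(OneHotEncoder(handle_unknown="ignore"))

# Both encodings let the tree fit this toy problem.
ordinal_pred = ordinal_tree.predict(X)
onehot_pred = onehot_tree.predict(X)

# OrdinalEncoder sorts categories (blue=0, green=1, red=2), so "green" sits
# in the middle and the tree needs two threshold splits to isolate it.
# With one-hot encoding a single split on the "green" column is enough.
ordinal_depth = ordinal_tree[-1].get_depth()
onehot_depth = onehot_tree[-1].get_depth()
print(ordinal_depth, onehot_depth)
```

On this toy input both trees reproduce the labels, but the ordinal version
needs a deeper tree because the integer codes impose an arbitrary order on
the categories.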

