[scikit-learn] Classifiers for dataset with categorical features

Jacob Schreiber jmschreiber91 at gmail.com
Fri Jul 21 14:27:50 EDT 2017

Traditionally, tree-based methods are very good when it comes to categorical
variables and can handle them appropriately. There is currently a WIP PR to
add this support to sklearn. I'm not exactly sure what you mean by
"perform better", though. Estimators that ignore the categorical aspect of
these variables and treat them as ordered discrete values will likely
perform worse than those that treat them appropriately.
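
As a rough illustration of that last point, here is a minimal sketch on
made-up data (the country codes, the label rule, and the variable names are
all assumptions for the example, not anything from the PR). It compares a
linear model fed integer category codes against the same model fed
indicator columns from pd.get_dummies():

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)

# Hypothetical nominal feature: five country codes with no natural order.
country = pd.Series(rng.choice(["BR", "DE", "IN", "JP", "US"], size=1000))

# Toy label that depends on the category but not on its alphabetical rank.
y = country.isin(["BR", "IN", "US"]).astype(int).to_numpy()

# Encoding 1: integer codes (BR=0, DE=1, ...), which imposes an arbitrary order.
X_codes = country.astype("category").cat.codes.to_numpy().reshape(-1, 1)

# Encoding 2: one indicator column per category, as pd.get_dummies() produces.
X_onehot = pd.get_dummies(country).to_numpy(dtype=float)

clf = LogisticRegression()
ordinal_acc = cross_val_score(clf, X_codes, y, cv=5).mean()
onehot_acc = cross_val_score(clf, X_onehot, y, cv=5).mean()
print("integer codes: %.2f, one-hot: %.2f" % (ordinal_acc, onehot_acc))
```

On this toy problem the integer-coded model cannot separate the classes with
a single threshold, while the one-hot model can. A decision tree could still
recover the pattern from integer codes through repeated splits, so the size
of the gap depends on the estimator.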

On Fri, Jul 21, 2017 at 8:11 AM, Raga Markely <raga.markely at gmail.com> wrote:

> Hello,
> I am wondering if there are some classifiers that perform better for
> datasets with categorical features (converted into a sparse input matrix
> with pd.get_dummies())? The data for the categorical features are nominal
> (order doesn't matter, e.g. country, occupation, etc.).
> If you could provide me with some references (papers, books, websites,
> etc.), that would be great.
> Thank you very much!
> Raga