[scikit-learn] Classifiers for dataset with categorical features

Raga Markely raga.markely at gmail.com
Fri Jul 21 14:59:40 EDT 2017


Sounds good, Sebastian.

Thank you!
Raga

On Fri, Jul 21, 2017 at 2:52 PM, Sebastian Raschka <se.raschka at gmail.com>
wrote:

> Just to throw some additional ideas in here. Based on a conversation with
> a colleague some time ago, I think learning classifier systems (
> https://en.wikipedia.org/wiki/Learning_classifier_system) are
> particularly useful when working with large, sparse binary vectors (like
> those from a one-hot encoding). I am really not into LCSs, and only know
> the basics (read through the first chapters of the Intro to Learning
> Classifier Systems draft; the print version will be out later this year).
> Also, I saw an interesting poster on a Set Covering Machine algorithm
> once, which they benchmarked against SVMs, random forests and the like for
> categorical (genomics) data. Looked promising.
>
> Best,
> Sebastian
>
>
> > On Jul 21, 2017, at 2:37 PM, Raga Markely <raga.markely at gmail.com>
> wrote:
> >
> > Thank you, Jacob. Appreciate it.
> >
> > Regarding 'perform better', I was referring to better accuracy,
> precision, recall, F1 score, etc.
> >
> > Thanks,
> > Raga
> >
> > On Fri, Jul 21, 2017 at 2:27 PM, Jacob Schreiber <
> jmschreiber91 at gmail.com> wrote:
> > Traditionally, tree-based methods are very good when it comes to
> categorical variables and can handle them appropriately. There is a current
> WIP PR to add this support to sklearn. I'm not exactly sure what you mean
> by "perform better", though. Estimators that ignore the categorical aspect
> of these variables and treat them as ordered discrete values will likely
> perform worse than those that treat them appropriately.
> >
> > On Fri, Jul 21, 2017 at 8:11 AM, Raga Markely <raga.markely at gmail.com>
> wrote:
> > Hello,
> >
> > I am wondering if there are some classifiers that perform better for
> datasets with categorical features (converted into a sparse input matrix
> with pd.get_dummies())? The data for the categorical features are nominal
> (order doesn't matter, e.g. country, occupation, etc.).
> >
> > If you could point me to some references (papers, books, websites, etc.),
> that would be great.
> >
> > Thank you very much!
> > Raga
> >
> >
> >
> > _______________________________________________
> > scikit-learn mailing list
> > scikit-learn at python.org
> > https://mail.python.org/mailman/listinfo/scikit-learn
>
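
To illustrate the point quoted above about estimators that treat nominal variables as plain discrete numbers, here is a minimal pure-Python sketch. It is not taken from the thread: the country values are made up, and the helper names (integer_codes, one_hot, squared_distance) are hypothetical. It contrasts arbitrary integer codes with the one-indicator-column-per-category layout that pd.get_dummies() produces.

```python
# Hypothetical toy data: a single nominal feature (order carries no meaning).
countries = ["DE", "US", "JP", "US"]


def integer_codes(values):
    """Map each category to an arbitrary integer (alphabetical order here)."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    return [index[v] for v in values]


def one_hot(values):
    """Build one 0/1 indicator column per category, one row per sample."""
    categories = sorted(set(values))
    return [[1 if v == cat else 0 for cat in categories] for v in values]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


codes = integer_codes(countries)  # DE -> 0, JP -> 1, US -> 2
rows = one_hot(countries)         # columns in the order DE, JP, US

# With integer codes, DE looks twice as far from US as from JP.
# That gap is purely an artifact of the arbitrary ordering.
code_gap_de_jp = abs(codes[0] - codes[2])
code_gap_de_us = abs(codes[0] - codes[1])

# With one-hot rows, every pair of distinct categories is equally far apart.
onehot_gap_de_jp = squared_distance(rows[0], rows[2])
onehot_gap_de_us = squared_distance(rows[0], rows[1])

print("integer codes:", codes, "gaps:", code_gap_de_jp, code_gap_de_us)
print("one-hot rows: ", rows, "gaps:", onehot_gap_de_jp, onehot_gap_de_us)
```

Distance-based and linear models are the ones most exposed to the spurious ordering in the integer codes. How much it matters for a given estimator and dataset is an empirical question that this toy example does not answer.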