[scikit-learn] Categorical handling
Andreas Mueller
t3kcit at gmail.com
Thu Aug 17 11:11:49 EDT 2017
Hi Georg.
Unfortunately this is not entirely trivial right now, but it will be fixed by
https://github.com/scikit-learn/scikit-learn/pull/9151
and
https://github.com/scikit-learn/scikit-learn/pull/9012
which will be in the next release (0.20).
LabelBinarizer is probably the best workaround for now, and selecting
individual columns can be done (awkwardly) with a FeatureUnion,
as in this example:
http://scikit-learn.org/dev/auto_examples/hetero_feature_union.html#sphx-glr-auto-examples-hetero-feature-union-py
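A minimal sketch of that workaround, under a few assumptions: ColumnSelector and ColumnBinarizer are hypothetical helper classes written for this illustration (they are not part of scikit-learn), and the "color" and "price" columns are made-up data. LabelBinarizer is wrapped because its fit takes a single array, so it cannot be used directly as a Pipeline step, which calls fit with both X and y.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import LabelBinarizer


class ColumnSelector(TransformerMixin, BaseEstimator):
    """Hypothetical helper: select one column by key, return shape (n, 1)."""

    def __init__(self, key):
        self.key = key

    def fit(self, X, y=None):
        return self

    def __sklearn_is_fitted__(self):
        # Stateless transformer: nothing is learned in fit.
        return True

    def transform(self, X):
        # X[key] works for a dict of arrays and for a pandas DataFrame.
        return np.asarray(X[self.key]).reshape(-1, 1)


class ColumnBinarizer(TransformerMixin, BaseEstimator):
    """Hypothetical wrapper giving LabelBinarizer a fit(X, y) signature."""

    def fit(self, X, y=None):
        self.binarizer_ = LabelBinarizer().fit(np.ravel(X))
        return self

    def transform(self, X):
        return self.binarizer_.transform(np.ravel(X))


# Made-up data: one categorical column and one numeric column.
train = {
    "color": np.array(["red", "green", "blue", "green"]),
    "price": np.array([1.0, 2.5, 3.0, 4.5]),
}
test = {
    "color": np.array(["blue", "red"]),
    "price": np.array([2.0, 5.0]),
}

features = FeatureUnion([
    # Only the categorical column is binarized.
    ("color", Pipeline([
        ("select", ColumnSelector("color")),
        ("binarize", ColumnBinarizer()),
    ])),
    # The numeric column is passed through unchanged.
    ("price", ColumnSelector("price")),
])

X_train = features.fit_transform(train)  # encoding is learned on train only
X_test = features.transform(test)        # and reused as-is on test
```

The first three output columns are the indicator columns for blue, green and red (LabelBinarizer sorts the classes), followed by the price. In the 0.19 series a value that was not seen during fit should come out as an all-zero indicator row rather than an error when there are three or more classes, but please verify that against the version you have installed.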
Best,
Andy
On 08/17/2017 07:50 AM, Georg Heiler wrote:
> Hi,
>
> how can I properly handle categorical values in scikit-learn?
> https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934
>
>
> goals
>
> * scikit-learn style fit/transform methods to encode labels of
> categorical features of X
> * should handle unseen labels
> * should be faster than running a label encoder manually for each
> fold and manually checking if the label already was seen in the
> training data i.e. what I currently do
> (https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934 which
> links to
> https://gist.github.com/geoHeil/5caff5236b4850d673b2c9b0799dc2ce)
> * only some columns are categorical, and only these should be converted
>
>
> Regards,
> Georg