[scikit-learn] Categorical handling

Andreas Mueller t3kcit at gmail.com
Thu Aug 17 11:11:49 EDT 2017


Hi Georg.
Unfortunately this is not entirely trivial right now, but will be fixed by
https://github.com/scikit-learn/scikit-learn/pull/9151
and
https://github.com/scikit-learn/scikit-learn/pull/9012
which will be in the next release (0.20).

LabelBinarizer is probably the best workaround for now, and selecting
columns can be done (awkwardly), as in this example:
http://scikit-learn.org/dev/auto_examples/hetero_feature_union.html#sphx-glr-auto-examples-hetero-feature-union-py
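
A minimal sketch of that workaround, assuming the data fits in a plain
2-D array and the categorical column indices are known up front.
ColumnBinarizer and the toy data are hypothetical names made up for
illustration, not part of scikit-learn; only LabelBinarizer is the real
class. It fits one LabelBinarizer per selected column on the training
data and passes the remaining columns through as floats.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer


class ColumnBinarizer:
    """Hypothetical helper: one LabelBinarizer per selected column."""

    def __init__(self, columns):
        self.columns = columns  # indices of the categorical columns

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=object)
        # Learn the categories of each selected column from training data only.
        self.binarizers_ = {
            c: LabelBinarizer().fit(X[:, c].astype(str)) for c in self.columns
        }
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=object)
        parts = []
        for c in range(X.shape[1]):
            if c in self.binarizers_:
                # Categorical column: indicator columns for the fitted classes.
                parts.append(self.binarizers_[c].transform(X[:, c].astype(str)))
            else:
                # Any other column is assumed numeric and passed through.
                parts.append(X[:, [c]].astype(float))
        return np.hstack(parts)


# Toy data (made up): column 0 is categorical, column 1 is numeric.
X_train = [["red", 1.0], ["green", 2.0], ["blue", 3.0]]
X_test = [["red", 4.0], ["purple", 5.0]]  # "purple" was never seen in fit

enc = ColumnBinarizer(columns=[0]).fit(X_train)
out = enc.transform(X_test)
print(out)
```

With three or more training categories, the unseen label should come out
as an all-zero indicator row rather than raising an error. With only two
categories LabelBinarizer emits a single column, so an unseen label
would be indistinguishable from the negative class; treat that case
with care.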

Best,
Andy

On 08/17/2017 07:50 AM, Georg Heiler wrote:
> Hi,
>
> how can I properly handle categorical values in scikit-learn?
> https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934 
>
>
> Goals:
>
>   * scikit-learn style fit/transform methods to encode labels of
>     categorical features of X
>   * should handle unseen labels
>   * should be faster than running a label encoder manually for each
>     fold and manually checking whether each label was already seen in
>     the training data, which is what I currently do
>     (https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934 which
>     links to
>     https://gist.github.com/geoHeil/5caff5236b4850d673b2c9b0799dc2ce)
>   * only some columns are categorical, and only these should be converted
>
>
> Regards,
> Georg
