[scikit-learn] Categorical handling

Georg Heiler georg.kf.heiler at gmail.com
Thu Aug 17 07:50:33 EDT 2017


Hi,

How can I properly handle categorical values in scikit-learn?
https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934


Goals:

   - scikit-learn style fit/transform methods to encode the labels of
   categorical features of X
   - should handle unseen labels
   - should be faster than running a label encoder manually for each fold
   and manually checking whether each label was already seen in the
   training data, i.e. what I currently do (see
   https://stackoverflow.com/questions/45727934/pandas-categories-new-levels?noredirect=1#comment78424496_45727934
   which links to
   https://gist.github.com/geoHeil/5caff5236b4850d673b2c9b0799dc2ce
   )
   - only some columns are categorical, and only these should be converted
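
To make the goals above concrete, here is a minimal sketch of the kind of
encoder I have in mind. It is only an illustration, not an existing
scikit-learn class: the name CategoricalCodeEncoder, the `columns`
parameter and the choice of -1 for unseen labels are all my own
assumptions. It relies on pandas assigning code -1 to values that are not
in the given categories.

```python
import pandas as pd


class CategoricalCodeEncoder:
    """Hypothetical fit/transform encoder for selected categorical columns.

    Categories are learned in fit(); labels unseen at fit time map to -1.
    """

    def __init__(self, columns):
        # Only the listed columns are encoded; all others pass through.
        self.columns = columns

    def fit(self, X, y=None):
        # Remember the sorted distinct labels of each categorical column.
        self.categories_ = {
            col: sorted(X[col].dropna().unique()) for col in self.columns
        }
        return self

    def transform(self, X):
        X = X.copy()
        for col in self.columns:
            # pd.Categorical gives code -1 to values outside `categories`.
            X[col] = pd.Categorical(
                X[col], categories=self.categories_[col]
            ).codes
        return X

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


# Toy data (made up for illustration): "green" never occurs in training.
train = pd.DataFrame({"color": ["red", "blue", "red"], "size": [1.0, 2.5, 3.0]})
test = pd.DataFrame({"color": ["blue", "green"], "size": [4.0, 5.0]})

encoder = CategoricalCodeEncoder(columns=["color"]).fit(train)
encoded_train = encoder.transform(train)
encoded_test = encoder.transform(test)
```

Fitting once per fold on the training part and calling transform on the
held-out part would avoid the manual "was this label seen" check, since
unseen labels simply come out as -1.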


Regards,
Georg