[scikit-learn] Missing data and decision trees

Jeff jeffrey.m.allard at gmail.com
Thu Oct 13 14:20:40 EDT 2016


I ran into this several times as well with the scikit-learn implementation 
of GBM. Look at xgboost if you have not already (is there anyone out 
there who hasn't? :) - it deals with missing values in the predictor 
space in a very elegant manner.

http://xgboost.readthedocs.io/en/latest/python/python_intro.html

https://arxiv.org/abs/1603.02754
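
For example, here is a minimal sketch using xgboost's scikit-learn-style 
wrapper (the data and parameters are just placeholders); it trains 
directly on data containing NaNs, with no imputation step - each tree 
node learns a default direction for missing values:

    import numpy as np
    import xgboost as xgb

    # toy data with missing values left as NaN
    X = np.array([[1.0, 2.0],
                  [np.nan, 3.0],   # missing in the first feature
                  [4.0, np.nan],   # missing in the second feature
                  [5.0, 6.0]])
    y = np.array([0, 1, 0, 1])

    # the wrapper accepts NaN directly; no imputer needed
    model = xgb.XGBClassifier(n_estimators=10, max_depth=2)
    model.fit(X, y)
    print(model.predict(X))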


Jeff



On 10/13/2016 2:14 PM, Stuart Reynolds wrote:
> I'm looking for a decision tree and RF implementation that supports 
> missing data (without imputation) -- ideally in Python, Java/Scala or 
> C++.
>
> It seems that scikit's decision tree algorithm doesn't allow this -- 
> which is disappointing because it's one of the few methods that should 
> be able to sensibly handle problems with high amounts of missingness.
>
> Are there plans to allow missing data in scikit's decision trees?
>
> Also, is there any particular reason why missing values weren't 
> supported originally (e.g., does it integrate poorly with other features)?
>
> Regards
> - Stuart

