[scikit-learn] Query about use of standard deviation on tree feature_importances_ in demo plot_forest_importances.html

Ian Ozsvald ian at ianozsvald.com
Sat Jun 24 05:17:51 EDT 2017


Good. I'd suggested a box plot, or the IQR on a bar chart, on the
yellowbrick list. My assumption was that if the distribution of feature
importances contains many '0's, that might indeed be worth highlighting
as a diagnostic. Cheers, Ian.

On 23 June 2017 at 18:51, Olivier Grisel <olivier.grisel at ensta.org> wrote:
> +1 for changing this example to have error bars represent the 5th and
> 95th percentiles or the 25th and 75th percentiles (quartiles).
>
> Or even bootstrapped confidence intervals of the mean feature
> importance for each variable. This might be a bit too verbose for an
> example though.
>
>> Perhaps more importantly - is a visual
>> indication of the spread of feature importances in an ensemble
>> actually a useful thing to plot? Does it serve a diagnostic value?
>
> Yes. Otherwise people might be over-confident in the stability of
> those feature importances.
>
> --
> Olivier
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
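
To make the percentile suggestion above concrete, here is a minimal
sketch, not the code from the example itself. It assumes a fitted
scikit-learn forest whose trees are reachable via estimators_; the
helper names per_tree_importances and importance_spread are made up for
illustration. It reports the median importance per feature, asymmetric
error-bar lengths for a chosen percentile pair, and the fraction of
trees that gave a feature exactly zero importance.

```python
import numpy as np


def per_tree_importances(forest):
    """Stack each fitted tree's feature_importances_ into (n_trees, n_features).

    Assumes `forest` is an already fitted ensemble exposing `estimators_`,
    e.g. a RandomForestClassifier.
    """
    return np.array([tree.feature_importances_ for tree in forest.estimators_])


def importance_spread(importances, lower=25, upper=75):
    """Summarise per-tree importances with a median and a percentile interval.

    `lower` and `upper` are percentiles (lower <= 50 <= upper), so 25/75
    gives the quartiles and 5/95 gives a wider band.
    """
    importances = np.asarray(importances, dtype=float)
    median = np.median(importances, axis=0)
    lo, hi = np.percentile(importances, [lower, upper], axis=0)
    # Error bars are distances below and above the bar height, so they
    # stay non-negative, unlike mean +/- std which can dip below zero.
    yerr = np.vstack([median - lo, hi - median])
    # Share of trees that never used the feature: the "many zeros" case.
    zero_fraction = np.mean(importances == 0, axis=0)
    return median, yerr, zero_fraction
```

The (2, n_features) yerr array can be passed directly to matplotlib,
e.g. plt.bar(range(len(median)), median, yerr=yerr). Using the median
rather than the mean as the bar height is my choice here, so that the
bar always sits inside the percentile interval.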



-- 
Ian Ozsvald (Data Scientist, PyDataLondon co-chair)
ian at IanOzsvald.com

http://IanOzsvald.com
http://ModelInsight.io
http://twitter.com/IanOzsvald

