[scikit-learn] Sprint discussion points?

Thu Feb 14 08:26:57 EST 2019

On 2/13/19 11:28 PM, Joel Nothman wrote:
> Convergence in logistic regression 
> (https://github.com/scikit-learn/scikit-learn/issues/11536) is indeed 
> one problem (and it presents a general issue of what max_iter means 
> when you have several solvers, or how good defaults are selected). But 
> I was sure we had problems with non-determinism on some platforms... 
> but now I can't find them.
>
> > my students have basically no way to figure out what features the 
> > coefficients in their linear model correspond to, that seems a bit 
> > more important to me.
>
> Yes, I agree... Assuming coefficients are helpful, rather than using 
> permutation-based measures of importance, for instance.

You would apply the permutation-based feature importances before any 
preprocessing? I guess there's a case to be made for either option.
I think there are good reasons to look at coefficients though.
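
To make the "before any preprocessing" option concrete, here is a 
minimal pure-Python sketch. Everything in it is hypothetical and made up 
for illustration (the toy data, the fixed MEANS/SCALES/WEIGHTS, and the 
helper names); it is not scikit-learn code. The point is only that the 
raw input columns are shuffled and the whole pipeline, preprocessing 
included, is re-scored, so the importances refer to the original 
features rather than to whatever the preprocessing produces.

```python
import random

# Hypothetical preprocessing constants and model weights (assumptions,
# not fitted by any library): standardize, then a fixed linear classifier.
MEANS = [175.0, 5.0]
SCALES = [10.0, 1.0]
WEIGHTS = [1.0, 0.0]  # the second feature is ignored by the model


def pipeline(rows):
    """Preprocess raw rows, then predict 0/1 with the linear model."""
    scaled = [[(v - m) / s for v, m, s in zip(row, MEANS, SCALES)]
              for row in rows]
    return [1 if sum(w * v for w, v in zip(WEIGHTS, row)) > 0 else 0
            for row in scaled]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance_raw(rows, y, n_repeats=20, seed=0):
    """Mean accuracy drop when one *raw* column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(y, pipeline(rows))
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in rows]
            rng.shuffle(col)
            shuffled = [row[:j] + [v] + row[j + 1:]
                        for row, v in zip(rows, col)]
            drops.append(baseline - accuracy(y, pipeline(shuffled)))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy raw data: the label depends only on the first column.
X_raw = [[150.0, 3.0], [160.0, 7.0], [170.0, 1.0],
         [180.0, 9.0], [190.0, 4.0], [200.0, 6.0]]
y = [0, 0, 0, 1, 1, 1]

importances = permutation_importance_raw(X_raw, y)
print(importances)
```

The other option would be to shuffle columns of the preprocessed data 
and score only the final estimator, which ties the importances to the 
transformed features instead.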

> I generally think a review of distances might be a good thing at some 
> point, given the confusing triplication across sklearn.neighbors, 
> sklearn.metrics.pairwise, scipy.spatial... and that minkowski with p=2 is 
> not implemented the same as euclidean.
>
Yes, I agree. I guess right now I'm more enthusiastic about new 
features/APIs than decreasing technical debt, maybe because you're the 
one dealing with the technical debt ;)
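
On the minkowski/euclidean point quoted above, here is a small 
pure-Python sketch of one way two "equivalent" implementations can 
disagree. It assumes the euclidean path uses the dot-product expansion 
sqrt(x.x - 2 x.y + y.y), which is what the 
sklearn.metrics.pairwise.euclidean_distances documentation describes, 
and that the minkowski path sums |x_i - y_i|**p directly. The function 
names and test vectors are made up for illustration, and this may not be 
the only difference Joel has in mind.

```python
import math


def minkowski(x, y, p):
    """Direct definition: (sum |x_i - y_i|^p)^(1/p)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def euclidean_expanded(x, y):
    """Expanded form sqrt(x.x - 2 x.y + y.y), clipped at zero."""
    xx = sum(a * a for a in x)
    yy = sum(b * b for b in y)
    xy = sum(a * b for a, b in zip(x, y))
    return math.sqrt(max(xx - 2.0 * xy + yy, 0.0))


# Two nearby points far from the origin: the squared norms are about
# 1e16, so the expansion loses the small difference to rounding.
x = [1e8, 1.0]
y = [1e8, 1.0 + 1e-4]

d_direct = minkowski(x, y, 2)
d_expanded = euclidean_expanded(x, y)
print(d_direct, d_expanded)
```

The direct form gives roughly 1e-4 here, while the expanded form does 
not, because of cancellation between terms of size 1e16. The expansion 
is attractive because it can be computed with matrix products, so the 
two are a speed/accuracy trade-off rather than one being simply wrong.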