[scikit-learn] Sprint discussion points?
Andreas Mueller
t3kcit at gmail.com
Thu Feb 14 08:26:57 EST 2019
On 2/13/19 11:28 PM, Joel Nothman wrote:
> Convergence in logistic regression
> (https://github.com/scikit-learn/scikit-learn/issues/11536) is indeed
> one problem (and it presents a general issue of what max_iter means
> when you have several solvers, or how good defaults are selected). But
> I was sure we had problems with non-determinism on some platforms...
> but now can't find.
>
> > my students have basically no way to figure out what features the
> > coefficients in their linear model correspond to, that seems a bit
> > more important to me.
>
> Yes, I agree... Assuming coefficients are helpful, rather than using
> permutation-based measures of importance, for instance.
You would apply permutation-based feature importances before any
preprocessing? I guess there's a case to be made for either option.
I think there are good reasons to look at coefficients, though.
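To make the two options concrete, here is a minimal NumPy sketch. The data, the `preprocess` helper and the column names are all hypothetical, and plain least squares stands in for whatever linear model you'd actually use. One-hot encoding turns 2 raw columns into 4 coefficients, and the names only line up because the helper returns them alongside the matrix. Permuting the raw columns before preprocessing gives one importance score per original feature.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300

# Hypothetical raw data: a categorical column with 3 levels and a numeric column.
color = rng.integers(0, 3, size=n)
size = rng.normal(size=n)
y = np.array([0.0, 2.0, -1.0])[color] + 0.5 * size + 0.1 * rng.normal(size=n)

def preprocess(color, size):
    """Hypothetical preprocessing: one-hot encode `color`, pass `size` through.

    Returns the design matrix together with a name for every output column,
    so each coefficient can be traced back to a feature.
    """
    onehot = np.eye(3)[color]
    X = np.column_stack([onehot, size])
    names = ["color=0", "color=1", "color=2", "size"]
    return X, names

# Fit ordinary least squares on the preprocessed features.
X, names = preprocess(color, size)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
for name, c in zip(names, coef):
    print(f"{name:8s} {c:+.2f}")

def mse(color, size):
    # Re-run preprocessing on (possibly permuted) raw columns, then score.
    Xp, _ = preprocess(color, size)
    return np.mean((Xp @ coef - y) ** 2)

# Permutation importance on the raw columns, i.e. before preprocessing:
# shuffle one raw column at a time and measure the increase in error.
baseline = mse(color, size)
raw = {"color": color, "size": size}
importances = {}
for col in raw:
    permuted = dict(raw)
    permuted[col] = rng.permutation(raw[col])
    importances[col] = mse(permuted["color"], permuted["size"]) - baseline
print(importances)
```

Permuting after preprocessing would instead score the four expanded columns separately. One argument for permuting before is that shuffling a single one-hot column can produce rows with zero or two active categories.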
> I generally think a review of distances might be a good thing at some
> point, given the confusing triplication across sklearn.neighbors,
> sklearn.metrics.pairwise, scipy.spatial... and that minkowski,p=2 is
> not implemented the same as euclidean.
>
Yes, I agree. I guess right now I'm more enthusiastic about new
features/APIs than about reducing technical debt, maybe because you're the
one dealing with the technical debt ;)
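On the minkowski/euclidean point, here is a minimal NumPy sketch of one way the two can disagree. I'm assuming the relevant difference is that the Euclidean route uses the ||x||^2 - 2 x.y + ||y||^2 expansion (the `sklearn.metrics.pairwise.euclidean_distances` docs describe computing it this way), while a generic Minkowski route works from explicit differences. The function names below are hypothetical. For points far from the origin and close to each other, the expansion loses everything to cancellation.

```python
import numpy as np

def euclidean_via_dot(X, Y):
    """Pairwise distances via ||x||^2 - 2 x.y + ||y||^2 (fast, matrix products)."""
    XX = np.einsum("ij,ij->i", X, X)[:, None]
    YY = np.einsum("ij,ij->i", Y, Y)[None, :]
    d2 = XX - 2.0 * (X @ Y.T) + YY
    np.maximum(d2, 0.0, out=d2)  # clip small negatives caused by cancellation
    return np.sqrt(d2)

def minkowski_direct(X, Y, p=2.0):
    """Pairwise (sum_k |x_k - y_k|^p)^(1/p), computed from explicit differences."""
    diff = np.abs(X[:, None, :] - Y[None, :, :])
    return (diff ** p).sum(axis=-1) ** (1.0 / p)

# Well-scaled data: both routes agree to floating-point tolerance.
rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 3))
print(np.abs(euclidean_via_dot(Q, Q) - minkowski_direct(Q, Q)).max())

# Two points 1e-3 apart but about 1e6 from the origin.
P = np.array([[1e6, 1e6], [1e6 + 1e-3, 1e6]])
print(euclidean_via_dot(P, P)[0, 1])   # dominated by rounding error
print(minkowski_direct(P, P)[0, 1])    # close to 1e-3
```

Whether a particular code path in sklearn.neighbors, sklearn.metrics.pairwise or scipy.spatial hits this depends on which implementation it dispatches to, so this only illustrates the mechanism.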