[scikit-learn] KernelDensity bandwidth hyper parameter optimization
William Heymann
immudzen at gmail.com
Wed Nov 7 07:01:30 EST 2018
Hello,
I am trying to tune the bandwidth for my KernelDensity. I need to find out
what optimization goal to use.
I started with
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

grid = GridSearchCV(KernelDensity(),
                    {'bandwidth': np.linspace(0.1, 1.0, 30)},
                    cv=20)  # 20-fold cross-validation
grid.fit(x[:, None])
print(grid.best_params_)
From
https://jakevdp.github.io/blog/2013/12/01/kernel-density-estimation/#Bandwidth-Cross-Validation-in-Scikit-Learn
I have also used RandomizedSearchCV to optimize the parameters.
The problem I have is that neither one refines its answer, so if I don't
sample the bandwidth range densely enough I don't get a good answer. What I
would like to do is use the same goal but put it into a different global
optimizer.
I have looked through the code for GridSearchCV and RandomizedSearchCV and
I have not yet been able to figure out what the actual optimization goal is.
Originally I thought the system was using something like
kde_bw = KernelDensity(kernel='gaussian', bandwidth=bw)
score = max(cross_val_score(kde_bw, data, cv=3))
and then trying to minimize that score, but that does not seem likely given
the results.
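For comparison, here is a minimal sketch of an objective that should match what the search classes do by default, as far as I can tell. With no scoring argument they call the estimator's own score method, which for KernelDensity is the total log-likelihood of the held-out samples, average it over the folds, and pick the parameters with the highest average. So the quantity to minimize would be the negative mean of the fold scores, not the max. The helper name neg_mean_cv_log_likelihood, the synthetic data, the 5 folds, and the choice of scipy's bounded scalar minimizer (a local method, standing in for whatever global optimizer is preferred) are all assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KernelDensity


def neg_mean_cv_log_likelihood(bw, data, cv=5):
    """Negative mean held-out log-likelihood for bandwidth bw (hypothetical helper)."""
    kde = KernelDensity(kernel='gaussian', bandwidth=bw)
    # cross_val_score falls back to KernelDensity.score, i.e. the total
    # log-likelihood of each held-out fold under the fitted density.
    fold_scores = cross_val_score(kde, data, cv=cv)
    # The searches maximize the mean score, so negate it for a minimizer.
    return -np.mean(fold_scores)


# Synthetic 1-D data, assumed here only to make the sketch runnable.
rng = np.random.RandomState(0)
x = rng.normal(size=200)
data = x[:, None]  # scikit-learn expects shape (n_samples, n_features)

# Continuous refinement over the same range as the grid, 0.1 to 1.0.
result = minimize_scalar(neg_mean_cv_log_likelihood,
                         bounds=(0.1, 1.0),
                         args=(data,),
                         method='bounded')
best_bw = result.x
print(best_bw, -result.fun)
```

The same function can be handed to any other optimizer that accepts a scalar objective; only the sign convention and the bandwidth bounds need to carry over.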
If someone could help me with the goal to optimize I should be able to
solve the rest of the problem on my own.
Thanks
Bill