[scikit-learn] How do we define a distance metric's parameter for grid search

Hugo Ferreira hmf at inesctec.pt
Tue Jun 28 07:03:11 EDT 2016


Hello,


On 27-06-2016 12:37, Joel Nothman wrote:
> Hi Hugo,
>
> Andrew's approach -- using a list of dicts to specify multiple parameter
> grids -- is the correct one.
>
> However, Andrew, you don't need to include parameters that will be
> ignored into your parameter grid. The following will be effectively the
> same:
>
> params =
> [{'kernel':['poly'],'degree':[1,2,3],'gamma':[1/p,1,2],'coef0':[-1,0,1]},
> {'kernel':['rbf'],'gamma':[1/p,1,2]},
> {'kernel':['sigmoid'],'gamma':[1/p,1,2],'coef0':[-1,0,1]}]
>
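To see why the trimmed grid is "effectively the same" as the padded one, it helps to count the candidates each produces. The sketch below is a hand-rolled stand-in for the way a list of dicts is expanded (one Cartesian product per dict, then the union); it is not scikit-learn's own code. The value p = 4 is an assumption, since p is never defined in the thread.

```python
from itertools import product

def expand(grids):
    """Expand a list of parameter grids into individual candidate dicts."""
    candidates = []
    for grid in grids:
        names = sorted(grid)
        # One Cartesian product per dict; the dicts are never crossed with each other.
        for values in product(*(grid[name] for name in names)):
            candidates.append(dict(zip(names, values)))
    return candidates

p = 4  # assumption: p is not defined in the thread; taken here as a feature count
gammas = [1.0 / p, 1, 2]

# Joel's version: only the parameters each kernel actually uses.
trimmed = [
    {'kernel': ['poly'], 'degree': [1, 2, 3], 'gamma': gammas, 'coef0': [-1, 0, 1]},
    {'kernel': ['rbf'], 'gamma': gammas},
    {'kernel': ['sigmoid'], 'gamma': gammas, 'coef0': [-1, 0, 1]},
]

# Andrew's version: ignored parameters pinned to a single value.
padded = [
    {'kernel': ['poly'], 'degree': [1, 2, 3], 'gamma': gammas, 'coef0': [-1, 0, 1]},
    {'kernel': ['rbf'], 'gamma': gammas, 'degree': [3], 'coef0': [0]},
    {'kernel': ['sigmoid'], 'gamma': gammas, 'coef0': [-1, 0, 1], 'degree': [3]},
]

n_trimmed = len(expand(trimmed))
n_padded = len(expand(padded))
print(n_trimmed, n_padded)
```

Because the padded entries hold a single value each, they multiply the candidate count by one, so both grids yield the same number of fits.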

I tried this but am getting errors. It seems I need to use the
'metric_params' parameter, but I cannot get it right. Here are some of
the attempts I made:

{'metric': ['wminkowski'], 'metric_params':[{ 'w': [0.01, 0.1, 1, 10, 
100], 'p': [1,2,3,4,5]}], 'n_neighbors': list(k_range), 'weights': 
weights, 'algorithm': algos, 'leaf_size': list(leaf_sizes) }

{'metric': ['wminkowski'], 'metric_params':[{ 'w': 0.01, 'p': 1}], 
'n_neighbors': list(k_range), 'weights': weights, 'algorithm': algos, 
'leaf_size': list(leaf_sizes) }

{'metric': ['wminkowski'], 'metric_params':[dict(w=0.01,p=1)], 
'n_neighbors': list(k_range), 'weights': weights, 'algorithm': algos, 
'leaf_size': list(leaf_sizes) }

The last two give me the following error:

Exception ignored in: 'sklearn.neighbors.dist_metrics.get_vec_ptr'
ValueError: Buffer has wrong number of dimensions (expected 1, got 0)

Can anyone see what I am doing wrong?

TIA,
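
A possible way to structure this grid, offered as a sketch and not a confirmed fix: GridSearchCV treats each element of the 'metric_params' list as one opaque value, so lists nested inside that dict (as in the first attempt) are not expanded, and every candidate has to be written out as a complete dict. The ValueError above is consistent with 'w' being passed as a scalar where a 1-D array with one weight per feature is expected. The feature count and the weight vectors below are hypothetical, and 'p' is searched through the estimator's own 'p' parameter instead of through 'metric_params'.

```python
n_features = 4  # assumption: number of columns in X

# Hypothetical candidate weight vectors, one weight per feature.
weight_vectors = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 10.0, 10.0],
    [10.0, 10.0, 1.0, 1.0],
]
assert all(len(w) == n_features for w in weight_vectors)

# Each list element is a complete metric_params dict, i.e. one candidate.
metric_params_grid = [{'w': w} for w in weight_vectors]

param_grid = [{
    'metric': ['wminkowski'],
    'metric_params': metric_params_grid,
    'p': [1, 2, 3, 4, 5],               # KNeighborsClassifier's own p parameter
    'n_neighbors': list(range(1, 31)),
    'weights': ['uniform', 'distance'],
}]

# Number of parameter combinations the search would evaluate.
n_candidates = 1
for values in param_grid[0].values():
    n_candidates *= len(values)
print(n_candidates)
```

The resulting param_grid can be passed to GridSearchCV(knn, param_grid=param_grid, ...) as in the original snippet. The 'wminkowski' metric name is the one used in this thread; other scikit-learn releases may name or handle it differently.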


> Joel
>
> On 27 June 2016 at 20:59, Andrew Howe <ahowe42 at gmail.com
> <mailto:ahowe42 at gmail.com>> wrote:
>
>     I did something similar where I was using GridSearchCV over
>     different kernel functions for SVM and not all kernel functions use
>     the same parameters.  For example, the *degree* parameter is only
>     used by the *poly* kernel.
>
>     from sklearn import svm
>     from sklearn import cross_validation
>     from sklearn import grid_search
>
>     params =
>     [{'kernel':['poly'],'degree':[1,2,3],'gamma':[1/p,1,2],'coef0':[-1,0,1]},\
>     {'kernel':['rbf'],'gamma':[1/p,1,2],'degree':[3],'coef0':[0]},\
>     {'kernel':['sigmoid'],'gamma':[1/p,1,2],'coef0':[-1,0,1],'degree':[3]}]
>     GSC = grid_search.GridSearchCV(estimator = svm.SVC(), param_grid =
>     params,\
>          cv = cvrand, n_jobs = -1)
>
>     This worked in this instance because the svm.SVC() object only
>     passes parameters to the kernel functions as needed:
>     [Inline image 1]
>
>     Hence, even though my list of dicts includes all three parameters
>     for all types of kernels I used, they were selectively ignored.  I'm
>     not sure about parameters for the distance metrics for the KNN
>     object, but it's a good bet it works the same way.
>
>     Andrew
>
>     <~~~~~~~~~~~~~~~~~~~~~~~~~~~>
>     J. Andrew Howe, PhD
>     Editor-in-Chief, European Journal of Mathematical Sciences
>     Executive Editor, European Journal of Pure and Applied Mathematics
>     www.andrewhowe.com <http://www.andrewhowe.com>
>     http://www.linkedin.com/in/ahowe42
>     https://www.researchgate.net/profile/John_Howe12/
>     I live to learn, so I can learn to live. - me
>     <~~~~~~~~~~~~~~~~~~~~~~~~~~~>
>
>     On Mon, Jun 27, 2016 at 1:27 PM, Hugo Ferreira <hmf at inesctec.pt
>     <mailto:hmf at inesctec.pt>> wrote:
>
>         Hello,
>
>         I posted this question on Stack Overflow and did not get an
>         answer. It seems to be a basic usage question, so I am
>         sending it here.
>
>         I have the following code snippet that attempts to do a grid
>         search in which one of the grid parameters is the distance metric
>         to be used for the KNN algorithm. The example below fails if I use
>         the "wminkowski", "seuclidean" or "mahalanobis" distance metrics.
>
>         # Define the parameter values that should be searched
>         k_range    = range(1,31)
>         weights    = ['uniform' , 'distance']
>         algos      = ['auto', 'ball_tree', 'kd_tree', 'brute']
>         leaf_sizes = range(10, 60, 10)
>         metrics = ["euclidean", "manhattan", "chebyshev", "minkowski",
>         "mahalanobis"]
>
>         param_grid = dict(n_neighbors = list(k_range), weights =
>         weights, algorithm = algos, leaf_size = list(leaf_sizes),
>         metric=metrics)
>         param_grid
>
>         # Instantiate the algorithm
>         knn = KNeighborsClassifier(n_neighbors=10)
>
>         # Instantiate the grid
>         grid = GridSearchCV(knn, param_grid=param_grid, cv=10,
>         scoring='accuracy', n_jobs=-1)
>
>         # Fit the models using the grid parameters
>         grid.fit(X,y)
>
>         I assume this is because I have to set or define the ranges for
>         the various distance parameters (for example p, w for
>         “wminkowski” - WMinkowskiDistance ). The "minkowski" distance
>         may be working because its "p" parameter has the default 2.
>
>         So my questions are:
>
>         1. Can we set a range of values for a distance metric's
>         parameters in the grid search, and if so, how?
>         2. Can we set a single fixed value for a distance metric's
>         parameter in the grid search, and if so, how?
>
>         Hope the question is clear.
>         TIA
>         _______________________________________________
>         scikit-learn mailing list
>         scikit-learn at python.org <mailto:scikit-learn at python.org>
>         https://mail.python.org/mailman/listinfo/scikit-learn
>
>
>


