[scikit-learn] How do we define a distance metric's parameter for grid search

Hugo Ferreira hmf at inesctec.pt
Tue Jun 28 08:52:16 EDT 2016


Hi,

On 28-06-2016 12:45, Joel Nothman wrote:
> I tried to do this but am having errors. Seems like I need to use
> the 'metric_params' parameter but I cannot get it right. Here are
> some of the attempts I made:
>
> {'metric': ['wminkowski'], 'metric_params':[{ 'w': [0.01, 0.1, 1, 10,
> 100], 'p': [1,2,3,4,5]}], 'n_neighbors': list(k_range), 'weights':
> weights, 'algorithm': algos, 'leaf_size': list(leaf_sizes) }
>
> {'metric': ['wminkowski'], 'metric_params':[{ 'w': 0.01, 'p': 1}],
> 'n_neighbors': list(k_range), 'weights': weights, 'algorithm': algos,
> 'leaf_size': list(leaf_sizes) }
>
> {'metric': ['wminkowski'], 'metric_params':[dict(w=0.01,p=1)],
> 'n_neighbors': list(k_range), 'weights': weights, 'algorithm': algos,
> 'leaf_size': list(leaf_sizes) }
>
> The last two give me the following error:
>
> Exception ignored in: 'sklearn.neighbors.dist_metrics.get_vec_ptr'
> ValueError: Buffer has wrong number of dimensions (expected 1, got
> 0)
>
> Can anyone see what I am doing wrong?
>
>
> I can see *something* you're doing wrong. Firstly, your second and
> third examples produce identical Python objects.
>

Yeah. It's called desperation :-)

> But in metric_params, p should be an integer, w should be a
> 1-dimensional array. In your first example, both p and w will be 1d,
> and in your second and third, both are scalars. You want something
> like ... 'metric_params': [{'w': [0.01, 0.1, 1, 10, 100], 'p': 1}]
> ... except that those values for 'w' seem a bit strange for weights
> (are you sure you want wminkowski?).

Just testing the code. I'll need to learn what values are the most
appropriate here. Are these the weights to be applied to each feature
(number of weights = number of features)?

Wonder how I can use this during feature selection.

> You can try multiple 'p' with 'metric_params':
> [{'w': weights, 'p': 1}, {'w': weights, 'p': 2}, {'w': weights, 'p':
> 3}, ...]
>
>

I started with the simplest case and the following set of parameters
(before attempting multiple parameters as you have shown above):

param_grid = [
{'metric': ['wminkowski'], 'metric_params':[{'w':[10, 20],'p':1}] }
]

and I get the error:

   File "<string>", line unknown
SyntaxError: invalid or missing encoding declaration for
'/home/hmf/my_py3/lib/python3.4/site-packages/sklearn/neighbors/ball_tree.cpython-34m.so'

Ok, so this may be due to the specific type of tree being used. I then
set the parameters to:

{'metric': ['wminkowski'], 'metric_params':[{'w':[10.0, 20.0],'p':1}],
'algorithm': algos }

where algos is:

algos      = ['brute']

Which results in the following error:

AttributeError: 'list' object has no attribute 'dtype'

So it seems 'w' must be passed explicitly as a NumPy array rather than
a plain list. The following works:

{'metric': ['wminkowski'], 'metric_params':[{'w':np.array([10.0,
20.0]),'p':1}], 'algorithm': algos }
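
For the multiple-'p' case suggested above, here is a minimal sketch of how
the grid could be built. It assumes two features (so 'w' has two entries)
and NumPy imported as np; the names feature_weights, p_values and
metric_params_candidates are only illustrative, and the sketch stops
short of running the grid search itself.

```python
import numpy as np

# Illustrative assumption: two features, hence two weights in 'w'.
# 'w' is an explicit 1-d float ndarray, not a plain list.
feature_weights = np.array([10.0, 20.0])

# A plain list has no .dtype attribute, which matches the
# AttributeError reported above; an ndarray does.
assert not hasattr([10.0, 20.0], "dtype")
assert hasattr(feature_weights, "dtype")

# One metric_params dict per value of p; 'p' stays a scalar in each dict.
p_values = [1, 2, 3]
metric_params_candidates = [
    {"w": feature_weights, "p": p} for p in p_values
]

algos = ["brute"]

# Same shape as the grid that worked above, with several candidates
# for metric_params instead of one.
param_grid = [
    {
        "metric": ["wminkowski"],
        "metric_params": metric_params_candidates,
        "algorithm": algos,
    }
]
```

Each entry of 'metric_params' is treated as one candidate value by the
grid, so 'w' and 'p' are varied together as a pair rather than
independently.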

Thanks for the help.

Hugo

