[scikit-learn] How to implement a callable distance metric function in scikit-learn TSNE

Sihawi Khalid sihawi.khalid at eee.upd.edu.ph
Thu Feb 10 13:48:28 EST 2022


Hi all! I have been using the TSNE module in Python's sklearn to visualize
an n-dimensional binary-valued dataset using 'hamming' as the distance
metric. I plan on packing the n-dimensional dataset into bits in a uint8
array using numpy's 'packbits' function. To use the TSNE module on the
packed dataset, I would have to unpack the whole dataset and use the
'hamming' distance metric as I have been doing on my original dataset.
Instead, I want to write a simple callable function that unpacks just two
instances of the packed dataset at a time and measures the Hamming distance
between them. In my initial attempt, I used a lambda function as the metric;
it takes two input arrays, unpacks them, and outputs the Hamming distance
between the two. Here's an example, where 'myPackedDataset' is the packed
dataset:

import numpy as np
from scipy.spatial import distance
from sklearn.manifold import TSNE

tsneCluster = TSNE(learning_rate=rate_val, perplexity=per_val, n_iter=iter_val,
    metric=lambda x, y: distance.hamming(np.unpackbits(x), np.unpackbits(y)))
fit_result = tsneCluster.fit_transform(myPackedDataset)

I get this error at the TSNE declaration:

TypeError: Expected an input array of unsigned byte data type


I am still trying to understand how the callable option of TSNE's
metric parameter works, and I would really appreciate any help.
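
One likely cause of the TypeError above: scikit-learn validates the input
before computing distances and converts it to a floating-point array, so the
callable receives float rows rather than uint8 rows, and np.unpackbits only
accepts uint8. The following is a minimal sketch of a workaround under that
assumption, not a definitive implementation. The names packed_hamming and
embed_packed are hypothetical helpers introduced only for illustration. The
metric casts each row back to uint8 (byte values 0 to 255 survive the float
round trip exactly) and then counts the differing bits.

```python
import numpy as np


def packed_hamming(x, y):
    """Hamming distance between two rows packed with np.packbits.

    Assumption: the rows may arrive as floats (scikit-learn converts the
    input during validation), so they are cast back to uint8 first.
    """
    xb = np.asarray(x).astype(np.uint8)
    yb = np.asarray(y).astype(np.uint8)
    # XOR leaves a 1 bit wherever the two rows differ; count those bits.
    n_diff = np.unpackbits(np.bitwise_xor(xb, yb)).sum()
    # Fraction of differing bits over all packed bits, padding included,
    # which matches calling scipy's hamming on the two unpacked rows.
    return n_diff / (8 * xb.size)


def embed_packed(packed, **tsne_kwargs):
    """Hypothetical wrapper: run TSNE on a packed uint8 dataset."""
    # Imported here so the metric itself only depends on numpy.
    from sklearn.manifold import TSNE

    # Random initialisation, since PCA on packed bytes is not meaningful.
    tsne_kwargs.setdefault("init", "random")
    tsne = TSNE(metric=packed_hamming, **tsne_kwargs)
    return tsne.fit_transform(packed)
```

With this, a call such as embed_packed(myPackedDataset, perplexity=per_val)
would take the place of the lambda-based version. If n is not a multiple of
8, the zero padding bits added by packbits never differ but are still
counted in the denominator, so the distances are scaled by a constant factor
relative to the unpacked 'hamming' metric. A Python callable is also invoked
once per pair of rows, so it can be much slower than the built-in metric on
the unpacked data.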