metrics.pairwise.word_movers_distance

pai4sk.metrics.pairwise.word_movers_distance(W, X, Y=None, use_gpu=True, optimization_level=1, num_cpu_threads=None, use_cosine=False)

Computes Word Mover’s Distance between samples in X and Y given embedding vectors W.

The approximation algorithm is described in detail at http://export.arxiv.org/abs/1812.02091.

Parameters
  • W (ndarray, shape: (n_features, n_dimensions)) – Embedding vectors

  • X (array-like, sparse-matrix, shape (n_samples_x, n_features)) – Input dataset. When MPI is used and X and Y differ in size, prefer passing the larger dataset as X, because the MPI distribution of data is performed for X only.

  • Y (array-like, sparse-matrix, shape (n_samples_y, n_features), optional) – Input dataset. If None, the output will be the pairwise distances between all samples in X.

  • use_gpu (boolean) – When false, uses the CPU; when true, uses the GPU.

  • optimization_level (integer) – Minimum is 0 and maximum is 4. Using a higher value increases runtime.

  • num_cpu_threads (integer) – Maximum number of CPU threads used by each process.

  • use_cosine (boolean) – When false, computes Euclidean ground distances between the embedding vectors; when true, computes (1 - cosine similarity) after normalizing the embedding vectors.
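The two ground-distance options selected by use_cosine can be illustrated with plain NumPy. This is a minimal sketch, not the library's implementation: the helper name ground_distances is hypothetical, and it assumes that "normalizing" means scaling each embedding vector to unit L2 norm.

```python
import numpy as np

def ground_distances(W, use_cosine=False):
    """Hypothetical helper: pairwise ground distances between the rows of W.

    W has shape (n_features, n_dimensions), one embedding vector per row.
    """
    W = np.asarray(W, dtype=float)
    if use_cosine:
        # Assumption: normalization means unit L2 norm per embedding vector.
        norms = np.linalg.norm(W, axis=1, keepdims=True)
        Wn = W / np.where(norms == 0.0, 1.0, norms)  # avoid division by zero
        # 1 - cosine similarity, clipped to absorb floating-point round-off.
        return np.clip(1.0 - Wn @ Wn.T, 0.0, 2.0)
    # Euclidean distance between every pair of embedding vectors.
    diff = W[:, None, :] - W[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Three 2-dimensional embedding vectors.
W = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]])
C_euclid = ground_distances(W)
C_cosine = ground_distances(W, use_cosine=True)
```

Note that the first and third vectors point in the same direction, so their cosine-based distance is 0 even though their Euclidean distance is 2.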

Returns

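D – Pairwise Word Mover’s Distances between the samples in X and the samples in Y (or between all samples in X if Y is None)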

Return type

array-like, shape (n_samples_x, n_samples_y)
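To make the input and output shapes concrete, the sketch below computes a simple relaxed lower bound on Word Mover’s Distance in plain NumPy. It is not necessarily the approximation used by this function, and it does not call pai4sk: the name relaxed_wmd is hypothetical, and the sketch assumes dense inputs, Euclidean ground distances, rows of X and Y that each contain at least one nonzero weight, and L1-normalization of each row into a word histogram.

```python
import numpy as np

def relaxed_wmd(W, X, Y=None):
    """Hypothetical sketch: relaxed lower bound on Word Mover's Distance.

    W: (n_features, n_dimensions) embedding vectors.
    X: (n_samples_x, n_features) dense word weights, e.g. word counts.
    Y: (n_samples_y, n_features) or None, in which case Y = X.
    Returns D with shape (n_samples_x, n_samples_y).
    """
    W = np.asarray(W, dtype=float)
    X = np.asarray(X, dtype=float)
    Y = X if Y is None else np.asarray(Y, dtype=float)
    # Assumption: each sample is L1-normalized into a histogram over words.
    Xn = X / X.sum(axis=1, keepdims=True)
    Yn = Y / Y.sum(axis=1, keepdims=True)
    # Euclidean ground distances between all pairs of embedding vectors.
    diff = W[:, None, :] - W[None, :, :]
    C = np.sqrt((diff ** 2).sum(axis=-1))
    D = np.zeros((X.shape[0], Y.shape[0]))
    for a in range(X.shape[0]):
        in_x = X[a] > 0
        for b in range(Y.shape[0]):
            in_y = Y[b] > 0
            # Each word moves all of its weight to the nearest word on the other side.
            x_to_y = Xn[a, in_x] @ C[np.ix_(in_x, in_y)].min(axis=1)
            y_to_x = Yn[b, in_y] @ C[np.ix_(in_y, in_x)].min(axis=1)
            # The larger of the two one-sided bounds is the tighter lower bound.
            D[a, b] = max(x_to_y, y_to_x)
    return D

W = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])  # 3 words, 2 dimensions
X = np.array([[1, 0, 0], [0, 1, 1]])                # 2 samples
Y = np.array([[0, 0, 2]])                           # 1 sample
D_xy = relaxed_wmd(W, X, Y)   # shape (2, 1)
D_xx = relaxed_wmd(W, X)      # shape (2, 2), Y defaults to X
```

As with the documented function, omitting Y yields a square matrix of distances between all samples in X.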