pai4sk.simsearch.evaluate.evaluate_topk(D, K, labels_X, labels_Y=None)

Evaluates the accuracy (precision) of a similarity search over the top K results.

Parameters:

  • D (ndarray, shape (n_samples_x, n_samples_y)) – Two-dimensional matrix of pairwise distances between the samples of two datasets (X and Y)
  • K (int) – Number of top-ranked samples over which the precision of the similarity search algorithm is calculated
  • labels_X (array-like, shape (n_samples_x,)) – Labels corresponding to dataset X
  • labels_Y (array-like, shape (n_samples_y,), optional) – Labels corresponding to dataset Y. Use labels_Y = None to evaluate the precision of the similarity search algorithm for documents/images within a single dataset

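The inputs described above can be prepared roughly as follows. This is a minimal sketch with NumPy only: the toy data, the variable names X and Y, and the choice of Euclidean distance are assumptions made for illustration, since the documentation does not state which distance metric D is expected to hold.

```python
import numpy as np

# Hypothetical toy data: 4 samples in X and 5 samples in Y, 3 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))
Y = rng.normal(size=(5, 3))

# Pairwise Euclidean distances via broadcasting (assumed metric):
# D[i, j] = ||X[i] - Y[j]||, so D has shape (n_samples_x, n_samples_y).
D = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)

# One label per row of D (dataset X) and one per column of D (dataset Y).
labels_X = np.array([0, 1, 0, 1])
labels_Y = np.array([0, 0, 1, 1, 0])
```

With these arrays, a call matching the signature above would look like evaluate_topk(D, 4, labels_X, labels_Y). For a search within a single dataset, D would be square and labels_Y would be left as None.
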
Returns:

  • k_vec (ndarray, shape (log2(K)+1,)) – Top-k cut-offs at which the precision values are calculated, in powers of 2 (1, 2, 4, 8, 16, 32 if the value of input argument K is 32)
  • prec_vec (ndarray, shape (log2(K)+1,)) – Precision values corresponding to the cut-offs in k_vec
  • topk_indices (ndarray, shape (n_samples_x, K)) – Indices of the top K results for each sample in X, one row per sample
  • topk_values (ndarray, shape (n_samples_x, K)) – Values corresponding to the entries of topk_indices, one row per sample
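
To make the shapes of these outputs concrete, here is a small NumPy stand-in that produces arrays with the same names and shapes. It is a hypothetical sketch, not the pai4sk implementation. The function name topk_precision_sketch is invented, and several behaviours are assumptions: smaller distances rank higher, precision at k is the fraction of the k nearest samples sharing the query's label (averaged over all rows of D), topk_values holds the distances of those neighbours, and self-matches are not excluded when labels_Y is None.

```python
import numpy as np

def topk_precision_sketch(D, K, labels_X, labels_Y=None):
    """Illustrative stand-in for the outputs described above (hypothetical)."""
    D = np.asarray(D)
    labels_X = np.asarray(labels_X)
    # Assumption: labels_Y=None means X is searched against itself.
    labels_Y = labels_X if labels_Y is None else np.asarray(labels_Y)

    # Assumption: smaller distance means more similar.
    topk_indices = np.argsort(D, axis=1, kind="stable")[:, :K]
    topk_values = np.take_along_axis(D, topk_indices, axis=1)

    # hits[i, j] is True when the j-th nearest sample shares the label of X[i].
    hits = labels_Y[topk_indices] == labels_X[:, None]

    # Cut-offs in powers of 2 up to K: 1, 2, 4, ...
    k_vec = 2 ** np.arange(int(K).bit_length())
    # Mean precision over all rows of D at each cut-off.
    prec_vec = np.array([hits[:, :k].mean() for k in k_vec])
    return k_vec, prec_vec, topk_indices, topk_values

D = np.array([[0.1, 0.4, 0.3, 0.9],
              [0.8, 0.2, 0.5, 0.6]])
labels_X = np.array([0, 1])
labels_Y = np.array([0, 1, 1, 0])

k_vec, prec_vec, topk_indices, topk_values = topk_precision_sketch(
    D, 4, labels_X, labels_Y
)
print(k_vec)     # [1 2 4]
print(prec_vec)  # [1.   0.75 0.5 ]
```

With K = 4 the cut-offs are 1, 2 and 4, so k_vec and prec_vec each have log2(4)+1 = 3 entries, while topk_indices and topk_values have one row per sample in X and K columns.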