load_svmlight_file

pai4sk.sml_io.load_svmlight_file(filename, num_ft=None, num_chunks=None)

Loads data from a file in svmlight format. Both local and distributed (MPI) loading are supported. Under MPI execution, the loaded data can be used for distributed SnapML training and inference.

Parameters
  • filename (str) – Path to the svmlight-format file containing the data.

  • num_ft (int, optional) – Expected number of features. Defaults to None.

  • num_chunks (int, optional) – Number of chunks per partition. Defaults to None.

Returns

  • X (scipy.sparse matrix of shape (n_samples, n_features))

  • y (ndarray of shape (n_samples,))
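
Example

The following is a minimal sketch rather than an official usage example. It writes a tiny file in svmlight format, parses it with the standard library to show what each line encodes, and then calls load_svmlight_file only if pai4sk is importable. The sample data, the helper parse_svmlight_line, and the choice to derive num_ft from the largest one-based feature index are all assumptions made for illustration; how the loader counts features and what it does when num_ft is None should be checked against your installed version.

```python
import os
import tempfile

# Three samples in svmlight format: "<label> <index>:<value> ...".
# Indices are written one-based here (an assumption for this sketch).
SAMPLE = """\
1 1:0.5 3:1.2
-1 2:2.0
1 1:1.0 4:0.3
"""


def parse_svmlight_line(line):
    """Hypothetical helper: split one line into (label, {index: value})."""
    line = line.split("#", 1)[0].strip()  # drop an optional trailing comment
    label, *pairs = line.split()
    features = {}
    for pair in pairs:
        index, value = pair.split(":", 1)
        features[int(index)] = float(value)
    return float(label), features


# Write the sample to a temporary file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "sample.svmlight")
with open(path, "w") as fh:
    fh.write(SAMPLE)

# Parse the file by hand to inspect labels and the largest feature index.
with open(path) as fh:
    rows = [parse_svmlight_line(line) for line in fh if line.strip()]

labels = [label for label, _ in rows]
num_ft = max(max(feats) for _, feats in rows)  # assumed feature count

# Call the documented loader only when pai4sk is installed.
try:
    from pai4sk.sml_io import load_svmlight_file
except ImportError:
    load_svmlight_file = None

if load_svmlight_file is not None:
    X, y = load_svmlight_file(path, num_ft=num_ft)
    print(X.shape, y.shape)
else:
    print("pai4sk not available; parsed by hand:", labels, num_ft)
```

Passing num_ft explicitly is likely most useful when several files (for example, a training set and a test set) must end up with the same number of columns, since a sparse file on its own only reveals the largest feature index that actually occurs in it.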