loaders.load_20News

pai4sk.simsearch.loaders.load_20News(directory_path, vocabulary_file_path='vocabulary.txt', norm='l1')

Loads the 20News dataset into the arrays X, labels, and ids.

Parameters:

directory_path (string) – Source location for the data
vocabulary_file_path (string) – Vocabulary that is used in tokenization and vectorization of text
norm (string (default='l1')) – Norm used to normalize the data; set to 'l2' to use l2 normalization instead of the default 'l1'
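To make the norm parameter concrete, the following minimal sketch shows what l1 and l2 normalization mean for a single feature vector. It is an illustration in plain Python, not the library's implementation: the helper name normalize is hypothetical, and it assumes each sample is normalized independently, which this page does not state.

```python
import math


def normalize(vector, norm="l1"):
    """Scale a dense vector so that its l1 or l2 norm equals 1.

    Hypothetical helper for illustration only; not part of pai4sk.
    """
    if norm == "l1":
        # l1 norm: sum of absolute values.
        scale = sum(abs(v) for v in vector)
    elif norm == "l2":
        # l2 norm: square root of the sum of squares.
        scale = math.sqrt(sum(v * v for v in vector))
    else:
        raise ValueError("norm must be 'l1' or 'l2'")
    # Leave an all-zero vector unchanged to avoid division by zero.
    if scale == 0:
        return list(vector)
    return [v / scale for v in vector]


counts = [3.0, 4.0]
print(normalize(counts, "l1"))  # entries sum to 1
print(normalize(counts, "l2"))  # [0.6, 0.8]
```

With l1 the absolute values of the entries sum to 1, while with l2 the vector has unit Euclidean length.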

Returns:

X (array-like, sparse_matrix, shape (n_samples, n_features)) – Feature vectors
labels (array-like, shape (n_samples,)) – Class labels of the samples in X
ids (array-like, shape (n_samples,)) – Identifiers of the samples in X
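A call might look like the sketch below, which uses only the signature and return values documented above. The wrapper name describe_20news is hypothetical, and the sketch assumes that X exposes a shape attribute and that labels and ids support len(), as the documented shapes suggest.

```python
def describe_20news(directory_path, vocabulary_file_path="vocabulary.txt", norm="l1"):
    """Load the dataset and return (n_samples, n_features) after a shape check.

    Hypothetical wrapper for illustration; not part of pai4sk.
    """
    # Imported lazily so this module can be imported without pai4sk installed.
    from pai4sk.simsearch.loaders import load_20News

    X, labels, ids = load_20News(
        directory_path,
        vocabulary_file_path=vocabulary_file_path,
        norm=norm,
    )
    n_samples, n_features = X.shape
    # Per the documented shapes, labels and ids hold one entry per row of X.
    assert len(labels) == n_samples
    assert len(ids) == n_samples
    return n_samples, n_features
```

The directory and vocabulary paths passed in are placeholders to be replaced with the actual location of the data.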