loaders.load_20News
pai4sk.simsearch.loaders.load_20News(directory_path, vocabulary_file_path='vocabulary.txt', norm='l1')
Loads the 20News dataset into the arrays X, labels, and ids.
Parameters: - directory_path (string) – Source location for the data
- vocabulary_file_path (string (default='vocabulary.txt')) – Path to the vocabulary file used for tokenizing and vectorizing the text
- norm (string (default='l1')) – Norm used to normalize the data. Pass 'l2' to use l2 normalization instead of the default 'l1'
Returns: - X (array-like, sparse_matrix, shape (n_samples, n_features)) – Feature vectors
- labels (array-like, shape (n_samples,)) – Class labels of the samples in X
- ids (array-like, shape (n_samples,)) – Ids of the samples in X
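
A minimal usage sketch follows, under several assumptions not confirmed by this page: that pai4sk is installed, that the function returns the three arrays as a tuple in the documented order (X, labels, ids), and that the caller supplies a valid data directory. The helpers `check_loader_output` and `load_and_check` are hypothetical names introduced only for illustration; the first verifies that the returned arrays agree on n_samples, as the shapes listed under Returns imply.

```python
def check_loader_output(X, labels, ids):
    """Hypothetical helper: confirm X, labels and ids describe the same samples.

    Only relies on X exposing a `shape` attribute (true for NumPy arrays and
    SciPy sparse matrices) and on labels/ids supporting len().
    """
    n_samples = X.shape[0]
    if len(labels) != n_samples or len(ids) != n_samples:
        raise ValueError(
            "expected %d labels and ids, got %d and %d"
            % (n_samples, len(labels), len(ids))
        )
    return n_samples


def load_and_check(directory_path, vocabulary_file_path="vocabulary.txt"):
    """Hypothetical wrapper: load the dataset with l2 normalization and validate it."""
    # Imported lazily so this sketch can be read and tested without pai4sk.
    from pai4sk.simsearch.loaders import load_20News

    # Assumption: the return value unpacks as (X, labels, ids).
    X, labels, ids = load_20News(
        directory_path,
        vocabulary_file_path=vocabulary_file_path,
        norm="l2",
    )
    n_samples = check_loader_output(X, labels, ids)
    return X, labels, ids, n_samples
```

Calling `load_and_check` with the path to a local copy of the data would then return the three arrays together with the sample count. Omitting `norm` in the call to `load_20News` keeps the default 'l1'.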