DecisionTreeClassifier

class pai4sk.DecisionTreeClassifier(criterion='gini', splitter='best', max_depth=None, min_samples_leaf=1, max_features=None, random_state=None, verbose=False, use_gpu=False, hist_type=None, hist_nbins=None)

Decision Tree Classifier

This class implements a decision tree classifier using the IBM Snap ML library. It supports binary classification problems and accepts both dense and sparse inputs: use csr, csc or ndarray matrix format for training, and csr or ndarray format for prediction.
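To make the sparse formats concrete, the sketch below builds the three arrays that make up a CSR (compressed sparse row) matrix from a small dense matrix in plain Python. It is illustrative only. The function name dense_to_csr is hypothetical, and in practice you would pass a scipy.sparse matrix directly. CSC follows the same idea with columns in place of rows.

```python
from typing import List, Tuple


def dense_to_csr(rows: List[List[float]]) -> Tuple[List[float], List[int], List[int]]:
    """Hypothetical helper: return (data, indices, indptr) for a dense row-major matrix."""
    data: List[float] = []   # non-zero values, stored row by row
    indices: List[int] = []  # column index of each stored value
    indptr: List[int] = [0]  # row i occupies data[indptr[i]:indptr[i + 1]]
    for row in rows:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)
                indices.append(col)
        indptr.append(len(data))
    return data, indices, indptr


X = [[0.0, 2.5, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 3.0]]
data, indices, indptr = dense_to_csr(X)
print(data, indices, indptr)  # [2.5, 1.0, 3.0] [1, 0, 2] [0, 1, 2, 3]
```

The resulting (data, indices, indptr) triple is the layout accepted by scipy.sparse.csr_matrix((data, indices, indptr), shape=...).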

Parameters:
  • criterion (string, optional, default : "gini") – The function used to measure the quality of a split. Possible values: “gini” for Gini impurity and “entropy” for information gain. “entropy” is currently not supported.
  • splitter (string, optional, default : "best") – The strategy used to choose the split at each node. Possible values: “best” and “random”. “random” is currently not supported.
  • max_depth (int or None, optional, default : None) – The maximum depth of the tree. If None, nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_leaf samples.
  • min_samples_leaf (int or float, optional, default : 1) –
    The minimum number of samples required at a leaf node. A split point at any depth is considered only if it leaves at least min_samples_leaf training samples in each of the left and right branches.
    • If int, then consider min_samples_leaf as the minimum number.
    • If float, then consider ceil(min_samples_leaf * n_samples) as the minimum number.
  • max_features (int, float, string or None, optional, default : None) –
    The number of features to consider when looking for the best split:
    • If int, then consider max_features features at each split.
    • If float, then consider int(max_features * n_features) features at each split.
    • If “auto”, then max_features=sqrt(n_features).
    • If “sqrt”, then max_features=sqrt(n_features).
    • If “log2”, then max_features=log2(n_features).
    • If None, then max_features=n_features.
  • random_state (int or None, optional, default : None) – If int, random_state is the seed used by the random number generator; if None, the random number generator is the RandomState instance used by np.random.
  • use_gpu (bool, default : False) – Flag that selects the hardware platform used for training: if True, training runs on the GPU; if False, it runs on the CPU. Currently only CPU training is supported.
  • hist_type (string or None, optional, default : None) –
    Indicates whether to use histograms during training:
    • If None (the default), histograms are not used.
    • If ‘Auto’ or ‘Quantiles’, quantile-based histograms are used.
  • hist_nbins (int or None, optional, default : None) – The number of bins used in histogram mode. Only valid if hist_type is not None, in which case the default is 64.
  • verbose (bool, default : False) – If True, debugging information is printed during training. Warning: this increases training time; for performance evaluation, use verbose=False.
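The sketch below shows how the numeric parameters above could resolve to concrete values. It is a hypothetical illustration, not Snap ML's internal code. The helper names are made up; truncating max_features to an int with a floor of 1 follows scikit-learn's convention and is an assumption here; and equal-frequency cut points are only one plausible reading of hist_type='Quantiles'.

```python
import math
from typing import List, Sequence, Union


def resolve_min_samples_leaf(min_samples_leaf: Union[int, float], n_samples: int) -> int:
    """int: used as-is; float: ceil(min_samples_leaf * n_samples), as documented."""
    if isinstance(min_samples_leaf, float):
        return math.ceil(min_samples_leaf * n_samples)
    return min_samples_leaf


def resolve_max_features(max_features: Union[int, float, str, None], n_features: int) -> int:
    """Features examined per split. Truncation and the floor of 1 are assumptions."""
    if max_features is None:
        return n_features
    if max_features in ("auto", "sqrt"):
        return max(1, int(math.sqrt(n_features)))
    if max_features == "log2":
        return max(1, int(math.log2(n_features)))
    if isinstance(max_features, float):
        return max(1, int(max_features * n_features))
    if isinstance(max_features, int):
        return max_features
    raise ValueError(f"unsupported max_features: {max_features!r}")


def quantile_bin_edges(values: Sequence[float], nbins: int = 64) -> List[float]:
    """Hypothetical equal-frequency binning: up to nbins - 1 interior cut points."""
    s = sorted(values)
    n = len(s)
    return sorted(set(s[(k * n) // nbins] for k in range(1, nbins)))


print(resolve_min_samples_leaf(0.25, 10))    # 3
print(resolve_max_features("sqrt", 100))     # 10
print(resolve_max_features(0.5, 30))         # 15
print(quantile_bin_edges(range(8), nbins=4)) # [2, 4, 6]
```

With these edges, a value falls into bin bisect.bisect_right(edges, value), so the eight values 0..7 split into four bins of two values each.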
Variables:
  • classes (array of shape = [n_classes]) – The class labels (single-output problem)
  • n_classes (int) – The number of classes (single-output problem)
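Because the classifier is documented for binary problems, classes should hold exactly two labels. This sketch derives classes and n_classes from a label vector. The helper name infer_classes, the ascending sort order (scikit-learn's convention), and the two-class check are assumptions, not documented pai4sk behavior.

```python
from typing import Hashable, List, Sequence, Tuple


def infer_classes(y_train: Sequence[Hashable]) -> Tuple[List[Hashable], int]:
    """Hypothetical helper: return (classes, n_classes) from a label vector."""
    classes = sorted(set(y_train))  # assumption: ascending order, as in scikit-learn
    if len(classes) != 2:           # assumption: reject non-binary label sets
        raise ValueError(f"expected 2 classes for binary classification, got {len(classes)}")
    return classes, len(classes)


classes, n_classes = infer_classes([1, -1, -1, 1, 1])
print(classes, n_classes)  # [-1, 1] 2
```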
fit(X_train, y_train)

Fit the model to the given training data.

Parameters:
  • X_train (sparse matrix (csr_matrix, csc_matrix) or dense matrix (ndarray)) – Training dataset.
  • y_train (array-like, shape = (n_samples,)) – The target vector corresponding to X_train.
Returns:

Return type:

None
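For intuition about what splitter="best" and criterion mean during fitting, here is a generic CART-style search for the best threshold on a single feature: every cut between distinct sorted values is scored by the size-weighted impurity of the two children (lower is better), and cuts that leave fewer than min_samples_leaf samples on either side are skipped. This is a naive textbook sketch with hypothetical function names; Snap ML's actual split search, including its histogram mode, is not described here. Entropy appears only to show its formula, since “entropy” is documented as not yet supported.

```python
import math
from collections import Counter
from typing import Optional, Sequence, Tuple


def impurity(labels: Sequence[int], criterion: str = "gini") -> float:
    """Gini impurity (1 - sum p_k^2) or entropy in bits (-sum p_k log2 p_k)."""
    n = len(labels)
    probs = [c / n for c in Counter(labels).values()]
    if criterion == "gini":
        return 1.0 - sum(p * p for p in probs)
    if criterion == "entropy":
        return -sum(p * math.log2(p) for p in probs)
    raise ValueError(f"unknown criterion: {criterion!r}")


def best_threshold(x: Sequence[float], y: Sequence[int], min_samples_leaf: int = 1,
                   criterion: str = "gini") -> Optional[Tuple[float, float]]:
    """Exhaustive scan on one feature; returns (threshold, score) or None.

    Assumes min_samples_leaf >= 1 (the int form of the parameter).
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best: Optional[Tuple[float, float]] = None
    # Cutting before index i puts i samples left and n - i right; both must be >= min_samples_leaf.
    for i in range(min_samples_leaf, n - min_samples_leaf + 1):
        lo_val, hi_val = pairs[i - 1][0], pairs[i][0]
        if lo_val == hi_val:
            continue  # equal feature values cannot be separated
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * impurity(left, criterion)
                 + len(right) * impurity(right, criterion)) / n
        if best is None or score < best[1]:
            best = ((lo_val + hi_val) / 2, score)  # midpoint threshold
    return best


x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y = [0, 0, 0, 1, 1, 1]
print(best_threshold(x, y))  # (6.5, 0.0)
```

Real implementations avoid the quadratic cost here by updating class counts incrementally as the cut moves along the sorted values.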

get_params()

Get the values of the model parameters.

Returns:

params – The model parameter values.

Return type:

dict
predict(X, num_threads=0)

Class predictions

Returns the predicted class for each sample.

Parameters:
  • X (sparse matrix (csr_matrix) or dense matrix (ndarray)) – Dataset used for predicting class estimates.
  • num_threads (int, default : 0) – Number of threads used to run inference. The default, 0, runs inference with the maximum number of available threads.
Returns:

pred – The predicted class of each sample.

Return type:

array-like, shape = (n_samples,)
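The num_threads convention (0 meaning all available threads) can be sketched with a standard thread pool. The helpers resolve_num_threads and parallel_predict and the one-split stand-in model are hypothetical; they only illustrate the parameter's meaning, not how Snap ML schedules inference.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def resolve_num_threads(num_threads: int = 0) -> int:
    """0 (the documented default) means: use every available hardware thread."""
    return num_threads if num_threads > 0 else (os.cpu_count() or 1)


def parallel_predict(rows: Sequence[Sequence[float]],
                     predict_row: Callable[[Sequence[float]], int],
                     num_threads: int = 0) -> List[int]:
    """Apply a per-row predictor over a thread pool; pool.map keeps input order."""
    with ThreadPoolExecutor(max_workers=resolve_num_threads(num_threads)) as pool:
        return list(pool.map(predict_row, rows))


def stump(row: Sequence[float]) -> int:
    """Toy stand-in for a fitted tree: a single split on feature 0."""
    return 1 if row[0] > 0.5 else 0


print(parallel_predict([[0.1], [0.9], [0.7]], stump, num_threads=2))  # [0, 1, 1]
```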

predict_log_proba(X, num_threads=0)

Log of probability estimates

Returns the log-probability estimates for the two classes. Binary classification only.

Parameters:
  • X (sparse matrix (csr_matrix) or dense matrix (ndarray)) – Dataset used for predicting log-probability estimates.
  • num_threads (int, default : 0) – Number of threads used to run inference. The default, 0, runs inference with the maximum number of available threads.
Returns:

Return type:

None
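Log-probabilities are the element-wise logarithm of the probability estimates. The sketch assumes the natural logarithm (the scikit-learn convention) and maps a probability of exactly 0 to negative infinity; a leaf containing only one class would produce such a zero if probabilities are leaf class frequencies. The helper name log_proba is hypothetical.

```python
import math
from typing import List, Sequence


def log_proba(proba_rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Element-wise natural log; a probability of exactly 0 maps to -inf."""
    return [[math.log(p) if p > 0 else -math.inf for p in row] for row in proba_rows]


result = log_proba([[0.25, 0.75], [1.0, 0.0]])
print(result)
```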

predict_proba(X, num_threads=0)

Probability estimates

Returns the probability estimates for the two classes. Binary classification only.

Parameters:
  • X (sparse matrix (csr_matrix) or dense matrix (ndarray)) – Dataset used for predicting probability estimates.
  • num_threads (int, default : 0) – Number of threads used to run inference. The default, 0, runs inference with the maximum number of available threads.
Returns:

Return type:

None
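A common way for a decision tree to produce probability estimates is to report, for the leaf a sample falls into, the fraction of training samples of each class; the predicted class is then the one with the highest probability. The sketch below assumes that scheme, which is the scikit-learn approach. Whether pai4sk computes its estimates the same way is not stated here, and the helper names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def leaf_proba(leaf_labels: Sequence[int], classes: Sequence[int]) -> List[float]:
    """Assumption: probability = fraction of the leaf's training samples in each class."""
    counts = Counter(leaf_labels)
    n = len(leaf_labels)
    return [counts[c] / n for c in classes]


def predict_from_proba(proba: Sequence[float], classes: Sequence[int]) -> int:
    """Pick the class with the highest estimated probability (first one on ties)."""
    best = max(range(len(classes)), key=lambda k: proba[k])
    return classes[best]


classes = [0, 1]
proba = leaf_proba([1, 1, 1, 0], classes)  # a leaf holding three class-1 samples and one class-0
print(proba)                               # [0.25, 0.75]
print(predict_from_proba(proba, classes))  # 1
```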