DecisionTree

class pai4sk.DecisionTreeClassifier(criterion='gini', splitter='best', max_depth=None, min_samples_leaf=1, max_features=None, random_state=None, n_threads=1, use_histograms=False, hist_nbins=256, use_gpu=False, gpu_id=0, verbose=False)

Decision Tree Classifier

This class implements a decision tree classifier using the IBM Snap ML library. It can be used for binary classification problems.

Parameters

criterion (string, optional, default : "gini") – The function used to measure the quality of a split. Possible values: “gini” for the Gini impurity and “entropy” for the information gain. “entropy” is currently not supported.
splitter (string, optional, default : "best") – This parameter defines the strategy used to choose the split at each node. Possible values: “best” and “random”. “random” is currently not supported.
max_depth (int or None, optional, default : None) – The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_leaf samples.
min_samples_leaf (int or float, optional, default : 1) – The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches.
- If int, then consider min_samples_leaf as the minimum number.
- If float, then consider ceil(min_samples_leaf * n_samples) as the minimum number.
max_features (int, float, string or None, optional, default : None) –
The number of features to consider when looking for the best split:
- If int, then consider max_features features at each split.
- If float, then consider int(max_features * n_features) features at each split.
- If “auto”, then max_features=sqrt(n_features).
- If “sqrt”, then max_features=sqrt(n_features).
- If “log2”, then max_features=log2(n_features).
- If None, then max_features=n_features.
random_state (int or None, optional, default : None) – If int, random_state is the seed used by the random number generator; if None, the random number generator is the RandomState instance used by np.random.
n_threads (integer, optional, default : 1) – The number of CPU threads to use.
use_histograms (boolean, default : False) – Use histogram-based splits rather than exact splits.
hist_nbins (int, default : 256) – Number of histogram bins.
use_gpu (boolean, default : False) – Use GPU acceleration (only supported for histogram-based splits).
gpu_id (int, default : 0) – Device ID of the GPU which will be used when GPU acceleration is enabled.
verbose (bool, default : False) – If True, prints debugging information during training. Warning: this increases the training time; for performance evaluation, use verbose=False.
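The int and float forms of min_samples_leaf and max_features resolve to concrete counts before training. The sketch below spells out those rules for the classifier as listed above. The helper names are hypothetical and not part of pai4sk, and truncating to int with a floor of 1 for the "sqrt", "log2", and float cases is an assumption borrowed from scikit-learn, since the parameter descriptions do not state how fractional counts are rounded.

```python
import math


def resolve_min_samples_leaf(min_samples_leaf, n_samples):
    """Minimum leaf size as an integer count (hypothetical helper).

    An int is used as-is; a float is a fraction of n_samples, rounded up
    with ceil as described for min_samples_leaf.
    """
    if isinstance(min_samples_leaf, float):
        return math.ceil(min_samples_leaf * n_samples)
    return int(min_samples_leaf)


def resolve_max_features(max_features, n_features):
    """Features examined per split, using the classifier rules (hypothetical helper).

    Assumption: fractional results are truncated with int() and clamped to
    [1, n_features], mirroring scikit-learn.
    """
    if max_features is None:
        k = n_features
    elif isinstance(max_features, str):
        if max_features in ("auto", "sqrt"):  # classifier: "auto" means sqrt
            k = int(math.sqrt(n_features))
        elif max_features == "log2":
            k = int(math.log2(n_features))
        else:
            raise ValueError(f"unknown max_features: {max_features!r}")
    elif isinstance(max_features, float):
        k = int(max_features * n_features)
    else:
        k = int(max_features)
    return max(1, min(k, n_features))


print(resolve_min_samples_leaf(0.1, 25))  # 3, since ceil(2.5) == 3
print(resolve_max_features("sqrt", 100))  # 10
```

For the regressor, the only difference in these rules is that "auto" resolves to n_features instead of sqrt(n_features).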

Variables

classes_ (array of shape = [n_classes]) – The class labels (single output problem).
n_classes_ (int) – The number of classes (for single output problems)

fit(X_train, y_train, sample_weight=None)

Fit the model according to the given training data.

Parameters

X_train (dense matrix (ndarray)) – Training dataset.
y_train (array-like, shape = (n_samples,)) – The target vector corresponding to X_train.
sample_weight (array-like, shape = (n_samples,) or None) – Sample weights. If None, all samples are weighted equally. TODO: Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node.

Returns

Return type

None
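To show how criterion="gini" and sample_weight can interact during split search, here is a minimal sketch that scores one candidate threshold by the weighted Gini impurity of the two children. It follows scikit-learn-style conventions and is not Snap ML's implementation: weighted_gini and split_score are hypothetical helpers, and rejecting children with zero or negative total weight follows the TODO note on sample_weight rather than confirmed library behavior.

```python
import numpy as np


def weighted_gini(y, w):
    """Gini impurity 1 - sum_k p_k^2, where p_k is the weight share of class k."""
    _, inverse = np.unique(y, return_inverse=True)
    class_weight = np.bincount(inverse, weights=w)
    p = class_weight / w.sum()
    return 1.0 - np.sum(p ** 2)


def split_score(x, y, threshold, sample_weight=None, min_samples_leaf=1):
    """Weighted child impurity for the split x <= threshold (lower is better).

    Returns None when the split is not allowed: a child with fewer than
    min_samples_leaf samples, or a child with zero or negative total weight.
    """
    x, y = np.asarray(x), np.asarray(y)
    w = np.ones(len(y)) if sample_weight is None else np.asarray(sample_weight, dtype=float)
    left = x <= threshold
    right = ~left
    if left.sum() < min_samples_leaf or right.sum() < min_samples_leaf:
        return None
    w_left, w_right = w[left].sum(), w[right].sum()
    if w_left <= 0 or w_right <= 0:
        return None
    total = w_left + w_right
    return (w_left / total) * weighted_gini(y[left], w[left]) + (
        w_right / total
    ) * weighted_gini(y[right], w[right])


x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0, 0, 1, 1])
print(split_score(x, y, 2.5))  # 0.0: both children are pure
print(split_score(x, y, 2.5, sample_weight=[1, 1, 0, 0]))  # None: right child has zero weight
```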

get_params()

Get the values of the model parameters.

Returns: params
Return type: dict

predict(X, num_threads=0)

Class predictions

The returned class estimates.

Parameters

X (dense matrix (ndarray)) – Dataset used for predicting class estimates.
num_threads (int, default : 0) – Number of threads used to run inference. By default (0), inference runs with the maximum number of available threads.

Returns

pred – The predicted class of each sample.

Return type

array-like, shape = (n_samples,)

predict_log_proba(X, num_threads=0)

Log of probability estimates

The returned log-probability estimates for the two classes. Only for binary classification.

Parameters

X (dense matrix (ndarray)) – Dataset used for predicting log-probability estimates.
num_threads (int, default : 0) – Number of threads used to run inference. By default (0), inference runs with the maximum number of available threads.

Returns

Return type

array-like, shape = (n_samples, 2)
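Assuming, as in scikit-learn, that predict_log_proba returns the natural logarithm of predict_proba (the description above does not say so explicitly), the relationship can be sketched with NumPy on a hypothetical probability array. A class that is absent from a pure leaf gets probability 0 and therefore log-probability -inf.

```python
import numpy as np

# Hypothetical predict_proba output for three samples: one row per sample,
# one column per class (binary problem, so two columns).
proba = np.array([[0.9, 0.1],
                  [0.5, 0.5],
                  [1.0, 0.0]])

# Assumption: log-probabilities are the natural log of the probabilities.
# errstate silences NumPy's divide-by-zero warning for the 0.0 entry.
with np.errstate(divide="ignore"):
    log_proba = np.log(proba)

# Exponentiating recovers the probabilities, and the most likely class
# is the same whichever array is used.
recovered = np.exp(log_proba)
print(np.allclose(recovered, proba))  # True
```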

predict_proba(X, num_threads=0)

Probability estimates

The returned probability estimates for the two classes. Only for binary classification.

Parameters

X (dense matrix (ndarray)) – Dataset used for predicting probability estimates.
num_threads (int, default : 0) – Number of threads used to run inference. By default (0), inference runs with the maximum number of available threads.

Returns

Return type

array-like, shape = (n_samples, 2)
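A decision tree classifier typically derives these probabilities from the leaf a sample reaches: each class's share of the (weighted) training samples that ended up in that leaf, with predict returning the most probable class. The sketch below assumes this usual CART convention; leaf_class_proba is a hypothetical helper, and pai4sk may compute probabilities differently.

```python
import numpy as np


def leaf_class_proba(y_leaf, classes, sample_weight=None):
    """Class probabilities at a leaf: the weighted share of each class (hypothetical helper)."""
    y_leaf = np.asarray(y_leaf)
    w = np.ones(len(y_leaf)) if sample_weight is None else np.asarray(sample_weight, dtype=float)
    weight_per_class = np.array([w[y_leaf == c].sum() for c in classes])
    return weight_per_class / weight_per_class.sum()


classes = np.array([0, 1])
proba = leaf_class_proba([0, 1, 1, 1], classes)
print(proba)  # [0.25 0.75]
print(classes[np.argmax(proba)])  # 1: the class that predict would return
```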

class pai4sk.DecisionTreeRegressor(criterion='mse', splitter='best', max_depth=None, min_samples_leaf=1, max_features=None, random_state=None, n_threads=1, use_histograms=False, hist_nbins=256, use_gpu=False, gpu_id=0, verbose=False)

Decision Tree Regressor

This class implements a decision tree regressor using the IBM Snap ML library. It can be used for regression tasks.

Parameters

criterion (string, optional, default : "mse") – The function used to measure the quality of a split. Possible values: “mse” for the mean squared error. “friedman_mse” and “mae” are currently not supported.
splitter (string, optional, default : "best") – This parameter defines the strategy used to choose the split at each node. Possible values: “best” and “random”. “random” is currently not supported.
max_depth (int or None, optional, default : None) – The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_leaf samples.
min_samples_leaf (int or float, optional, default : 1) – The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches.
- If int, then consider min_samples_leaf as the minimum number.
- If float, then consider ceil(min_samples_leaf * n_samples) as the minimum number.
max_features (int, float, string or None, optional, default : None) –
The number of features to consider when looking for the best split:
- If int, then consider max_features features at each split.
- If float, then consider int(max_features * n_features) features at each split.
- If “auto”, then max_features=n_features.
- If “sqrt”, then max_features=sqrt(n_features).
- If “log2”, then max_features=log2(n_features).
- If None, then max_features=n_features.
random_state (int or None, optional, default : None) – If int, random_state is the seed used by the random number generator; if None, the random number generator is the RandomState instance used by np.random.
n_threads (integer, optional, default : 1) – The number of CPU threads to use.
use_histograms (boolean, default : False) – Use histogram-based splits rather than exact splits.
hist_nbins (int, default : 256) – Number of histogram bins.
use_gpu (boolean, default : False) – Use GPU acceleration (only supported for histogram-based splits).
gpu_id (int, default : 0) – Device ID of the GPU which will be used when GPU acceleration is enabled.
verbose (bool, default : False) – If True, it prints debugging information while training. Warning: this will increase the training time. For performance evaluation, use verbose=False.
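With use_histograms=True, candidate thresholds come from a fixed set of bin boundaries instead of every gap between distinct sorted feature values, so at most hist_nbins - 1 thresholds are evaluated per feature. The sketch below uses quantile-based binning, a common choice in histogram-based tree learners. Snap ML's actual binning scheme is not described here, so both the scheme and the helper names (build_bin_edges, bin_feature) are assumptions.

```python
import numpy as np


def build_bin_edges(x, hist_nbins=256):
    """Candidate thresholds for one feature from its quantiles (at most hist_nbins - 1)."""
    interior_quantiles = np.linspace(0.0, 1.0, hist_nbins + 1)[1:-1]
    return np.unique(np.quantile(x, interior_quantiles))


def bin_feature(x, edges):
    """Map each raw value to a bin index in [0, len(edges)].

    side="left" places a value equal to an edge in the lower bin, so bin
    indices <= i are exactly the samples with x <= edges[i] (the left branch).
    """
    return np.searchsorted(edges, x, side="left")


x = np.arange(1000, dtype=float)
edges = build_bin_edges(x, hist_nbins=4)
bins = bin_feature(x, edges)
print(np.bincount(bins))  # [250 250 250 250]
```

After binning, split search only needs per-bin sums, which is what makes the histogram mode cheaper than exact splits on large datasets.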

fit(X_train, y_train, sample_weight=None)

Fit the model according to the given training data.

Parameters

X_train (dense matrix (ndarray)) – Training dataset.
y_train (array-like, shape = (n_samples,)) – The target vector corresponding to X_train.
sample_weight (array-like, shape = (n_samples,) or None) – Sample weights. If None, all samples are weighted equally. TODO: Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node.

Returns

Return type

None
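For criterion="mse", a split is judged by the weighted mean squared error of the two children around their own means. The sketch below performs an exact search (as with use_histograms=False) over a single feature, testing midpoints between consecutive distinct values. The helper names and the midpoint rule are assumptions for illustration, not the library's code.

```python
import numpy as np


def weighted_mse(y, w):
    """Weighted mean squared error of y around its weighted mean."""
    mean = np.average(y, weights=w)
    return np.average((y - mean) ** 2, weights=w)


def best_mse_split(x, y, sample_weight=None, min_samples_leaf=1):
    """Exact threshold search on one feature minimizing the weighted child MSE.

    Returns (threshold, score), or None if no split is allowed.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.ones(len(y)) if sample_weight is None else np.asarray(sample_weight, dtype=float)
    values = np.unique(x)
    best = None
    for t in (values[:-1] + values[1:]) / 2.0:  # midpoints between distinct values
        left = x <= t
        right = ~left
        if left.sum() < min_samples_leaf or right.sum() < min_samples_leaf:
            continue
        w_l, w_r = w[left].sum(), w[right].sum()
        if w_l <= 0 or w_r <= 0:  # skip children with no positive weight
            continue
        score = (w_l * weighted_mse(y[left], w[left])
                 + w_r * weighted_mse(y[right], w[right])) / (w_l + w_r)
        if best is None or score < best[1]:
            best = (t, score)
    return best


threshold, score = best_mse_split([1, 2, 3, 10, 11, 12], [1.0, 1.1, 0.9, 5.0, 5.2, 4.8])
print(threshold)  # 6.5: separates the low targets from the high ones
```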

get_params()

Get the values of the model parameters.

Returns: params
Return type: dict

predict(X, num_threads=0)

Regression predictions

The returned regression estimates.

Parameters

X (dense matrix (ndarray)) – Dataset used for predicting regression estimates.
num_threads (int, default : 0) – Number of threads used to run inference. By default (0), inference runs with the maximum number of available threads.

Returns

pred – The predicted value of each sample.

Return type

array-like, shape = (n_samples,)
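Under the MSE criterion, the value stored in a leaf is usually the (weighted) mean of the training targets that reach it, since the mean minimizes squared error; predict then routes each sample to its leaf and returns that value. The one-split sketch below assumes this convention. fit_stump and predict_stump are hypothetical helpers, and the threshold is supplied by hand rather than learned.

```python
import numpy as np


def fit_stump(x, y, threshold, sample_weight=None):
    """Leaf values for a one-split tree: the weighted mean target on each side."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.ones(len(y)) if sample_weight is None else np.asarray(sample_weight, dtype=float)
    left = x <= threshold
    return (np.average(y[left], weights=w[left]),
            np.average(y[~left], weights=w[~left]))


def predict_stump(x_new, threshold, leaf_values):
    """Route each sample to a leaf and return that leaf's value, shape (n_samples,)."""
    left_value, right_value = leaf_values
    return np.where(np.asarray(x_new, dtype=float) <= threshold, left_value, right_value)


leaves = fit_stump([1, 2, 3, 10, 11, 12], [1.0, 1.1, 0.9, 5.0, 5.2, 4.8], threshold=6.5)
pred = predict_stump([0, 4, 20], 6.5, leaves)
print(pred)
```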