DecisionTree
-
class pai4sk.DecisionTreeClassifier(criterion='gini', splitter='best', max_depth=None, min_samples_leaf=1, max_features=None, random_state=None, verbose=False, use_gpu=False, use_histograms=False, hist_nbins=64)
Decision Tree Classifier
This class implements a decision tree classifier using the IBM Snap ML library. It can be used for binary classification problems.
- Parameters
criterion (string, optional, default : "gini") – The function used to measure the quality of a split. Possible values: “gini” for Gini impurity and “entropy” for information gain. “entropy” is currently not supported.
splitter (string, optional, default : "best") – This parameter defines the strategy used to choose the split at each node. Possible values: “best” and “random”. “random” is currently not supported.
max_depth (int or None, optional, default : None) – The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_leaf samples.
min_samples_leaf (int or float, optional, default : 1) – The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches.
If int, then consider min_samples_leaf as the minimum number.
If float, then consider ceil(min_samples_leaf * n_samples) as the minimum number.
max_features (int, float, string or None, optional, default : None) – The number of features to consider when looking for the best split:
If int, then consider max_features features at each split.
If float, then consider int(max_features * n_features) features at each split.
If “auto”, then max_features=sqrt(n_features).
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features=n_features.
random_state (int or None, optional, default : None) – If int, random_state is the seed used by the random number generator; if None, the random number generator is the RandomState instance used by np.random.
use_gpu (bool, default : False) – Flag that indicates the hardware platform used for training. If True, the training is performed using the GPU. If False, the training is performed using the CPU. Currently only CPU training is supported.
use_histograms (boolean, default : False) – Use histogram-based splits rather than exact splits.
hist_nbins (int, default : 64) – Number of histogram bins.
verbose (bool, default : False) – If True, it prints debugging information while training. Warning: this will increase the training time. For performance evaluation, use verbose=False.
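As an illustration of what the “gini” criterion measures, the sketch below computes the Gini impurity of a set of labels and the weighted impurity of a candidate split. It uses the textbook definition, not code taken from Snap ML, and the helper names gini and split_gini are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a collection of class labels (textbook definition)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    # 1 minus the sum of squared class frequencies.
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_gini(left_labels, right_labels):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left_labels) + len(right_labels)
    return (len(left_labels) * gini(left_labels)
            + len(right_labels) * gini(right_labels)) / n


# A perfectly separating split has zero impurity; a mixed one does not.
pure_split = split_gini([0, 0], [1, 1])
mixed_split = split_gini([0, 0, 1], [1])
```

A lower value indicates a better split, so a “best” splitter would prefer pure_split over mixed_split in this toy case.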
- Variables
classes_ (array of shape = [n_classes]) – The class labels (single-output problem)
n_classes_ (int) – The number of classes (for single output problems)
-
fit(X_train, y_train, sample_weight=None)
Fit the model according to the given training data.
- Parameters
X_train (dense matrix (ndarray)) – Training dataset
y_train (array-like, shape = (n_samples,)) – The target vector corresponding to X_train.
sample_weight (array-like, shape = [n_samples] or None) – Sample weights. If None, then samples are equally weighted. TODO: Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node.
- Returns
- Return type
-
predict(X, num_threads=0)
Class predictions
Returns the class estimates.
- Parameters
X (dense matrix (ndarray)) – Dataset used for predicting class estimates.
num_threads (int, default : 0) – Number of threads used to run inference. By default, inference runs with the maximum number of available threads.
- Returns
pred – The predicted class of each sample.
- Return type
array-like, shape = (n_samples,)
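The calling pattern for fit and predict can be sketched as below. The helper fit_and_predict and the stand-in class MajorityClassStub are hypothetical; the stub exists only so the sketch runs without pai4sk installed, and it mimics nothing but the documented method signatures.

```python
def fit_and_predict(model, X_train, y_train, X_test, num_threads=0):
    """Fit `model` on the training data and return predictions for X_test.

    `model` is expected to expose fit(X_train, y_train, sample_weight=None)
    and predict(X, num_threads=0), as documented above.
    """
    model.fit(X_train, y_train)
    # num_threads=0 keeps the documented default (all available threads).
    return model.predict(X_test, num_threads=num_threads)


class MajorityClassStub:
    """Hypothetical stand-in estimator: always predicts the majority class."""

    def fit(self, X_train, y_train, sample_weight=None):
        labels = list(y_train)
        self.classes_ = sorted(set(labels))
        self.majority_ = max(self.classes_, key=labels.count)
        return self

    def predict(self, X, num_threads=0):
        return [self.majority_ for _ in X]


preds = fit_and_predict(
    MajorityClassStub(),
    X_train=[[0.0], [1.0], [2.0]],
    y_train=[0, 1, 1],
    X_test=[[3.0], [4.0]],
)
```

With pai4sk installed, the same helper would presumably be called with pai4sk.DecisionTreeClassifier() in place of the stub and with dense NumPy arrays for X_train and X_test, since the methods are documented as taking a dense matrix (ndarray).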
-
predict_log_proba(X, num_threads=0)
Log of probability estimates
Returns the log-probability estimates for the two classes. Only for binary classification.
-
predict_proba(X, num_threads=0)
Probability estimates
Returns the probability estimates for the two classes. Only for binary classification.
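The relationship between the two probability methods can be sketched as follows, assuming that predict_log_proba returns the element-wise natural logarithm of the predict_proba output. The text above does not state this, so treat it as an assumption; the helper name to_log_proba is hypothetical.

```python
import math


def to_log_proba(proba):
    """Element-wise natural log of per-class probabilities.

    Assumption: this mirrors how log-probabilities relate to probabilities.
    A probability of exactly 0 is mapped to -inf instead of raising an error.
    """
    return [
        [math.log(p) if p > 0.0 else float("-inf") for p in row]
        for row in proba
    ]


# Two samples, two classes; the second sample falls in a pure leaf.
proba = [[0.25, 0.75], [1.0, 0.0]]
log_proba = to_log_proba(proba)
```

A pure leaf can assign probability 0 to one class, which is why the sketch handles that case explicitly.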
-
class pai4sk.DecisionTreeRegressor(criterion='mse', splitter='best', max_depth=None, min_samples_leaf=1, max_features=None, random_state=None, verbose=False, use_gpu=False, use_histograms=False, hist_nbins=64)
Decision Tree Regressor
This class implements a decision tree regressor using the IBM Snap ML library. It can be used for regression tasks.
- Parameters
criterion (string, optional, default : "mse") – The function used to measure the quality of a split. Possible values: “mse” for mean squared error. “friedman_mse” and “mae” are currently not supported.
splitter (string, optional, default : "best") – This parameter defines the strategy used to choose the split at each node. Possible values: “best” and “random”. “random” is currently not supported.
max_depth (int or None, optional, default : None) – The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_leaf samples.
min_samples_leaf (int or float, optional, default : 1) – The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches.
If int, then consider min_samples_leaf as the minimum number.
If float, then consider ceil(min_samples_leaf * n_samples) as the minimum number.
max_features (int, float, string or None, optional, default : None) – The number of features to consider when looking for the best split:
If int, then consider max_features features at each split.
If float, then consider int(max_features * n_features) features at each split.
If “auto”, then max_features=n_features.
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features=n_features.
random_state (int or None, optional, default : None) – If int, random_state is the seed used by the random number generator; if None, the random number generator is the RandomState instance used by np.random.
use_gpu (bool, default : False) – Flag that indicates the hardware platform used for training. If True, the training is performed using the GPU. If False, the training is performed using the CPU. Currently only CPU training is supported.
use_histograms (boolean, default : False) – Use histogram-based splits rather than exact splits.
hist_nbins (int, default : 64) – Number of histogram bins.
verbose (bool, default : False) – If True, it prints debugging information while training. Warning: this will increase the training time. For performance evaluation, use verbose=False.
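The resolution rules listed for min_samples_leaf and max_features can be sketched as plain Python. The function names are hypothetical, and truncating sqrt and log2 results to an integer is an assumption, since the rules above only give the unrounded expressions. The classifier flag captures the one documented difference: “auto” means sqrt(n_features) for the classifier and n_features for the regressor.

```python
import math


def resolve_min_samples_leaf(min_samples_leaf, n_samples):
    """Effective minimum number of samples per leaf."""
    if isinstance(min_samples_leaf, int):
        return min_samples_leaf
    # A float is a fraction of the training set, rounded up.
    return math.ceil(min_samples_leaf * n_samples)


def resolve_max_features(max_features, n_features, classifier=True):
    """Number of features examined at each split."""
    if max_features is None:
        return n_features
    if isinstance(max_features, str):
        if max_features == "auto":
            # Documented difference between classifier and regressor.
            if classifier:
                return int(math.sqrt(n_features))  # truncation is assumed
            return n_features
        if max_features == "sqrt":
            return int(math.sqrt(n_features))  # truncation is assumed
        if max_features == "log2":
            return int(math.log2(n_features))  # truncation is assumed
        raise ValueError("unknown max_features value: " + max_features)
    if isinstance(max_features, int):
        return max_features
    # A float is a fraction of the feature count.
    return int(max_features * n_features)
```

For example, with 64 features, resolve_max_features("auto", 64, classifier=False) keeps all 64 features, while the classifier variant examines 8.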
-
fit(X_train, y_train, sample_weight=None)
Fit the model according to the given training data.
- Parameters
X_train (dense matrix (ndarray)) – Training dataset
y_train (array-like, shape = (n_samples,)) – The target vector corresponding to X_train.
sample_weight (array-like, shape = [n_samples] or None) – Sample weights. If None, then samples are equally weighted. TODO: Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node.
- Returns
- Return type
-
predict(X, num_threads=0)
Regression predictions
Returns the regression estimates.
- Parameters
X (dense matrix (ndarray)) – Dataset used for predicting regression estimates.
num_threads (int, default : 0) – Number of threads used to run inference. By default, inference runs with the maximum number of available threads.
- Returns
pred – The predicted value of each sample.
- Return type
array-like, shape = (n_samples,)