DecisionTree
- class pai4sk.DecisionTreeClassifier(criterion='gini', splitter='best', max_depth=None, min_samples_leaf=1, max_features=None, random_state=None, n_threads=1, use_histograms=False, hist_nbins=256, use_gpu=False, gpu_id=0, verbose=False)
Decision Tree Classifier
This class implements a decision tree classifier using the IBM Snap ML library. It can be used for binary classification problems.
- Parameters
criterion (string, optional, default : "gini") – The function used to measure the quality of a split. Possible values: “gini” for Gini impurity and “entropy” for information gain. “entropy” is currently not supported.
splitter (string, optional, default : "best") – This parameter defines the strategy used to choose the split at each node. Possible values: “best” and “random”. “random” is currently not supported.
max_depth (int or None, optional, default : None) – The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_leaf samples.
min_samples_leaf (int or float, optional, default : 1) – The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches.
If int, then consider min_samples_leaf as the minimum number.
If float, then consider ceil(min_samples_leaf * n_samples) as the minimum number.
max_features (int, float, string or None, optional, default : None) – The number of features to consider when looking for the best split:
If int, then consider max_features features at each split.
If float, then consider int(max_features * n_features) features at each split.
If “auto”, then max_features=sqrt(n_features).
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features=n_features.
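The rules above for min_samples_leaf and max_features can be sketched in plain Python. This is only an illustration of the documented arithmetic, not the library's internal code: the helper names are hypothetical, and truncating sqrt(n_features) and log2(n_features) to an integer is an assumption, since the text does not say how those values are rounded.

```python
import math


def resolve_min_samples_leaf(min_samples_leaf, n_samples):
    """Turn min_samples_leaf into an absolute sample count (hypothetical helper)."""
    if isinstance(min_samples_leaf, int):
        return min_samples_leaf
    # A float is read as a fraction of the training set, rounded up.
    return math.ceil(min_samples_leaf * n_samples)


def resolve_max_features(max_features, n_features):
    """Turn max_features into an absolute feature count (hypothetical helper)."""
    if max_features is None:
        return n_features
    if isinstance(max_features, str):
        if max_features in ("auto", "sqrt"):
            # Assumption: the square root is truncated to an integer.
            return int(math.sqrt(n_features))
        if max_features == "log2":
            # Assumption: the logarithm is truncated to an integer.
            return int(math.log2(n_features))
        raise ValueError(f"unknown max_features: {max_features!r}")
    if isinstance(max_features, int):
        return max_features
    # A float is read as a fraction of the available features.
    return int(max_features * n_features)


print(resolve_min_samples_leaf(0.1, 25))
print(resolve_max_features("sqrt", 64))
print(resolve_max_features(0.5, 64))
```

With 25 training samples, min_samples_leaf=0.1 resolves to 3 samples per leaf, because the fractional count 2.5 is rounded up.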
random_state (int or None, optional, default : None) – If int, random_state is the seed used by the random number generator. If None, the random number generator is the RandomState instance used by np.random.
n_threads (integer, optional, default : 1) – The number of CPU threads to use.
use_histograms (boolean, default : False) – Use histogram-based splits rather than exact splits.
hist_nbins (int, default : 256) – Number of histogram bins.
use_gpu (boolean, default : False) – Use GPU acceleration (only supported for histogram-based splits).
gpu_id (int, default: 0) – Device ID of the GPU which will be used when GPU acceleration is enabled.
verbose (bool, default : False) – If True, it prints debugging information while training. Warning: this will increase the training time. For performance evaluation, use verbose=False.
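To make the "gini" criterion concrete, the sketch below scores a candidate split with the textbook definition of Gini impurity for binary labels. It is an assumption that Snap ML evaluates splits this way internally; the function names are hypothetical and the code does not call the library.

```python
def gini(labels):
    """Gini impurity of a collection of binary labels (0 or 1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n  # fraction of samples in class 1
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def split_impurity(left, right):
    """Average Gini impurity of two children, weighted by their sizes."""
    n = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / n


parent = [0, 0, 0, 1, 1, 1]
pure_split = split_impurity([0, 0, 0], [1, 1, 1])
mixed_split = split_impurity([0, 1, 0], [1, 0, 1])
print(gini(parent), pure_split, mixed_split)
```

A lower value means a better split: the split that separates the classes perfectly scores 0, while the mixed one stays close to the parent's impurity of 0.5.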
- Variables
classes_ (array of shape = [n_classes]) – The class labels (single-output problem).
n_classes_ (int) – The number of classes (for single output problems)
- fit(X_train, y_train, sample_weight=None)
Fit the model according to the given training data.
- Parameters
X_train (dense matrix (ndarray)) – Training dataset.
y_train (array-like, shape = (n_samples,)) – The target vector corresponding to X_train.
sample_weight (array-like, shape = [n_samples] or None) – Sample weights. If None, then samples are equally weighted. TODO: Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node.
- Returns
- Return type
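The sample_weight description mentions, under a TODO, that splits producing a child with net zero or negative weight are ignored. Whether the library currently enforces this is not stated here, so the following is only a sketch of what such a check could look like; the function name and the boolean-mask representation of a split are assumptions.

```python
def split_is_admissible(sample_weight, goes_left):
    """Return True if both children have strictly positive net weight.

    goes_left[i] is True when sample i falls into the left child.
    """
    left = sum(w for w, is_left in zip(sample_weight, goes_left) if is_left)
    right = sum(w for w, is_left in zip(sample_weight, goes_left) if not is_left)
    return left > 0 and right > 0


weights = [1.0, 2.0, 0.0, 0.5]
# Left child holds samples 0 and 1, right child holds samples 2 and 3.
ok_split = split_is_admissible(weights, [True, True, False, False])
# Right child holds only sample 2, whose weight is zero.
bad_split = split_is_admissible(weights, [True, True, False, True])
print(ok_split, bad_split)
```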
- predict(X, num_threads=0)
Class predictions
Returns the predicted class for each sample in X.
- Parameters
X (dense matrix (ndarray)) – Dataset used for predicting class estimates.
num_threads (int, default : 0) – Number of threads used to run inference. With the default value, inference runs with the maximum number of available threads.
- Returns
pred – The predicted class of each sample.
- Return type
array-like, shape = (n_samples,)
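The num_threads default of 0 stands for "use every available thread". How the library counts available threads is not described, so the sketch below uses os.cpu_count() as a stand-in; the helper name is hypothetical.

```python
import os


def effective_num_threads(num_threads=0):
    """Map the documented default of 0 to a concrete thread count."""
    if num_threads == 0:
        # Assumption: "available threads" means the logical CPU count.
        return os.cpu_count() or 1
    return num_threads


print(effective_num_threads())   # machine dependent
print(effective_num_threads(4))
```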
- predict_log_proba(X, num_threads=0)
Log of probability estimates
Returns the log-probability estimates for the two classes. Only for binary classification.
- predict_proba(X, num_threads=0)
Probability estimates
Returns the probability estimates for the two classes. Only for binary classification.
-
class pai4sk.DecisionTreeRegressor(criterion='mse', splitter='best', max_depth=None, min_samples_leaf=1, max_features=None, random_state=None, n_threads=1, use_histograms=False, hist_nbins=256, use_gpu=False, gpu_id=0, verbose=False)
Decision Tree Regressor
This class implements a decision tree regressor using the IBM Snap ML library. It can be used for regression tasks.
- Parameters
criterion (string, optional, default : "mse") – The function used to measure the quality of a split. Possible values: “mse” for mean squared error. “friedman_mse” and “mae” are currently not supported.
splitter (string, optional, default : "best") – This parameter defines the strategy used to choose the split at each node. Possible values: “best” and “random”. “random” is currently not supported.
max_depth (int or None, optional, default : None) – The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_leaf samples.
min_samples_leaf (int or float, optional, default : 1) – The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches.
If int, then consider min_samples_leaf as the minimum number.
If float, then consider ceil(min_samples_leaf * n_samples) as the minimum number.
max_features (int, float, string or None, optional, default : None) – The number of features to consider when looking for the best split:
If int, then consider max_features features at each split.
If float, then consider int(max_features * n_features) features at each split.
If “auto”, then max_features=n_features.
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features=n_features.
random_state (int or None, optional, default : None) – If int, random_state is the seed used by the random number generator. If None, the random number generator is the RandomState instance used by np.random.
n_threads (integer, optional, default : 1) – The number of CPU threads to use.
use_histograms (boolean, default : False) – Use histogram-based splits rather than exact splits.
hist_nbins (int, default : 256) – Number of histogram bins.
use_gpu (boolean, default : False) – Use GPU acceleration (only supported for histogram-based splits).
gpu_id (int, default: 0) – Device ID of the GPU which will be used when GPU acceleration is enabled.
verbose (bool, default : False) – If True, it prints debugging information while training. Warning: this will increase the training time. For performance evaluation, use verbose=False.
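To illustrate the "mse" criterion, the sketch below scores a candidate split by the size-weighted mean squared error of the two children around their own means. This is the textbook formulation and only an assumption about how Snap ML evaluates splits; the function names are hypothetical.

```python
def mse(values):
    """Mean squared error of the values around their own mean."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def split_mse(left, right):
    """Average MSE of two children, weighted by their sizes."""
    n = len(left) + len(right)
    return (len(left) * mse(left) + len(right) * mse(right)) / n


targets = [1.0, 1.0, 5.0, 5.0]
good_split = split_mse([1.0, 1.0], [5.0, 5.0])
poor_split = split_mse([1.0, 5.0], [1.0, 5.0])
print(mse(targets), good_split, poor_split)
```

The split that groups equal targets together drives the error to 0, while the other split leaves it unchanged from the parent node.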
- fit(X_train, y_train, sample_weight=None)
Fit the model according to the given training data.
- Parameters
X_train (dense matrix (ndarray)) – Training dataset.
y_train (array-like, shape = (n_samples,)) – The target vector corresponding to X_train.
sample_weight (array-like, shape = [n_samples] or None) – Sample weights. If None, then samples are equally weighted. TODO: Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node.
- Returns
- Return type
- predict(X, num_threads=0)
Regression predictions
Returns the regression estimates for the samples in X.
- Parameters
X (dense matrix (ndarray)) – Dataset used for predicting regression estimates.
num_threads (int, default : 0) – Number of threads used to run inference. With the default value, inference runs with the maximum number of available threads.
- Returns
pred – The predicted values of the samples.
- Return type
array-like, shape = (n_samples,)