RandomForest
-
class pai4sk.RandomForestClassifier(n_estimators=10, criterion='gini', max_depth=None, min_samples_leaf=1, max_features='auto', bootstrap=True, n_jobs=None, random_state=None, verbose=False, use_histograms=False, hist_nbins=256, use_gpu=False, gpu_ids=[0])
Random Forest Classifier
This class implements a random forest classifier using the IBM Snap ML library. It can be used for binary classification problems.
- Parameters
n_estimators (integer, optional, default : 10) – The number of trees in the forest.
criterion (string, optional, default : "gini") – The function used to measure the quality of a split. The only criterion currently supported is “gini”.
max_depth (integer or None, optional, default : None) – The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_leaf samples.
min_samples_leaf (int or float, optional, default : 1) –
- The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches.
If int, then consider min_samples_leaf as the minimum number.
If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.
max_features (int, float, string or None, optional, default : 'auto') –
- The number of features to consider when looking for the best split:
If int, then consider max_features features at each split.
If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
If “auto”, then max_features=sqrt(n_features).
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features=n_features.
bootstrap (boolean, optional, default : True) – This parameter determines whether bootstrap samples are used when building trees.
n_jobs (integer or None, optional, default : None) – The number of jobs to run in parallel for the fit function. None means 1 process.
random_state (integer, or None, optional, default : None) – If integer, random_state is the seed used by the random number generator. If None, the random number generator is the RandomState instance used by np.random.
verbose (boolean, default : False) – If True, it prints debugging information while training. Warning: this will increase the training time. For performance evaluation, use verbose=False.
use_histograms (boolean, default : False) – Use histogram-based splits rather than exact splits.
hist_nbins (int, default : 256) – Number of histogram bins.
use_gpu (boolean, default : False) – Use GPU acceleration (only supported for histogram-based splits).
gpu_ids (array-like of int, default: [0]) – Device IDs of the GPUs which will be used when GPU acceleration is enabled.
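The rules for min_samples_leaf and max_features above can be worked through in plain Python. This is a minimal sketch of the documented arithmetic, not the library's internal code: the helper names resolve_min_samples_leaf and resolve_max_features are hypothetical, as is the auto_is_sqrt flag, and truncating sqrt and log2 to an integer (with a floor of 1) is an assumption, since the text above does not state a rounding rule for those cases.

```python
import math


def resolve_min_samples_leaf(min_samples_leaf, n_samples):
    """Return the per-leaf minimum sample count as an integer (hypothetical helper)."""
    if isinstance(min_samples_leaf, int):
        return min_samples_leaf
    # A float is read as a fraction of the training set, rounded up.
    return int(math.ceil(min_samples_leaf * n_samples))


def resolve_max_features(max_features, n_features, auto_is_sqrt=True):
    """Return the number of features examined at each split (hypothetical helper).

    auto_is_sqrt=True follows the classifier rule ("auto" -> sqrt(n_features)).
    """
    if max_features is None:
        return n_features
    if isinstance(max_features, str):
        if max_features == "sqrt" or (max_features == "auto" and auto_is_sqrt):
            # Assumption: truncate to an integer, never below 1.
            return max(1, int(math.sqrt(n_features)))
        if max_features == "auto":
            return n_features
        if max_features == "log2":
            # Assumption: truncate to an integer, never below 1.
            return max(1, int(math.log2(n_features)))
        raise ValueError("unsupported max_features: %r" % max_features)
    if isinstance(max_features, int):
        return max_features
    # A float is read as a fraction of the feature count, truncated.
    return int(max_features * n_features)


print(resolve_min_samples_leaf(0.25, 10))
print(resolve_max_features("auto", 64))
```

With 10 training samples, min_samples_leaf=0.25 gives ceil(2.5) = 3 samples per leaf; with 64 features, the classifier's "auto" setting examines sqrt(64) = 8 features per split.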
-
fit(X_train, y_train, sample_weight=None)
Fit the model according to the given training data.
- Parameters
X_train (dense matrix (ndarray)) – Training dataset.
y_train (array-like, shape = (n_samples,)) – The target vector corresponding to X_train.
sample_weight (array-like, shape = [n_samples] or None) – Sample weights. If None, then samples are equally weighted. TODO: Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node.
- Returns
self
- Return type
-
predict(X, num_threads=0)
Class predictions
The returned class estimates.
- Parameters
X (dense matrix (ndarray)) – Dataset used for predicting class estimates.
num_threads (int, default : 0) – Number of threads used to run inference. By default, inference runs with the maximum number of available threads.
- Returns
proba – The predicted class of each sample.
- Return type
array-like, shape = (n_samples,)
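A short end-to-end sketch of the classifier API described above. It uses only the constructor arguments and the fit and predict signatures listed in this section. The synthetic data, the float32 dtype, the 0/1 label encoding and the function name classifier_demo are assumptions made for illustration. The imports are guarded so the snippet returns None on machines where numpy or pai4sk is not installed.

```python
def classifier_demo(n_samples=200, n_features=8, seed=0):
    """Train on synthetic binary data and return the training accuracy.

    Returns None when numpy or pai4sk is not installed.
    """
    try:
        import numpy as np
        from pai4sk import RandomForestClassifier
    except ImportError:
        return None

    rng = np.random.RandomState(seed)
    # Assumption: a dense float32 ndarray is an acceptable input.
    X = rng.rand(n_samples, n_features).astype(np.float32)
    # Assumption: binary labels are encoded as 0 and 1.
    y = (X[:, 0] + X[:, 1] > 1.0).astype(np.float32)

    clf = RandomForestClassifier(n_estimators=10, max_depth=5, random_state=seed)
    clf.fit(X, y)

    # num_threads=0 (the default) would use all available threads.
    pred = clf.predict(X, num_threads=2)
    return float((pred == y).mean())


accuracy = classifier_demo()
print("pai4sk not available" if accuracy is None else accuracy)
```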
-
class pai4sk.RandomForestRegressor(n_estimators=10, criterion='mse', max_depth=None, min_samples_leaf=1, max_features='auto', bootstrap=True, n_jobs=None, random_state=None, verbose=False, use_histograms=False, hist_nbins=256, use_gpu=False, gpu_ids=[0])
Random Forest Regressor
This class implements a random forest regressor using the IBM Snap ML library. It can be used for regression tasks.
- Parameters
n_estimators (integer, optional, default : 10) – The number of trees in the forest.
criterion (string, optional, default : "mse") – The function used to measure the quality of a split. The only criterion currently supported is “mse”.
max_depth (integer or None, optional, default : None) – The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_leaf samples.
min_samples_leaf (int or float, optional, default : 1) –
- The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches.
If int, then consider min_samples_leaf as the minimum number.
If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.
max_features (int, float, string or None, optional, default : 'auto') –
- The number of features to consider when looking for the best split:
If int, then consider max_features features at each split.
If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
If “auto”, then max_features=n_features.
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features=n_features.
bootstrap (boolean, optional, default : True) – This parameter determines whether bootstrap samples are used when building trees.
n_jobs (integer or None, optional, default : None) – The number of jobs to run in parallel for the fit function. None means 1 process.
random_state (integer, or None, optional, default : None) – If integer, random_state is the seed used by the random number generator. If None, the random number generator is the RandomState instance used by np.random.
verbose (boolean, default : False) – If True, it prints debugging information while training. Warning: this will increase the training time. For performance evaluation, use verbose=False.
use_histograms (boolean, default : False) – Use histogram-based splits rather than exact splits.
hist_nbins (int, default : 256) – Number of histogram bins.
use_gpu (boolean, default : False) – Use GPU acceleration (only supported for histogram-based splits).
gpu_ids (array-like of int, default: [0]) – Device IDs of the GPUs which will be used when GPU acceleration is enabled.
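Because use_gpu is only supported for histogram-based splits, the GPU-related arguments have to be set together. The sketch below builds a consistent set of constructor keyword arguments. The helper gpu_forest_params is hypothetical, and rejecting use_histograms=False up front is a design choice of this sketch: the text above does not say how the library itself reacts to that combination.

```python
def gpu_forest_params(gpu_ids=(0,), hist_nbins=256, **extra):
    """Build constructor keyword arguments for GPU training (hypothetical helper)."""
    params = dict(extra)
    if params.get("use_histograms") is False:
        # GPU acceleration is documented as histogram-only, so refuse the mix.
        raise ValueError("use_gpu=True requires use_histograms=True")
    params.update(
        use_histograms=True,   # required for GPU acceleration
        hist_nbins=hist_nbins,
        use_gpu=True,
        gpu_ids=list(gpu_ids),
    )
    return params


params = gpu_forest_params(gpu_ids=[0, 1], n_estimators=50)
print(params)
# The dictionary can then be unpacked into either estimator, for example:
#   RandomForestRegressor(**params)
```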
-
fit(X_train, y_train, sample_weight=None)
Fit the model according to the given training data.
- Parameters
X_train (dense matrix (ndarray)) – Training dataset.
y_train (array-like, shape = (n_samples,)) – The target vector corresponding to X_train.
sample_weight (array-like, shape = [n_samples] or None) – Sample weights. If None, then samples are equally weighted. TODO: Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node.
- Returns
self
- Return type
-
predict(X, num_threads=0)
Regression predictions
The returned value estimates.
- Parameters
X (dense matrix (ndarray)) – Dataset used for predicting value estimates.
num_threads (int, default : 0) – Number of threads used to run inference. By default, inference runs with the maximum number of available threads.
- Returns
proba – The predicted value of each sample.
- Return type
array-like, shape = (n_samples,)
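A matching sketch for the regressor, again limited to the constructor arguments and the fit and predict signatures listed in this section. The synthetic target, the float32 dtype and the function name regressor_demo are assumptions for illustration. The imports are guarded so the snippet returns None when numpy or pai4sk is not installed.

```python
def regressor_demo(n_samples=200, n_features=8, seed=0):
    """Train on a synthetic linear target and return the training MSE.

    Returns None when numpy or pai4sk is not installed.
    """
    try:
        import numpy as np
        from pai4sk import RandomForestRegressor
    except ImportError:
        return None

    rng = np.random.RandomState(seed)
    # Assumption: a dense float32 ndarray is an acceptable input.
    X = rng.rand(n_samples, n_features).astype(np.float32)
    y = (2.0 * X[:, 0] - X[:, 1]).astype(np.float32)

    reg = RandomForestRegressor(n_estimators=10, max_depth=5, random_state=seed)
    reg.fit(X, y)

    pred = reg.predict(X)  # default num_threads=0 uses all available threads
    return float(np.mean((pred - y) ** 2))


mse = regressor_demo()
print("pai4sk not available" if mse is None else mse)
```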