RandomForest
-
class pai4sk.RandomForestClassifier(n_estimators=10, criterion='gini', max_depth=None, min_samples_leaf=1, max_features='auto', bootstrap=True, n_jobs=None, random_state=None, verbose=False, use_gpu=False, use_histograms=False, hist_nbins=64)
Random Forest Classifier
This class implements a random forest classifier using the IBM Snap ML library. It can be used for binary classification problems.
- Parameters
n_estimators (integer, optional, default : 10) – The number of trees in the forest.
criterion (string, optional, default : "gini") – The function used to measure the quality of a split. The only criterion currently supported is “gini”.
max_depth (integer or None, optional, default : None) – The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_leaf samples.
min_samples_leaf (int or float, optional, default : 1) – The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches.
If int, then consider min_samples_leaf as the minimum number.
If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.
max_features (int, float, string or None, optional, default : 'auto') – The number of features to consider when looking for the best split:
If int, then consider max_features features at each split.
If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
If “auto”, then max_features=sqrt(n_features).
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features=n_features.
bootstrap (boolean, optional, default : True) – This parameter determines whether bootstrap samples are used when building trees.
n_jobs (integer or None, optional, default : None) – The number of jobs to run in parallel for the fit function. None means 1 process.
random_state (integer, or None, optional, default : None) – If integer, random_state is the seed used by the random number generator. If None, the random number generator is the RandomState instance used by np.random.
verbose (boolean, default : False) – If True, it prints debugging information while training. Warning: this will increase the training time. For performance evaluation, use verbose=False.
use_gpu (boolean, default : False) – Flag that indicates the hardware platform used for training. If True, the training is performed using the GPU. If False, the training is performed using the CPU. Currently only CPU training is supported.
use_histograms (boolean, default : False) – Use histogram-based splits rather than exact splits.
hist_nbins (int, default : 64) – Number of histogram bins.
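The interaction between the criterion and min_samples_leaf parameters can be illustrated with a minimal sketch. It is not the Snap ML implementation: the helper names, the 0/1 label encoding, and the convention that values less than or equal to the threshold go to the left branch are all assumptions made for this example. It scores one candidate split by weighted Gini impurity and rejects the split if either branch would hold fewer than the resolved minimum number of samples.

```python
import math


def resolve_min_samples_leaf(min_samples_leaf, n_samples):
    """Turn min_samples_leaf (int count or float fraction) into a sample count."""
    if isinstance(min_samples_leaf, float):
        # Documented rule for fractions: ceil(min_samples_leaf * n_samples).
        return math.ceil(min_samples_leaf * n_samples)
    return min_samples_leaf


def gini(labels):
    """Gini impurity of a list of 0/1 labels: 1 - p0^2 - p1^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def split_quality(values, labels, threshold, min_samples_leaf=1):
    """Weighted Gini impurity of a split, or None if a branch is too small."""
    # Assumed convention: values <= threshold go to the left branch.
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    min_leaf = resolve_min_samples_leaf(min_samples_leaf, len(labels))
    if len(left) < min_leaf or len(right) < min_leaf:
        return None  # the split is not considered
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


values = [0.1, 0.4, 0.5, 0.9]
labels = [0, 0, 1, 1]
# Both branches are pure, so the weighted impurity is 0.0.
print(split_quality(values, labels, threshold=0.45))
# The left branch would hold 1 sample but ceil(0.5 * 4) = 2 are required.
print(split_quality(values, labels, threshold=0.1, min_samples_leaf=0.5))
```

A lower weighted impurity indicates a better split; a result of None corresponds to a split that is skipped because of min_samples_leaf.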
-
fit(X_train, y_train, sample_weight=None)
Fit the model according to the given training data.
- Parameters
X_train (dense matrix (ndarray)) – Training dataset.
y_train (array-like, shape = (n_samples,)) – The target vector corresponding to X_train.
sample_weight (array-like, shape = [n_samples] or None) – Sample weights. If None, then samples are equally weighted. TODO: Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node.
- Returns
self
- Return type
-
predict(X, num_threads=0)
Class predictions
Returns the class estimates for the samples in X.
- Parameters
X (dense matrix (ndarray)) – Dataset used for predicting class estimates.
num_threads (int, default : 0) – Number of threads used to run inference. By default, inference runs with the maximum number of available threads.
- Returns
proba – The predicted class of each sample.
- Return type
array-like, shape = (n_samples,)
-
predict_log_proba(X, num_threads=0)
Log of probability estimates
Returns the log-probability estimates for the two classes. Only for binary classification.
-
predict_proba(X, num_threads=0)
Probability estimates
Returns the probability estimates for the two classes. Only for binary classification.
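How the three prediction outputs relate can be sketched with plain Python. The values below are hypothetical, not output from the library, and three points are assumptions for illustration: that the probability estimates form one row per sample with one column per class, that the logarithm is the natural logarithm, and that the predicted class corresponds to the column with the larger probability.

```python
import math

# Hypothetical probability estimates for three samples:
# one row per sample, one column per class, each row summing to 1.
proba = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]

# Assumed relationship: element-wise natural log of the probabilities.
log_proba = [[math.log(p) for p in row] for row in proba]

# Exponentiating the log-probabilities recovers the probabilities.
recovered = [[math.exp(lp) for lp in row] for row in log_proba]

# Assumed decision rule: choose the column with the larger probability
# (ties resolve to the first column in this sketch).
predicted_column = [row.index(max(row)) for row in proba]
print(predicted_column)
```

Working with log-probabilities is mainly useful when probabilities are multiplied or compared at very small magnitudes, where sums of logarithms are numerically safer.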
-
class pai4sk.RandomForestRegressor(n_estimators=10, criterion='mse', max_depth=None, min_samples_leaf=1, max_features='auto', bootstrap=True, n_jobs=None, random_state=None, verbose=False, use_gpu=False, use_histograms=False, hist_nbins=64)
Random Forest Regressor
This class implements a random forest regressor using the IBM Snap ML library. It can be used for regression tasks.
- Parameters
n_estimators (integer, optional, default : 10) – The number of trees in the forest.
criterion (string, optional, default : "mse") – The function used to measure the quality of a split. The only criterion currently supported is “mse”.
max_depth (integer or None, optional, default : None) – The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_leaf samples.
min_samples_leaf (int or float, optional, default : 1) – The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches.
If int, then consider min_samples_leaf as the minimum number.
If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.
max_features (int, float, string or None, optional, default : 'auto') – The number of features to consider when looking for the best split:
If int, then consider max_features features at each split.
If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
If “auto”, then max_features=n_features.
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features=n_features.
bootstrap (boolean, optional, default : True) – This parameter determines whether bootstrap samples are used when building trees.
n_jobs (integer or None, optional, default : None) – The number of jobs to run in parallel for the fit function. None means 1 process.
random_state (integer, or None, optional, default : None) – If integer, random_state is the seed used by the random number generator. If None, the random number generator is the RandomState instance used by np.random.
verbose (boolean, default : False) – If True, it prints debugging information while training. Warning: this will increase the training time. For performance evaluation, use verbose=False.
use_gpu (boolean, default : False) – Flag that indicates the hardware platform used for training. If True, the training is performed using the GPU. If False, the training is performed using the CPU. Currently only CPU training is supported.
use_histograms (boolean, default : False) – Use histogram-based splits rather than exact splits.
hist_nbins (int, default : 64) – Number of histogram bins.
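The max_features rules are the same for the classifier and the regressor except for “auto”, which resolves to sqrt(n_features) for the classifier and to n_features for the regressor. The sketch below encodes the documented rules in a hypothetical helper, resolve_max_features, which is not part of pai4sk. Truncating the square root and the base-2 logarithm with int() is an assumption, since the rounding is documented only for the float case.

```python
import math


def resolve_max_features(max_features, n_features, is_classifier):
    """Return the number of features examined at each split (illustrative)."""
    if max_features is None:
        return n_features
    if max_features == "auto":
        # Documented difference: sqrt for the classifier, all features for the regressor.
        resolved = "sqrt" if is_classifier else None
        return resolve_max_features(resolved, n_features, is_classifier)
    if max_features == "sqrt":
        return int(math.sqrt(n_features))  # truncation is an assumption
    if max_features == "log2":
        return int(math.log2(n_features))  # truncation is an assumption
    if isinstance(max_features, float):
        # Documented rule for fractions.
        return int(max_features * n_features)
    if isinstance(max_features, int):
        return max_features
    raise ValueError(f"unsupported max_features: {max_features!r}")


n_features = 100
print(resolve_max_features("auto", n_features, is_classifier=True))   # 10
print(resolve_max_features("auto", n_features, is_classifier=False))  # 100
print(resolve_max_features("log2", n_features, is_classifier=False))  # 6
print(resolve_max_features(0.25, n_features, is_classifier=False))    # 25
```

Because of the “auto” difference, a regressor left at its default considers every feature at every split, so passing "sqrt" or a fraction explicitly is the way to add per-split feature subsampling.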
-
fit(X_train, y_train, sample_weight=None)
Fit the model according to the given training data.
- Parameters
X_train (dense matrix (ndarray)) – Training dataset.
y_train (array-like, shape = (n_samples,)) – The target vector corresponding to X_train.
sample_weight (array-like, shape = [n_samples] or None) – Sample weights. If None, then samples are equally weighted. TODO: Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node.
- Returns
self
- Return type
-
predict(X, num_threads=0)
Regression predictions
Returns the value estimates for the samples in X.
- Parameters
X (dense matrix (ndarray)) – Dataset used for predicting values.
num_threads (int, default : 0) – Number of threads used to run inference. By default, inference runs with the maximum number of available threads.
- Returns
proba – The predicted values of the samples.
- Return type
array-like, shape = (n_samples,)
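For intuition about the bootstrap and random_state parameters and about how a forest produces one value per sample, the following sketch shows the textbook random forest recipe. It assumes that each tree is trained on n_samples rows drawn with replacement and that the regression output is the plain average of the per-tree predictions; the source does not confirm that Snap ML follows exactly this scheme, and both helper functions are hypothetical.

```python
import random


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def average_predictions(per_tree_predictions):
    """Average predictions across trees; input holds one list of values per tree."""
    n_trees = len(per_tree_predictions)
    return [sum(column) / n_trees for column in zip(*per_tree_predictions)]


# A seeded generator plays the role of random_state: same seed, same samples.
rng = random.Random(42)
samples = [bootstrap_indices(6, rng) for _ in range(3)]  # one sample per tree

# Hypothetical outputs of three trees for two input rows.
per_tree = [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
print(average_predictions(per_tree))  # [2.0, 5.0]
```

Averaging over trees trained on different bootstrap samples is what reduces the variance of the individual trees, which is why n_estimators and bootstrap are usually tuned together.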