RandomForest

class pai4sk.RandomForestClassifier(n_estimators=10, criterion='gini', max_depth=None, min_samples_leaf=1, max_features='auto', bootstrap=True, n_jobs=None, random_state=None, verbose=False, use_histograms=False, hist_nbins=256, use_gpu=False, gpu_ids=[0])

Random Forest Classifier

This class implements a random forest classifier using the IBM Snap ML library. It can be used for binary classification problems.

Parameters

n_estimators (integer, optional, default : 10) – The number of trees in the forest.
criterion (string, optional, default : "gini") – The function used to measure the quality of a split. The only criterion currently supported is “gini”.
max_depth (integer or None, optional, default : None) – The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_leaf samples.
min_samples_leaf (int or float, optional, default : 1) –
The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches.
- If int, then consider min_samples_leaf as the minimum number.
- If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.
max_features (int, float, string or None, optional, default : 'auto') –
The number of features to consider when looking for the best split:
- If int, then consider max_features features at each split.
- If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
- If “auto”, then max_features=sqrt(n_features).
- If “sqrt”, then max_features=sqrt(n_features).
- If “log2”, then max_features=log2(n_features).
- If None, then max_features=n_features.
bootstrap (boolean, optional, default : True) – This parameter determines whether bootstrap samples are used when building trees.
n_jobs (integer or None, optional, default : None) – The number of jobs to run in parallel for the fit function. None means 1 process.
random_state (integer or None, optional, default : None) – If integer, random_state is the seed used by the random number generator. If None, the random number generator is the RandomState instance used by np.random.
verbose (boolean, default : False) – If True, print debugging information while training. Warning: this will increase the training time. For performance evaluation, use verbose=False.
use_histograms (boolean, default : False) – Use histogram-based splits rather than exact splits.
hist_nbins (int, default : 256) – Number of histogram bins.
use_gpu (boolean, default : False) – Use GPU acceleration (only supported for histogram-based splits).
gpu_ids (array-like of int, default: [0]) – Device IDs of the GPUs which will be used when GPU acceleration is enabled.
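The rules for min_samples_leaf and max_features listed above can be sketched in plain Python. The helpers below are hypothetical and are not part of pai4sk; they only restate the documented rules for the classifier. Truncating sqrt and log2 results with int() is an assumption, since the text does not say how non-integer values are rounded.

```python
import math


def resolve_min_samples_leaf(min_samples_leaf, n_samples):
    """Return the minimum number of samples per leaf as an absolute count."""
    if isinstance(min_samples_leaf, int):
        return min_samples_leaf
    # A float is read as a fraction of the training set, rounded up.
    return math.ceil(min_samples_leaf * n_samples)


def resolve_max_features(max_features, n_features):
    """Return the number of features examined per split (classifier rules)."""
    if max_features is None:
        return n_features
    if max_features in ("auto", "sqrt"):
        # Assumption: the square root is truncated to an integer.
        return int(math.sqrt(n_features))
    if max_features == "log2":
        # Assumption: the logarithm is truncated to an integer.
        return int(math.log2(n_features))
    if isinstance(max_features, int):
        return max_features
    # Remaining case: a float fraction of the feature count.
    return int(max_features * n_features)


print(resolve_min_samples_leaf(0.01, 250))   # ceil(2.5) -> 3
print(resolve_max_features("auto", 100))     # sqrt(100) -> 10
print(resolve_max_features(0.5, 9))          # int(4.5) -> 4
```

With 250 training samples, min_samples_leaf=0.01 therefore behaves like min_samples_leaf=3.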

fit(X_train, y_train, sample_weight=None)

Fit the model according to the given training data.

Parameters

X_train (dense matrix (ndarray)) – Training dataset.
y_train (array-like, shape = (n_samples,)) – The target vector corresponding to X_train.
sample_weight (array-like, shape = [n_samples] or None) – Sample weights. If None, then samples are equally weighted. TODO: Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node.

Returns

self

Return type

object

get_params()

Get the values of the model parameters.

Returns: params
Return type: dict

predict(X, num_threads=0)

Class predictions

Returns the class estimates for the samples in X.

Parameters

X (dense matrix (ndarray)) – Dataset used for predicting class estimates.
num_threads (int, default : 0) – Number of threads used to run inference. With the default value of 0, inference runs with the maximum number of available threads.

Returns

proba – The predicted class of each sample.

Return type

array-like, shape = (n_samples,)
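A minimal usage sketch for the classifier, assuming pai4sk and NumPy are installed. The constructor arguments and the fit/predict calls follow the signatures documented above. The synthetic data, the float32 dtype, and the 0/1 label encoding are assumptions made for illustration and should be checked against your installation.

```python
# Constructor arguments taken from the documented signature.
CLASSIFIER_KWARGS = {
    "n_estimators": 10,
    "max_depth": 5,
    "max_features": "sqrt",
    "random_state": 42,
}


def run_classifier_example():
    # Imported lazily so this module loads even where pai4sk is absent.
    import numpy as np
    from pai4sk import RandomForestClassifier

    rng = np.random.RandomState(0)
    # fit() expects a dense ndarray; float32 is an assumption.
    X = rng.rand(200, 8).astype(np.float32)
    # Binary target; the 0/1 encoding is an assumption.
    y = (X[:, 0] > 0.5).astype(np.float32)

    clf = RandomForestClassifier(**CLASSIFIER_KWARGS)
    clf.fit(X, y)
    # num_threads=0 (the default) would use all available threads.
    return clf.predict(X, num_threads=2)
```

Calling run_classifier_example() should return one predicted class per row of X.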

class pai4sk.RandomForestRegressor(n_estimators=10, criterion='mse', max_depth=None, min_samples_leaf=1, max_features='auto', bootstrap=True, n_jobs=None, random_state=None, verbose=False, use_histograms=False, hist_nbins=256, use_gpu=False, gpu_ids=[0])

Random Forest Regressor

This class implements a random forest regressor using the IBM Snap ML library. It can be used for regression tasks.

Parameters

n_estimators (integer, optional, default : 10) – The number of trees in the forest.
criterion (string, optional, default : "mse") – The function used to measure the quality of a split. The only criterion currently supported is “mse”.
max_depth (integer or None, optional, default : None) – The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_leaf samples.
min_samples_leaf (int or float, optional, default : 1) –
The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches.
- If int, then consider min_samples_leaf as the minimum number.
- If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.
max_features (int, float, string or None, optional, default : 'auto') –
The number of features to consider when looking for the best split:
- If int, then consider max_features features at each split.
- If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
- If “auto”, then max_features=n_features.
- If “sqrt”, then max_features=sqrt(n_features).
- If “log2”, then max_features=log2(n_features).
- If None, then max_features=n_features.
bootstrap (boolean, optional, default : True) – This parameter determines whether bootstrap samples are used when building trees.
n_jobs (integer or None, optional, default : None) – The number of jobs to run in parallel for the fit function. None means 1 process.
random_state (integer or None, optional, default : None) – If integer, random_state is the seed used by the random number generator. If None, the random number generator is the RandomState instance used by np.random.
verbose (boolean, default : False) – If True, print debugging information while training. Warning: this will increase the training time. For performance evaluation, use verbose=False.
use_histograms (boolean, default : False) – Use histogram-based splits rather than exact splits.
hist_nbins (int, default : 256) – Number of histogram bins.
use_gpu (boolean, default : False) – Use GPU acceleration (only supported for histogram-based splits).
gpu_ids (array-like of int, default: [0]) – Device IDs of the GPUs which will be used when GPU acceleration is enabled.
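Because use_gpu is only supported together with histogram-based splits, it can help to check a configuration before constructing the model. The helper below is a hypothetical pre-check and is not part of pai4sk. How the library itself reacts to an unsupported combination is not stated here, and the empty gpu_ids check is an added assumption.

```python
def check_gpu_settings(use_histograms=False, use_gpu=False, gpu_ids=(0,)):
    """Return a list of problems found in the GPU-related settings."""
    problems = []
    if use_gpu and not use_histograms:
        # Documented constraint: GPU acceleration needs histogram splits.
        problems.append("use_gpu=True requires use_histograms=True")
    if use_gpu and len(gpu_ids) == 0:
        # Assumption: at least one device ID must be supplied.
        problems.append("gpu_ids must name at least one device")
    return problems


print(check_gpu_settings(use_gpu=True))
# ['use_gpu=True requires use_histograms=True']
print(check_gpu_settings(use_histograms=True, use_gpu=True, gpu_ids=[0, 1]))
# []
```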

fit(X_train, y_train, sample_weight=None)

Fit the model according to the given training data.

Parameters

X_train (dense matrix (ndarray)) – Training dataset.
y_train (array-like, shape = (n_samples,)) – The target vector corresponding to X_train.
sample_weight (array-like, shape = [n_samples] or None) – Sample weights. If None, then samples are equally weighted. TODO: Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node.

Returns

self

Return type

object

get_params()

Get the values of the model parameters.

Returns: params
Return type: dict

predict(X, num_threads=0)

Regression predictions

Returns the estimated values for the samples in X.

Parameters

X (dense matrix (ndarray)) – Dataset used for predicting values.
num_threads (int, default : 0) – Number of threads used to run inference. With the default value of 0, inference runs with the maximum number of available threads.

Returns

proba – The predicted value of each sample.

Return type

array-like, shape = (n_samples,)
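A minimal usage sketch for the regressor, assuming pai4sk and NumPy are installed. The fit and predict calls follow the signatures documented above. The synthetic data and the float32 dtype are assumptions, and mean_squared_error is a hypothetical helper written for this example, not a pai4sk function.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    pairs = list(zip(y_true, y_pred))
    return sum((t - p) ** 2 for t, p in pairs) / len(pairs)


def run_regressor_example():
    # Imported lazily so this module loads even where pai4sk is absent.
    import numpy as np
    from pai4sk import RandomForestRegressor

    rng = np.random.RandomState(0)
    # fit() expects a dense ndarray; float32 is an assumption.
    X = rng.rand(200, 4).astype(np.float32)
    y = (2.0 * X[:, 0] + X[:, 1]).astype(np.float32)

    reg = RandomForestRegressor(n_estimators=20, max_depth=6, random_state=0)
    reg.fit(X, y)
    predictions = reg.predict(X)
    # Training-set error only; hold out data for a real evaluation.
    return mean_squared_error(y, predictions)
```

Calling run_regressor_example() returns the mean squared error on the training data, which gives a quick sanity check that the model fitted.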