RandomForest

class pai4sk.RandomForestClassifier(n_estimators=10, criterion='gini', max_depth=None, min_samples_leaf=1, max_features='auto', bootstrap=True, n_jobs=None, random_state=None, verbose=False, use_histograms=False, hist_nbins=256, use_gpu=False, gpu_ids=[0])

Random Forest Classifier

This class implements a random forest classifier using the IBM Snap ML library. It can be used for binary classification problems.

Parameters
  • n_estimators (integer, optional, default : 10) – This parameter defines the number of trees in the forest.

  • criterion (string, optional, default : "gini") – This function measures the quality of a split. The currently supported criterion is “gini”.

  • max_depth (integer or None, optional, default : None) – The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_leaf samples.

  • min_samples_leaf (int or float, optional, default : 1) –

    The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches.
    • If int, then consider min_samples_leaf as the minimum number.

    • If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.

  • max_features (int, float, string or None, optional, default : 'auto') –

    The number of features to consider when looking for the best split:
    • If int, then consider max_features features at each split.

    • If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.

    • If “auto”, then max_features=sqrt(n_features).

    • If “sqrt”, then max_features=sqrt(n_features).

    • If “log2”, then max_features=log2(n_features).

    • If None, then max_features=n_features.

  • bootstrap (boolean, optional, default : True) – This parameter determines whether bootstrap samples are used when building trees.

  • n_jobs (integer or None, optional, default : None) – The number of jobs to run in parallel for the fit function. None = 1 process.

  • random_state (integer, or None, optional, default : None) – If integer, random_state is the seed used by the random number generator. If None, the random number generator is the RandomState instance used by np.random.

  • verbose (boolean, default : False) – If True, it prints debugging information while training. Warning: this will increase the training time. For performance evaluation, use verbose=False.

  • use_histograms (boolean, default : False) – Use histogram-based splits rather than exact splits.

  • hist_nbins (int, default : 256) – Number of histogram bins.

  • use_gpu (boolean, default : False) – Use GPU acceleration (only supported for histogram-based splits).

  • gpu_ids (array-like of int, default: [0]) – Device IDs of the GPUs which will be used when GPU acceleration is enabled.
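The int/float/string rules listed above for min_samples_leaf and max_features can be sketched in plain Python. This only illustrates the documented rules and is not Snap ML's implementation: the helper names resolve_min_samples_leaf and resolve_max_features are hypothetical, and truncating sqrt and log2 to an integer and clamping those results to at least 1 are assumptions.

```python
import math


def resolve_min_samples_leaf(min_samples_leaf, n_samples):
    """Return the minimum number of samples per leaf (hypothetical helper)."""
    if isinstance(min_samples_leaf, int):
        # An int is used directly as the minimum count.
        return min_samples_leaf
    # A float is a fraction of the training set, rounded up.
    return math.ceil(min_samples_leaf * n_samples)


def resolve_max_features(max_features, n_features):
    """Return the number of features examined per split (hypothetical helper)."""
    if max_features is None:
        return n_features
    if isinstance(max_features, str):
        if max_features in ("auto", "sqrt"):
            # Assumption: truncate to an integer and keep at least one feature.
            return max(1, int(math.sqrt(n_features)))
        if max_features == "log2":
            # Assumption: same truncation and lower bound as above.
            return max(1, int(math.log2(n_features)))
        raise ValueError("unsupported max_features: %r" % max_features)
    if isinstance(max_features, int):
        return max_features
    # A float is a fraction of the feature count, as documented.
    return int(max_features * n_features)
```

For example, with 16 features the classifier default 'auto' resolves to 4 features per split, and min_samples_leaf=0.25 on 10 samples resolves to 3 samples per leaf.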

fit(X_train, y_train, sample_weight=None)

Fit the model according to the given training data.

Parameters
  • X_train (dense matrix (ndarray)) – Training dataset.

  • y_train (array-like, shape = (n_samples,)) – The target vector corresponding to X_train.

  • sample_weight (array-like, shape = [n_samples] or None) – Sample weights. If None, then samples are equally weighted. TODO: Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node.

Returns

self

Return type

object
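To make the bootstrap and criterion parameters more concrete, the sketch below draws one bootstrap sample and scores a candidate split with Gini impurity. It uses the textbook definitions (sampling n_samples rows with replacement, impurity 1 - sum(p_k^2), children weighted by size). Whether Snap ML computes these in exactly this way is an assumption, and the function names and toy data are hypothetical.

```python
import random
from collections import Counter


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices uniformly at random, with replacement."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_score(left_labels, right_labels):
    """Size-weighted average impurity of the two children (lower is better)."""
    n = len(left_labels) + len(right_labels)
    return (len(left_labels) * gini_impurity(left_labels)
            + len(right_labels) * gini_impurity(right_labels)) / n


# Toy data: one feature that separates the two classes at 0.5.
feature = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 1]

rng = random.Random(42)
idx = bootstrap_indices(len(labels), rng)
sample = [(feature[i], labels[i]) for i in idx]

# Score the candidate threshold 0.5 on the bootstrap sample.
left = [y for x, y in sample if x <= 0.5]
right = [y for x, y in sample if x > 0.5]
score = split_score(left, right)
```

Because the threshold separates the classes perfectly, both children are pure for any bootstrap draw, so the score is the minimum possible value.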

get_params()

Get the values of the model parameters.

Returns

params

Return type

dict

predict(X, num_threads=0)

Class predictions

Returns the class estimates for the samples in X.

Parameters
  • X (dense matrix (ndarray)) – Dataset used for predicting class estimates.

  • num_threads (int, default : 0) – Number of threads used to run inference. By default inference runs with the maximum number of available threads.

Returns

proba – The predicted class of each sample.

Return type

array-like, shape = (n_samples,)
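A minimal usage sketch of the fit and predict signatures documented above. The helper fit_and_predict is hypothetical and works with any object exposing those two methods. The demo function is not called automatically and assumes that numpy and pai4sk are installed, that 0/1 labels are accepted, and that float64 arrays are a suitable input type; none of these details are confirmed by this reference.

```python
def fit_and_predict(model, X_train, y_train, X_test, num_threads=0):
    """Train the model, then return its predictions for X_test."""
    model.fit(X_train, y_train)
    # num_threads=0 is the documented default (use all available threads).
    return model.predict(X_test, num_threads=num_threads)


def demo():
    """Hypothetical end-to-end run; requires numpy and pai4sk."""
    import numpy as np
    from pai4sk import RandomForestClassifier

    rng = np.random.RandomState(0)
    X = rng.rand(200, 8)                        # dense ndarray, 8 features
    y = (X[:, 0] > 0.5).astype(np.float64)      # binary labels (assumed 0/1)

    model = RandomForestClassifier(n_estimators=10, max_depth=4,
                                   random_state=42)
    # Train on the first 150 rows, predict the remaining 50.
    return fit_and_predict(model, X[:150], y[:150], X[150:])
```

Calling demo() should return an array of shape (50,), one predicted class per held-out row.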

class pai4sk.RandomForestRegressor(n_estimators=10, criterion='mse', max_depth=None, min_samples_leaf=1, max_features='auto', bootstrap=True, n_jobs=None, random_state=None, verbose=False, use_histograms=False, hist_nbins=256, use_gpu=False, gpu_ids=[0])

Random Forest Regressor

This class implements a random forest regressor using the IBM Snap ML library. It can be used for regression tasks.

Parameters
  • n_estimators (integer, optional, default : 10) – This parameter defines the number of trees in the forest.

  • criterion (string, optional, default : "mse") – This function measures the quality of a split. The currently supported criterion is “mse”.

  • max_depth (integer or None, optional, default : None) – The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_leaf samples.

  • min_samples_leaf (int or float, optional, default : 1) –

    The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches.
    • If int, then consider min_samples_leaf as the minimum number.

    • If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.

  • max_features (int, float, string or None, optional, default : 'auto') –

    The number of features to consider when looking for the best split:
    • If int, then consider max_features features at each split.

    • If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.

    • If “auto”, then max_features=n_features.

    • If “sqrt”, then max_features=sqrt(n_features).

    • If “log2”, then max_features=log2(n_features).

    • If None, then max_features=n_features.

  • bootstrap (boolean, optional, default : True) – This parameter determines whether bootstrap samples are used when building trees.

  • n_jobs (integer or None, optional, default : None) – The number of jobs to run in parallel for the fit function. None = 1 process.

  • random_state (integer, or None, optional, default : None) – If integer, random_state is the seed used by the random number generator. If None, the random number generator is the RandomState instance used by np.random.

  • verbose (boolean, default : False) – If True, it prints debugging information while training. Warning: this will increase the training time. For performance evaluation, use verbose=False.

  • use_histograms (boolean, default : False) – Use histogram-based splits rather than exact splits.

  • hist_nbins (int, default : 256) – Number of histogram bins.

  • use_gpu (boolean, default : False) – Use GPU acceleration (only supported for histogram-based splits).

  • gpu_ids (array-like of int, default: [0]) – Device IDs of the GPUs which will be used when GPU acceleration is enabled.
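The idea behind use_histograms and hist_nbins can be illustrated as follows: instead of testing every distinct feature value as a split threshold, values are grouped into a fixed number of bins, so at most hist_nbins - 1 thresholds need to be evaluated per feature. The sketch assumes equal-width bins; the binning strategy Snap ML actually uses is not stated here, and the function names are hypothetical.

```python
def bin_feature(values, n_bins=256):
    """Map each value to an equal-width bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature falls into a single bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def candidate_thresholds(values, n_bins=256):
    """Return the bin boundaries, the only thresholds a histogram split tests."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return []
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]


values = [0.0, 1.0, 2.0, 3.0, 4.0]
bins = bin_feature(values, n_bins=4)
thresholds = candidate_thresholds(values, n_bins=4)
```

With 4 bins over the range 0 to 4, the five values map to bins 0, 1, 2, 3, 3 and only three thresholds (1.0, 2.0, 3.0) remain to be evaluated.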

fit(X_train, y_train, sample_weight=None)

Fit the model according to the given training data.

Parameters
  • X_train (dense matrix (ndarray)) – Training dataset.

  • y_train (array-like, shape = (n_samples,)) – The target vector corresponding to X_train.

  • sample_weight (array-like, shape = [n_samples] or None) – Sample weights. If None, then samples are equally weighted. TODO: Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node.

Returns

self

Return type

object
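For the regressor, the "mse" criterion scores a candidate split by the mean squared error of the targets in each child around that child's mean. The sketch below uses this textbook definition with children weighted by size. Whether Snap ML applies exactly this weighting is an assumption, and the function names are hypothetical.

```python
def mse(targets):
    """Mean squared deviation of the targets from their own mean."""
    n = len(targets)
    if n == 0:
        return 0.0
    mean = sum(targets) / n
    return sum((t - mean) ** 2 for t in targets) / n


def split_mse(left_targets, right_targets):
    """Size-weighted average MSE of the two children (lower is better)."""
    n = len(left_targets) + len(right_targets)
    return (len(left_targets) * mse(left_targets)
            + len(right_targets) * mse(right_targets)) / n


# A split that groups similar targets together scores better (lower)
# than one that mixes them.
good = split_mse([1.0, 1.0], [5.0, 5.0])
bad = split_mse([1.0, 5.0], [1.0, 5.0])
```

Here the first split yields constant children and therefore a score of 0, while the second leaves the full spread of targets in both children.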

get_params()

Get the values of the model parameters.

Returns

params

Return type

dict

predict(X, num_threads=0)

Regression predictions

Returns the estimated values for the samples in X.

Parameters
  • X (dense matrix (ndarray)) – Dataset used for predicting values.

  • num_threads (int, default : 0) – Number of threads used to run inference. By default inference runs with the maximum number of available threads.

Returns

proba – Returns the predicted values of the samples.

Return type

array-like, shape = (n_samples,)