RandomForest

class pai4sk.RandomForestClassifier(n_estimators=10, criterion='gini', max_depth=None, min_samples_leaf=1, max_features='auto', bootstrap=True, n_jobs=None, random_state=None, verbose=False, use_gpu=False, use_histograms=False, hist_nbins=64)

Random Forest Classifier

This class implements a random forest classifier using the IBM Snap ML library. It can be used for binary classification problems.

Parameters
  • n_estimators (integer, optional, default : 10) – The number of trees in the forest.

  • criterion (string, optional, default : "gini") – This function measures the quality of a split. The currently supported criterion is “gini”.

  • max_depth (integer or None, optional, default : None) – The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_leaf samples.

  • min_samples_leaf (int or float, optional, default : 1) –

    The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches.
    • If int, then consider min_samples_leaf as the minimum number.

    • If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.

  • max_features (int, float, string or None, optional, default : 'auto') –

    The number of features to consider when looking for the best split:
    • If int, then consider max_features features at each split.

    • If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.

    • If “auto”, then max_features=sqrt(n_features).

    • If “sqrt”, then max_features=sqrt(n_features).

    • If “log2”, then max_features=log2(n_features).

    • If None, then max_features=n_features.

  • bootstrap (boolean, optional, default : True) – This parameter determines whether bootstrap samples are used when building trees.

  • n_jobs (integer or None, optional, default : None) – The number of jobs to run in parallel for the fit function. None means 1 process.

  • random_state (integer or None, optional, default : None) – If integer, random_state is the seed used by the random number generator. If None, the random number generator is the RandomState instance used by np.random.

  • verbose (boolean, default : False) – If True, it prints debugging information while training. Warning: this will increase the training time. For performance evaluation, use verbose=False.

  • use_gpu (boolean, default : False) – Flag that indicates the hardware platform used for training. If True, the training is performed using the GPU. If False, the training is performed using the CPU. Currently only CPU training is supported.

  • use_histograms (boolean, default : False) – Use histogram-based splits rather than exact splits.

  • hist_nbins (int, default : 64) – Number of histogram bins.
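The rules above for min_samples_leaf and max_features can be made concrete with a small sketch. The helper names `resolve_min_samples_leaf` and `resolve_max_features` are hypothetical and are not part of pai4sk. Truncating the sqrt and log2 results to an integer (with a floor of 1) is an assumption of this sketch, since the reference does not state how those values are rounded.

```python
import math


def resolve_min_samples_leaf(min_samples_leaf, n_samples):
    """Return the minimum leaf size as an absolute sample count."""
    if isinstance(min_samples_leaf, int):
        return min_samples_leaf
    # Float: a fraction of the training set, rounded up as documented.
    return math.ceil(min_samples_leaf * n_samples)


def resolve_max_features(max_features, n_features):
    """Return the number of features examined per split (classifier rules)."""
    if max_features is None:
        return n_features
    if isinstance(max_features, int):
        return max_features
    if isinstance(max_features, float):
        return int(max_features * n_features)
    if max_features in ("auto", "sqrt"):
        # Integer truncation and the floor of 1 are assumptions of this sketch.
        return max(1, int(math.sqrt(n_features)))
    if max_features == "log2":
        return max(1, int(math.log2(n_features)))
    raise ValueError("unsupported max_features: %r" % (max_features,))


print(resolve_min_samples_leaf(0.25, 10))  # fraction of 10 samples, rounded up
print(resolve_max_features("auto", 16))    # sqrt rule for the classifier
```

With 16 features, "auto" and "sqrt" both resolve to 4 features per split, while None keeps all 16.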

fit(X_train, y_train, sample_weight=None)

Fit the model according to the given training data.

Parameters
  • X_train (dense matrix (ndarray)) – Training dataset.

  • y_train (array-like, shape = (n_samples,)) – The target vector corresponding to X_train.

  • sample_weight (array-like, shape = [n_samples] or None) – Sample weights. If None, then samples are equally weighted. TODO: Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node.

Returns

self

Return type

object

get_params()

Get the values of the model parameters.

Returns

params

Return type

dict

predict(X, num_threads=0)

Class predictions

Returns the predicted class for each sample in X.

Parameters
  • X (dense matrix (ndarray)) – Dataset used for predicting class estimates.

  • num_threads (int, default : 0) – Number of threads used to run inference. By default, inference runs with the maximum number of available threads.

Returns

proba – The predicted class of each sample.

Return type

array-like, shape = (n_samples,)

predict_log_proba(X, num_threads=0)

Log of probability estimates

Returns the log-probability estimates for the two classes. Only for binary classification.

Parameters
  • X (dense matrix (ndarray)) – Dataset used for predicting log-probability estimates.

  • num_threads (int, default : 0) – Number of threads used to run inference. By default, inference runs with the maximum number of available threads.

Returns

Return type

None

predict_proba(X, num_threads=0)

Probability estimates

Returns the probability estimates for the two classes. Only for binary classification.

Parameters
  • X (dense matrix (ndarray)) – Dataset used for predicting probability estimates.

  • num_threads (int, default : 0) – Number of threads used to run inference. By default, inference runs with the maximum number of available threads.

Returns

Return type

None
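A minimal usage sketch for the classifier follows, built only from the constructor, fit, and predict signatures listed above. The toy data, the float32 dtype, and the {0, 1} label encoding are assumptions made for illustration and are not requirements stated in this reference. The import is guarded so the snippet still runs, returning None, where pai4sk is not installed.

```python
def classifier_demo():
    """Fit a tiny binary classifier; return its predictions, or None without pai4sk."""
    try:
        import numpy as np
        from pai4sk import RandomForestClassifier
    except ImportError:
        return None  # pai4sk (or NumPy) is not available in this environment

    # Made-up toy data: 4 samples, 2 features, passed as a dense ndarray.
    X = np.array([[0.0, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8]],
                 dtype=np.float32)  # dtype is an assumption of this sketch
    y = np.array([0, 0, 1, 1], dtype=np.float32)  # label encoding is an assumption

    clf = RandomForestClassifier(n_estimators=10, max_depth=3, random_state=42)
    clf.fit(X, y)
    # num_threads=1 pins inference to one thread; the default 0 uses all available.
    return clf.predict(X, num_threads=1)


preds = classifier_demo()
print(preds)
```

Fixing random_state to an integer keeps repeated runs comparable, since it seeds the random number generator.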

class pai4sk.RandomForestRegressor(n_estimators=10, criterion='mse', max_depth=None, min_samples_leaf=1, max_features='auto', bootstrap=True, n_jobs=None, random_state=None, verbose=False, use_gpu=False, use_histograms=False, hist_nbins=64)

Random Forest Regressor

This class implements a random forest regressor using the IBM Snap ML library. It can be used for regression tasks.

Parameters
  • n_estimators (integer, optional, default : 10) – The number of trees in the forest.

  • criterion (string, optional, default : "mse") – This function measures the quality of a split. The currently supported criterion is “mse”.

  • max_depth (integer or None, optional, default : None) – The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_leaf samples.

  • min_samples_leaf (int or float, optional, default : 1) –

    The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches.
    • If int, then consider min_samples_leaf as the minimum number.

    • If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.

  • max_features (int, float, string or None, optional, default : 'auto') –

    The number of features to consider when looking for the best split:
    • If int, then consider max_features features at each split.

    • If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.

    • If “auto”, then max_features=n_features.

    • If “sqrt”, then max_features=sqrt(n_features).

    • If “log2”, then max_features=log2(n_features).

    • If None, then max_features=n_features.

  • bootstrap (boolean, optional, default : True) – This parameter determines whether bootstrap samples are used when building trees.

  • n_jobs (integer or None, optional, default : None) – The number of jobs to run in parallel for the fit function. None means 1 process.

  • random_state (integer or None, optional, default : None) – If integer, random_state is the seed used by the random number generator. If None, the random number generator is the RandomState instance used by np.random.

  • verbose (boolean, default : False) – If True, it prints debugging information while training. Warning: this will increase the training time. For performance evaluation, use verbose=False.

  • use_gpu (boolean, default : False) – Flag that indicates the hardware platform used for training. If True, the training is performed using the GPU. If False, the training is performed using the CPU. Currently only CPU training is supported.

  • use_histograms (boolean, default : False) – Use histogram-based splits rather than exact splits.

  • hist_nbins (int, default : 64) – Number of histogram bins.
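For intuition about the "mse" criterion, the sketch below scores a candidate split by the size-weighted mean squared error of its two children, which is the textbook definition. The helper names `mse` and `split_mse` are hypothetical, and this reference does not state whether Snap ML computes the criterion in exactly this form.

```python
def mse(values):
    """Mean squared error of values around their own mean (0.0 if empty)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def split_mse(left, right):
    """Size-weighted MSE of the two children produced by a candidate split."""
    n = len(left) + len(right)
    return (len(left) * mse(left) + len(right) * mse(right)) / n


# Made-up targets, ordered by the feature being split on.
targets = [1.0, 1.0, 5.0, 5.0]
good = split_mse(targets[:2], targets[2:])  # each child holds a single value
bad = split_mse(targets[:1], targets[1:])   # right child mixes 1.0 and 5.0
print(mse(targets), good, bad)
```

A lower score indicates a better split: separating the two groups of targets exactly drives the weighted error to zero, while the off-by-one split leaves residual error in the right child.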

fit(X_train, y_train, sample_weight=None)

Fit the model according to the given training data.

Parameters
  • X_train (dense matrix (ndarray)) – Training dataset.

  • y_train (array-like, shape = (n_samples,)) – The target vector corresponding to X_train.

  • sample_weight (array-like, shape = [n_samples] or None) – Sample weights. If None, then samples are equally weighted. TODO: Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node.

Returns

self

Return type

object

get_params()

Get the values of the model parameters.

Returns

params

Return type

dict

predict(X, num_threads=0)

Regression predictions

Returns the predicted value for each sample in X.

Parameters
  • X (dense matrix (ndarray)) – Dataset used for predicting values.

  • num_threads (int, default : 0) – Number of threads used to run inference. By default, inference runs with the maximum number of available threads.

Returns

proba – Returns the predicted values of the samples.

Return type

array-like, shape = (n_samples,)
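A minimal usage sketch for the regressor follows, using only the constructor, fit, get_params, and predict signatures listed above. The toy data and the float32 dtype are assumptions made for illustration. The import is guarded so the snippet still runs, returning None, where pai4sk is not installed.

```python
def regressor_demo():
    """Fit a tiny regressor; return (params, predictions), or None without pai4sk."""
    try:
        import numpy as np
        from pai4sk import RandomForestRegressor
    except ImportError:
        return None  # pai4sk (or NumPy) is not available in this environment

    # Made-up toy data: 4 samples, 1 feature, target equal to the feature.
    X = np.array([[0.0], [1.0], [2.0], [3.0]], dtype=np.float32)  # dtype assumed
    y = np.array([0.0, 1.0, 2.0, 3.0], dtype=np.float32)

    reg = RandomForestRegressor(n_estimators=20, random_state=0)
    reg.fit(X, y)
    params = reg.get_params()              # dict of model parameters
    preds = reg.predict(X, num_threads=2)  # run inference on two threads
    return params, preds


result = regressor_demo()
print(result)
```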