SnapBoost

class pai4sk.BoostingMachine(objective='mse', num_round=10, min_max_depth=1, max_max_depth=6, n_threads=1, random_state=None, learning_rate=0.1, use_histograms=True, hist_nbins=64, use_gpu=False, gpu_id=0, verbose=False, colsample_bytree=1.0, subsample=1.0, parallel_by_example=False, lambda_l2=0.0)

This class implements a boosting machine that constructs an ensemble of decision trees. It can be used for both classification and regression tasks. In contrast to other boosting frameworks, Snap ML’s boosting machine does not use a fixed maximum tree depth at each boosting iteration. Instead, the maximum tree depth is sampled at each boosting iteration from a discrete uniform distribution bounded by min_max_depth and max_max_depth. The fit and predict functions accept numpy.ndarray data structures.
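The depth sampling described above can be illustrated with a short NumPy sketch. This is not Snap ML's implementation: the helper name sample_tree_depths is hypothetical, and treating both bounds as inclusive is an assumption that this page does not confirm.

```python
import numpy as np

def sample_tree_depths(num_round=10, min_max_depth=1, max_max_depth=6, random_state=None):
    """Draw one maximum depth per boosting iteration from a discrete uniform distribution.

    Hypothetical helper for illustration only; it is not part of pai4sk.
    """
    rng = np.random.default_rng(random_state)
    # Assumption: both bounds are inclusive, so endpoint=True makes max_max_depth reachable.
    return rng.integers(min_max_depth, max_max_depth, size=num_round, endpoint=True)

depths = sample_tree_depths(num_round=10, min_max_depth=1, max_max_depth=6, random_state=42)
print(depths)  # one sampled depth per boosting iteration
```

Setting min_max_depth equal to max_max_depth in this sketch yields the same depth at every iteration, which corresponds to the fixed-depth behaviour of other boosting frameworks.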

Parameters

objective (string, optional, default : "mse") – The training objective (“mse” or “logloss”). “mse” is typically used for regression tasks, and “logloss” for classification tasks.
num_round (integer, optional, default : 10) – The number of boosting iterations.
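min_max_depth (integer, optional, default : 1) – Lower bound of the range from which the maximum depth of each tree in the ensemble is sampled.
max_max_depth (integer, optional, default : 6) – Upper bound of the range from which the maximum depth of each tree in the ensemble is sampled.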
n_threads (integer, optional, default : 1) – The number of CPU threads to use.
random_state (integer, or None, optional, default : None) – If integer, random_state is the seed used by the random number generator. If None, the random number generator is the RandomState instance used by np.random.
learning_rate (float, optional, default : 0.1) – Learning rate in the boosting algorithm.
use_histograms (boolean, default : True) – Use histogram-based splits rather than exact splits.
hist_nbins (int, default : 64) – Number of histogram bins.
use_gpu (boolean, default : False) – Use GPU acceleration (only supported for histogram-based splits).
gpu_id (int, default : 0) – Device ID of the GPU to use when GPU acceleration is enabled.
verbose (boolean, default : False) – If True, print debugging information during training. Warning: this increases training time. For performance evaluation, use verbose=False.
colsample_bytree (float, default : 1.0) – The fraction of features to be subsampled at each boosting iteration. The value range of this parameter is (0,1].
subsample (float, default : 1.0) – The fraction of examples to be subsampled at each boosting iteration. The value range of this parameter is (0,1].
parallel_by_example (boolean, default : False) – If True, computation of histogram bins is parallelized by example (rather than by feature). This option may reduce training time for datasets with a million or more examples. Only relevant for CPU-based training.
lambda_l2 (float, default : 0.0) – L2-regularization parameter applied to tree weights.
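To make the meaning of subsample and colsample_bytree concrete, the sketch below draws a random subset of examples and features for a single boosting iteration. It is an illustration under assumptions, not Snap ML's code: the helper sample_rows_and_columns is hypothetical, and both sampling without replacement and rounding to the nearest count are choices made only for this example.

```python
import numpy as np

def sample_rows_and_columns(n_examples, n_features, subsample=1.0, colsample_bytree=1.0, rng=None):
    """Pick the example and feature indices used in one boosting iteration.

    Hypothetical helper for illustration only; it is not part of pai4sk.
    """
    # Both fractions must lie in the documented range (0, 1].
    if not (0.0 < subsample <= 1.0 and 0.0 < colsample_bytree <= 1.0):
        raise ValueError("subsample and colsample_bytree must lie in (0, 1]")
    rng = np.random.default_rng() if rng is None else rng
    # Assumption: fractions are rounded to the nearest count, keeping at least one item.
    n_rows = max(1, int(round(subsample * n_examples)))
    n_cols = max(1, int(round(colsample_bytree * n_features)))
    # Assumption: sampling is done without replacement.
    rows = rng.choice(n_examples, size=n_rows, replace=False)
    cols = rng.choice(n_features, size=n_cols, replace=False)
    return np.sort(rows), np.sort(cols)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
rows, cols = sample_rows_and_columns(*X.shape, subsample=0.5, colsample_bytree=0.25, rng=rng)
X_iter = X[np.ix_(rows, cols)]  # the sub-matrix seen by this iteration's tree
print(X_iter.shape)  # (500, 5)
```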

fit(X_train, y_train)

Fit the model to the given training data.

Parameters

X_train (dense matrix (ndarray)) – Training dataset.
y_train (array-like, shape = (n_samples,)) – The target vector corresponding to X_train.

Returns

self

Return type

object

get_params()

Get the values of the model parameters.

Returns: params
Return type: dict

predict(X)

Raw predictions

If the training objective is ‘mse’, this method returns the predicted estimates. If the training objective is ‘logloss’, it returns the predicted estimates before the logistic transformation (raw logits).

Parameters: X (dense matrix (ndarray)) – Dataset used for predicting class estimates.
Returns: proba – Returns the predicted estimates.
Return type: array-like, shape = (n_samples,)
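A minimal regression sketch using fit and predict with the constructor parameters documented above. The synthetic data, the choice of float32 arrays, and the hyperparameter values are assumptions made for illustration. The import is guarded so that the data-preparation part still runs where pai4sk is not installed.

```python
import numpy as np

try:
    from pai4sk import BoostingMachine  # the class documented on this page
except ImportError:
    BoostingMachine = None  # allows the rest of the sketch to run without Snap ML

# Synthetic regression data (illustrative only; float32 is an assumption, not a documented requirement).
rng = np.random.default_rng(0)
weights = np.array([1.0, -2.0, 0.5, 0.0, 3.0], dtype=np.float32)
X_train = rng.normal(size=(200, 5)).astype(np.float32)
y_train = X_train @ weights
X_test = rng.normal(size=(50, 5)).astype(np.float32)

if BoostingMachine is not None:
    # 'mse' is the documented objective for regression tasks.
    model = BoostingMachine(objective="mse", num_round=50,
                            min_max_depth=2, max_max_depth=6,
                            learning_rate=0.1, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)  # predicted estimates, one per row of X_test
    print(y_pred.shape)
```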

predict_proba(X)

Output probabilities

Use only if the training objective is ‘logloss’ (i.e., for binary classification problems). It returns the probabilities of each sample belonging to each class. The probabilities are calculated using the logistic transformation.

Parameters: X (dense matrix (ndarray)) – Dataset used for predicting class estimates.
Returns: proba – Returns the predicted probabilities of each sample belonging to each class.
Return type: array-like, shape = (n_samples, 2)
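The relationship between the raw logits returned by predict and the probabilities returned by predict_proba can be sketched with the logistic function. The helper logits_to_proba is hypothetical, and the column order (negative class first, positive class second) is an assumption that follows the scikit-learn convention. This page does not state the order.

```python
import numpy as np

def logits_to_proba(raw_logits):
    """Map raw logits of shape (n_samples,) to probabilities of shape (n_samples, 2).

    Hypothetical helper for illustration only; it is not part of pai4sk.
    """
    raw_logits = np.asarray(raw_logits, dtype=np.float64)
    # Logistic transformation: probability of the positive class.
    p_pos = 1.0 / (1.0 + np.exp(-raw_logits))
    # Assumption: column 0 is the negative class, column 1 the positive class.
    return np.column_stack([1.0 - p_pos, p_pos])

proba = logits_to_proba([-2.0, 0.0, 2.0])
print(proba.shape)  # (3, 2)
```

A logit of 0 maps to a probability of 0.5 for each class, and each row sums to 1.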