Manual

The Snap Machine Learning (Snap ML) library is designed to offer fast training of generalized linear models and tree-based models. The library is under development and currently supports the following machine learning models:

Linear Regression

LinearRegression fits a linear model with coefficients {\bf w} = (w_1, ..., w_d) to minimize the residual sum of squares between the predicted responses \bf \hat{y} and the true labels \bf y of the training data.

To prevent the model from overfitting, you have the option to impose an L_1- or an L_2-norm penalty on the size of the coefficients.

  • Ridge Regression adds an L_2-regularization term to the least-squares loss. Mathematically it solves the following optimization problem:

    \underset{{\bf w}}{\min\,} \frac{1}{2}\|X^\top {\bf w} - {\bf y}\|_2^2 + \frac{\lambda}{2} \|{\bf w}\|_2^2

  • Lasso adds an L_1-regularization term to the least-squares loss. Mathematically it solves the following optimization problem:

    \underset{{\bf w}}{\min\,} \frac{1}{2}\|X^\top {\bf w} - {\bf y}\|_2^2 + \lambda \|{\bf w}\|_1

In both cases X=[{\bf x}_1,...,{\bf x}_n] denotes the training data matrix with the samples \{{\bf x}_i\}_{i\in [n]} in its columns, and the y_i are the corresponding labels. The regularization strength is controlled by the regularization parameter \lambda\geq 0; the larger \lambda, the more robust the model becomes to overfitting.

Note

(Regularization Parameter) To find an appropriate regularization parameter, we recommend performing cross-validation.
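
For instance, an appropriate value can be found by sweeping over a grid of candidate values and keeping the one with the smallest validation error. The following sketch uses a simple hold-out split as a stand-in for full k-fold cross-validation; the data, the candidate grid and the split are purely illustrative:

>>> from pai4sk import LinearRegression
>>> import numpy as np

>>> # Illustrative data: 100 samples, 5 features (replace with your own).
>>> X = np.random.rand(100, 5)
>>> y = X @ np.array([1.0, 0.5, 0.0, 0.0, 2.0])

>>> # Hold-out split standing in for k-fold cross-validation.
>>> X_tr, y_tr, X_val, y_val = X[:80], y[:80], X[80:], y[80:]

>>> errs = {}
>>> for lam in [0.01, 0.1, 1.0, 10.0]:
...     reg = LinearRegression(max_iter=100, regularizer=lam, penalty='l2')
...     reg.fit(X_tr, y_tr)
...     errs[lam] = np.mean((np.asarray(reg.predict(X_val)) - y_val) ** 2)
>>> best_lam = min(errs, key=errs.get)  # value with the smallest validation error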

Note

(Feature Selection) Since Lasso regression yields sparse models, it can be used to perform feature selection. The sparsity can be controlled by \lambda: a larger regularization parameter encourages a sparser model.
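
A minimal sketch of this use case (the data and the threshold below are illustrative; whether the uninformative coefficients are driven exactly to zero depends on \lambda):

>>> from pai4sk import LinearRegression
>>> import numpy as np

>>> # Only the first two of five features carry signal.
>>> X = np.random.rand(200, 5)
>>> y = 2.0 * X[:, 0] - 1.0 * X[:, 1]

>>> lasso = LinearRegression(max_iter=100, regularizer=1.0, penalty='l1')
>>> lasso.fit(X, y)
>>> w = np.asarray(lasso.coef_)
>>> selected = np.nonzero(np.abs(w) > 1e-6)[0]  # indices of the selected features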

Snap ML implements different variants of stochastic coordinate descent [SCD] and stochastic dual coordinate ascent [SDCA] as an algorithm to fit the model parameters \bf w. In order to optimally support GPUs for training Snap ML implements a parallel asynchronous version of these solvers especially designed to leverage the massive parallelism of modern GPUs [TPASCD].
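
For illustration, the following is a minimal, sequential numpy sketch of exact coordinate descent on the Ridge objective above. It is a conceptual model only, not Snap ML's parallel asynchronous GPU solver; on the toy data of the example below it converges to the same coefficients (up to rounding) as the coef_ shown there:

>>> import numpy as np

>>> def ridge_cd(A, y, lam, epochs=500, seed=0):
...     """Coordinate descent for min_w 0.5*||A w - y||^2 + 0.5*lam*||w||^2,
...     where A holds one sample per row (A = X^T in the notation above)."""
...     rng = np.random.default_rng(seed)
...     w = np.zeros(A.shape[1])
...     r = A @ w - y                  # running residual A w - y
...     sq = (A ** 2).sum(axis=0)      # precomputed ||A[:, j]||^2
...     for _ in range(epochs):
...         for j in rng.permutation(A.shape[1]):  # random coordinate order
...             a = A[:, j]
...             w_new = (sq[j] * w[j] - a @ r) / (sq[j] + lam)
...             r += a * (w_new - w[j])            # keep residual consistent
...             w[j] = w_new
...     return w

>>> ridge_cd(np.array([[0., 0.], [1., 1.], [2., 2.]]), np.array([0., 1., 2.]), lam=0.1)
array([0.4950495, 0.4950495])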

To train the LinearRegression model the fit method is used; it takes the training data and the labels X, {\bf y} as input and stores the learnt coefficients of the model {\bf w} in its coef_ attribute. The regularization type can be specified during initialization using the penalty argument. The trained model can then be used to make predictions by calling the predict method on unlabelled data.

>>> from pai4sk import LinearRegression
>>> import numpy as np

>>> reg = LinearRegression(max_iter=100, regularizer=0.1, penalty='l2')
>>> X_train = np.array([[0, 0], [1, 1], [2, 2]])
>>> y_train = np.array([0, 1, 2])
>>> reg.fit(X_train, y_train)
>>> reg.coef_
[ 0.495,  0.495]
>>> X_test = np.array([[3, 3], [0, 1]])
>>> reg.predict(X_test)
[2.97, 0.495]

For more details about the API we refer to the snap-ml API.

Support Vector Machine

Support Vector Machine (SVM) is a supervised learning method that can be applied to regression as well as classification tasks. Currently, Snap ML implements a SupportVectorMachine classifier with a linear kernel function and offers L_2 regularization to prevent the model from overfitting.

Mathematically it solves the following optimization problem:

\underset{{\bf w}}{\min\,} \sum_i [1 - y_i {\bf x}_i^\top {\bf w}]_+ + \frac{\lambda}{2} \|{\bf w}\|_2^2

where [u]_+=\max(u,0), so that [1 - y_i {\bf x}_i^\top {\bf w}]_+ is the hinge loss, with \{{\bf x}_i\}_{i\in [n]} being the training samples and y_i \in \{\pm 1\} the corresponding labels. The regularization strength \lambda>0 can be controlled by the user through the regularizer parameter. The larger \lambda, the more robust the model becomes to overfitting.
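
To make the objective concrete, the following numpy snippet (an illustration, not part of the Snap ML API) evaluates it for a given weight vector; plugging in the coefficients learnt in the example below gives the optimal primal value for that toy data set:

>>> import numpy as np

>>> def svm_objective(A, y, w, lam):
...     """sum_i max(0, 1 - y_i * x_i.w) + lam/2 * ||w||^2,
...     where A holds one training sample per row."""
...     hinge = np.maximum(1.0 - y * (A @ w), 0.0)
...     return hinge.sum() + 0.5 * lam * (w @ w)

>>> A = np.array([[0., 0.], [1., 1.], [2., 2.]])
>>> svm_objective(A, np.array([-1., -1., 1.]), np.array([0.25, 0.25]), lam=1.0)
2.5625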

Snap ML implements stochastic dual coordinate ascent [SDCA] as well as its GPU-optimized parallel asynchronous variant [TPASCD] as algorithms to train the SVM classifier. SDCA operates on the equivalent SVM dual problem formulation:

\underset{\boldsymbol \alpha}{\min\,}  \sum_i -\alpha_i y_i +\frac 1 {2\lambda} \|X \boldsymbol \alpha\|_2^2

with the constraint \alpha_i y_i\in[0,1].
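
For illustration, here is a minimal, sequential numpy sketch of the SDCA coordinate update on this dual (again a conceptual model, not the parallel asynchronous GPU solver), using the standard primal-dual mapping {\bf w} = X {\boldsymbol \alpha} / \lambda. On the toy data of the example below it recovers the same coefficients as the coef_ shown there:

>>> import numpy as np

>>> def svm_sdca(A, y, lam, epochs=50, seed=0):
...     """Sequential SDCA sketch for the SVM dual above; A holds one
...     sample per row. Returns the primal solution w = X alpha / lam."""
...     rng = np.random.default_rng(seed)
...     alpha = np.zeros(len(y))
...     v = np.zeros(A.shape[1])     # v = X alpha, kept up to date
...     sq = (A ** 2).sum(axis=1)    # ||x_i||^2 per sample
...     for _ in range(epochs):
...         for i in rng.permutation(len(y)):
...             if sq[i] == 0.0:
...                 a_new = y[i]     # dual is linear in alpha_i for a zero sample
...             else:
...                 # exact minimizer in alpha_i, projected onto alpha_i*y_i in [0,1]
...                 delta = (lam * y[i] - A[i] @ v) / sq[i]
...                 a_new = y[i] * np.clip((alpha[i] + delta) * y[i], 0.0, 1.0)
...             v += (a_new - alpha[i]) * A[i]
...             alpha[i] = a_new
...     return v / lam               # primal-dual mapping

>>> A = np.array([[0., 0.], [1., 1.], [2., 2.]])
>>> svm_sdca(A, np.array([-1., -1., 1.]), lam=1.0)
array([0.25, 0.25])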

To train the model the fit method is used; it takes the training data and the labels X, {\bf y} as input and stores the learnt coefficients of the model {\bf w} in its coef_ attribute. The trained model can then be used to make predictions by calling the predict method on unlabelled data.

>>> from pai4sk import SupportVectorMachine
>>> import numpy as np

>>> reg = SupportVectorMachine(max_iter=100, regularizer=1.0)
>>> X_train = np.array([[0, 0], [1, 1], [2, 2]])
>>> y_train = np.array([-1,-1,1])
>>> reg.fit(X_train, y_train)
>>> reg.coef_
[ 0.25,  0.25]
>>> X_test = np.array([[3, 3], [0, 1]])
>>> reg.predict(X_test)
[1,-1]

A full example of training a SupportVectorMachine model in a real application can be found in the PowerAI distribution under /opt/DL/snap-ml-local/examples. For more details about the API we refer to the snap-ml API.

Logistic Regression

LogisticRegression is a linear model for classification. A logistic model is used to estimate the probability of an outcome based on the features of the input data. To prevent the model from overfitting, L_1 or L_2 regularization can be used.

Mathematically, L_2-regularized Logistic Regression solves the following optimization problem, consisting of the logistic loss and an L_2 regularization term:

\underset{{\bf w}}{\min\,} \sum_i \log(1+\exp(-y_i {\bf x}_i^\top {\bf w})) + \frac \lambda 2 \|{\bf w}\|_2^2

Similarly, L_1-regularized Logistic Regression solves the following optimization problem:

\underset{{\bf w}}{\min\,} \sum_i \log(1+\exp(-y_i {\bf x}_i^\top {\bf w})) +  \lambda \|{\bf w}\|_1

where X=[{\bf x}_1,...,{\bf x}_n] is the training data matrix with samples \{{\bf x}_i\}_{i\in [n]} in its columns and y_i \in \{\pm 1\} denote the corresponding labels.

The regularization strength is controlled by the regularization parameter \lambda\geq 0; the larger \lambda, the more robust the model becomes to overfitting. \lambda can be specified by the user through the regularizer input parameter.
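
As a concrete illustration, the following numpy snippet evaluates the L_2-regularized logistic loss and the positive-class probability \sigma({\bf x}^\top {\bf w}) = 1/(1+\exp(-{\bf x}^\top {\bf w})) that underlies predict_proba. Applied to the coefficients learnt in the example below (and assuming no intercept term), it reproduces the probabilities shown there:

>>> import numpy as np

>>> def logistic_objective(A, y, w, lam):
...     """sum_i log(1 + exp(-y_i * x_i.w)) + lam/2 * ||w||^2,
...     where A holds one sample per row and y is in {-1, +1}."""
...     return np.log1p(np.exp(-y * (A @ w))).sum() + 0.5 * lam * (w @ w)

>>> def proba_sketch(A, w):
...     """Columns: [P(y = -1), P(y = +1)] with P(y = +1) = sigma(x.w)."""
...     p = 1.0 / (1.0 + np.exp(-(A @ w)))
...     return np.column_stack([1.0 - p, p])

>>> w = np.array([0.145, 0.145])  # coef_ from the example below
>>> proba_sketch(np.array([[3, 3], [-2, 1]]), w).round(3)
array([[0.295, 0.705],
       [0.536, 0.464]])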

Snap ML implements stochastic coordinate descent [SCD] and stochastic dual coordinate ascent [SDCA] as algorithms to fit the model parameters {\bf w}. To support GPU acceleration, Snap ML implements a parallel asynchronous version of these solvers designed specifically to leverage the massive parallelism of modern GPUs [TPASCD].

The model can be trained using the fit method which takes the training data and the labels X,\bf y as input and stores the coefficients of the learnt model {\bf w} in its coef_ attribute. This model can then be used to make predictions by calling the predict method on unlabelled data. The regularization type can be specified at initialization using the penalty argument.

>>> from pai4sk import LogisticRegression
>>> import numpy as np

>>> lr = LogisticRegression(max_iter=100, regularizer=0.01, penalty='l2')
>>> X_train = np.array([[0, 0], [1, 1], [2, 2]])
>>> y_train = np.array([-1,-1,1])
>>> lr.fit(X_train, y_train)
>>> lr.coef_
[0.145, 0.145]
>>> X_test = np.array([[3,3],[-2,1]])
>>> lr.predict(X_test)
[1,-1]
>>> lr.predict_proba(X_test)
[[0.295, 0.705]
 [0.536, 0.464]]

A full example of training a LogisticRegression model in Snap ML can be found in the PowerAI distribution under /opt/DL/snap-ml-local/examples. For more details about the API we refer to the snap-ml API.

Decision Tree

Snap ML offers a DecisionTreeClassifier for training a binary decision tree. The training routine is single-threaded and executed on the CPU. See the snap-ml API for details about the available options, or check out the Tutorials for an application example.
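
A minimal usage sketch, assuming DecisionTreeClassifier follows the same fit/predict interface as the linear models above (constructor options omitted; the data is illustrative and the snap-ml API documents what is actually available):

>>> from pai4sk import DecisionTreeClassifier
>>> import numpy as np

>>> X_train = np.array([[0, 0], [1, 1], [2, 2], [3, 3]])
>>> y_train = np.array([0, 0, 1, 1])
>>> dt = DecisionTreeClassifier()
>>> dt.fit(X_train, y_train)
>>> pred = dt.predict(np.array([[2.5, 2.5]]))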

Random Forest

With the RandomForestClassifier you can train an ensemble of trees. This is usually preferred over a single decision tree as it improves the generalization performance of the model. Snap ML implements a multi-threaded training routine that is executed on the CPU. See the snap-ml API for details about the available options, or check out the Tutorials for an application example.
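
A minimal usage sketch under the same interface assumption; note that the n_estimators argument (the number of trees, in the style of scikit-learn) is an assumption here, so check the snap-ml API for the actual option names:

>>> from pai4sk import RandomForestClassifier
>>> import numpy as np

>>> X_train = np.array([[0, 0], [1, 1], [2, 2], [3, 3]])
>>> y_train = np.array([0, 0, 1, 1])
>>> rf = RandomForestClassifier(n_estimators=10)  # n_estimators is assumed, sklearn-style
>>> rf.fit(X_train, y_train)
>>> pred = rf.predict(np.array([[2.5, 2.5]]))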

References

[SCD]: Y. Nesterov. Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM Journal on Optimization, 2012.
[SDCA]: S. Shalev-Shwartz and T. Zhang. Stochastic dual coordinate ascent methods for regularized loss minimization. Journal of Machine Learning Research, 2013.
[TPASCD]: T. Parnell, C. Dünner, K. Atasu, M. Sifalakis and H. Pozidis. Tera-scale coordinate descent on GPUs. Future Generation Computer Systems, 2018.