class pai4sk.linear_model.Ridge(alpha=1.0, fit_intercept=True, normalize=False, copy_X=True, max_iter=None, tol=0.001, solver='auto', random_state=None, dual=False, verbose=0, use_gpu=True, device_ids=[], return_training_history=None, privacy=False, eta=0.3, batch_size=100, privacy_epsilon=10, grad_clip=1, num_threads=1)

Linear least squares with l2 regularization.

Minimizes the objective function:

||y - Xw||^2_2 + alpha * ||w||^2_2
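Setting the gradient of this objective with respect to w to zero gives a closed-form minimizer. The derivation below ignores the intercept (that is, it assumes centered data, as when fit_intercept=False), and I denotes the n_features-by-n_features identity matrix. For alpha > 0, X^T X + alpha I is positive definite, so the minimizer is unique.

```latex
\nabla_w \left( \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2 \right)
  = -2 X^\top (y - Xw) + 2 \alpha w = 0
\quad\Longrightarrow\quad
w = \left( X^\top X + \alpha I \right)^{-1} X^\top y
```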

This model solves a regression problem in which the loss function is the linear least-squares function and the regularization is given by the l2-norm. It is also known as Ridge Regression or Tikhonov regularization. This estimator has built-in support for multi-variate regression (i.e., when y is a 2-D array of shape [n_samples, n_targets]).
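As an illustration of the objective above (not pai4sk's implementation), the following NumPy sketch solves the regularized normal equations directly. It assumes no intercept and a single scalar alpha, and it handles a 2-D y of shape [n_samples, n_targets] by solving for all targets at once. The helper name ridge_closed_form is hypothetical.

```python
import numpy as np

def ridge_closed_form(X, y, alpha=1.0):
    """Hypothetical helper: minimize ||y - Xw||^2_2 + alpha * ||w||^2_2 (no intercept).

    Works for y of shape (n_samples,) or (n_samples, n_targets).
    """
    n_features = X.shape[1]
    # Regularized Gram matrix X^T X + alpha * I (positive definite for alpha > 0)
    A = X.T @ X + alpha * np.eye(n_features)
    # Solve (X^T X + alpha I) w = X^T y; np.linalg.solve accepts a 2-D right-hand side
    W = np.linalg.solve(A, X.T @ y)
    # Return shape (n_targets, n_features) for multi-target y, like the coef layout
    return W.T if y.ndim == 2 else W

rng = np.random.RandomState(0)
X = rng.randn(10, 5)
y = rng.randn(10)
w = ridge_closed_form(X, y, alpha=1.0)

# At the minimizer, the gradient -2 X^T (y - Xw) + 2 alpha w vanishes
grad = -2 * X.T @ (y - X @ w) + 2 * 1.0 * w
print(np.allclose(grad, 0))  # True
```

Because the solution is linear in y, each target column of a 2-D y is solved independently with the same penalty.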

Read more in the User Guide.

With the snapml solver, this estimator supports both local and distributed (MPI) execution.

  • alpha ({float, array-like}, shape (n_targets)) – Regularization strength; must be a positive float. Regularization improves the conditioning of the problem and reduces the variance of the estimates. Larger values specify stronger regularization. Alpha corresponds to C^-1 in other linear models such as LogisticRegression or LinearSVC. If an array is passed, penalties are assumed to be specific to the targets. Hence they must correspond in number.
  • fit_intercept (boolean, optional, default True) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (e.g., the data is expected to be already centered).
  • normalize (boolean, optional, default False) – This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use pai4sk.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.
  • copy_X (boolean, optional, default True) – If True, X will be copied; else, it may be overwritten.
  • max_iter (int, optional) – Maximum number of iterations for the conjugate gradient solver. For ‘sparse_cg’ and ‘lsqr’ solvers, the default value is determined by scipy.sparse.linalg. For the ‘sag’ solver, the default value is 1000.
  • tol (float, optional, default 0.001) – Precision of the solution.
  • regularizer (float, default : 1.0) – Regularization strength. It must be a positive float. Larger regularization values imply stronger regularization.
  • use_gpu (bool, default : True) – Flag indicating the hardware platform used for training. If True, training is performed on the GPU; if False, on the CPU. Unless it is set explicitly, the value of this parameter may be changed based on the training data. Applicable only to the snapml solver.
  • device_ids (array-like of int, default : []) – If use_gpu is True, the IDs of the GPUs used for training. For single-GPU training, set device_ids to the ID of the GPU to use, e.g., [0]. For multi-GPU training, set device_ids to a list of GPU IDs, e.g., [0, 1]. Applicable only to the snapml solver.
  • num_threads (int, default : 1) – The number of threads used for training. If training is performed on the GPU (use_gpu=True), the value should be a multiple of 32 (the default value for GPU training is 256). Applicable only to the snapml solver.
  • return_training_history (str or None, default : None) – How much information about the training should be collected and returned by the fit function. By default no information is returned (None); set this parameter to “summary” to obtain summary statistics at the end of training, or to “full” to obtain a complete set of statistics for the entire training procedure. Note that enabling either option results in slower training. Applicable only to the snapml solver.
  • solver ({'auto', 'svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga', 'snapml'}) –

    Solver to use in the computational routines:

    • ‘auto’ chooses the solver automatically based on the type of data.
    • ‘svd’ uses a Singular Value Decomposition of X to compute the Ridge coefficients. It is more stable for singular matrices than ‘cholesky’.
    • ‘cholesky’ uses the standard scipy.linalg.solve function to obtain a closed-form solution.
    • ‘sparse_cg’ uses a conjugate gradient solver. As an iterative algorithm, this solver is more appropriate than ‘cholesky’ for large-scale data (tol and max_iter can be set).
    • ‘lsqr’ uses the dedicated regularized least-squares routine scipy.sparse.linalg.lsqr. It is the fastest and uses an iterative procedure.
    • ‘sag’ uses a Stochastic Average Gradient descent, and ‘saga’ uses its improved, unbiased version named SAGA. Both methods also use an iterative procedure, and are often faster than other solvers when both n_samples and n_features are large. Note that fast convergence of ‘sag’ and ‘saga’ is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from pai4sk.preprocessing.

    The last five solvers all support both dense and sparse data. However, only ‘sag’ and ‘saga’ support sparse input when fit_intercept is True.

    New in version 0.17: Stochastic Average Gradient descent solver.

    New in version 0.19: SAGA solver.

  • random_state (int, RandomState instance or None, optional, default None) –

    The seed of the pseudo-random number generator to use when shuffling the data. If int, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random. Used when solver == ‘sag’.

    New in version 0.17: random_state to support Stochastic Average Gradient.

  • privacy (bool, default : False) – Train the model using a differentially private algorithm.
  • eta (float, default : 0.3) – Learning rate for the differentially private training algorithm.
  • batch_size (int, default : 100) – Mini-batch size for the differentially private training algorithm.
  • privacy_epsilon (float, default : 10.0) – Target privacy guarantee. The learned model will be (privacy_epsilon, 0.01)-private.
  • grad_clip (float, default: 1.0) – Gradient clipping parameter for the differentially private training algorithm.

Attributes:

  • coef_ (array, shape (n_features,) or (n_targets, n_features)) – Weight vector(s).
  • intercept_ (float | array, shape = (n_targets,)) – Independent term in decision function. Set to 0.0 if fit_intercept = False.
  • n_iter_ (array or None, shape (n_targets,)) – Actual number of iterations for each target. Available only for the sag and lsqr solvers; other solvers return None.
  • training_history_ (dict) –

    A dictionary with the keys ‘epochs’, ‘t_elap_sec’, and ‘train_obj’. If return_training_history is set to “summary”, ‘epochs’ contains the total number of epochs performed and ‘t_elap_sec’ the total time taken to complete them. If return_training_history is set to “full”, ‘epochs’ indicates the number of epochs that have elapsed so far and ‘t_elap_sec’ the time taken for those epochs. ‘train_obj’ is the training loss. Applicable only to the snapml solver.

    New in version 0.17.
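The fit_intercept, normalize and per-target alpha options described above can be illustrated with a NumPy sketch. It follows the documented semantics (center X and y when fitting an intercept; with normalize=True, additionally divide each centered column of X by its l2-norm; one penalty per target when alpha is an array) and recovers the intercept from the means. This is an assumption-laden illustration, not pai4sk's code: the function name ridge_fit is hypothetical, and it returns (coef, intercept) instead of setting attributes on an estimator.

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0, fit_intercept=True, normalize=False):
    """Hypothetical sketch returning (coef, intercept) for 1-D or 2-D y."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(y, dtype=float)
    single_target = Y.ndim == 1
    if single_target:
        Y = Y[:, None]
    n_features, n_targets = X.shape[1], Y.shape[1]

    if fit_intercept:
        x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
        Xc, Yc = X - x_mean, Y - y_mean
        if normalize:
            # Documented behavior: divide the centered columns by their l2-norm
            x_scale = np.linalg.norm(Xc, axis=0)
            x_scale[x_scale == 0] = 1.0  # leave constant columns unscaled
            Xc = Xc / x_scale
        else:
            x_scale = np.ones(n_features)
    else:
        # normalize is ignored when fit_intercept is False
        x_mean, y_mean = np.zeros(n_features), np.zeros(n_targets)
        x_scale = np.ones(n_features)
        Xc, Yc = X, Y

    # One penalty per target; a scalar alpha is broadcast to all targets
    alphas = np.broadcast_to(np.asarray(alpha, dtype=float), (n_targets,))
    gram, xty = Xc.T @ Xc, Xc.T @ Yc
    coef = np.empty((n_targets, n_features))
    for t in range(n_targets):
        coef[t] = np.linalg.solve(gram + alphas[t] * np.eye(n_features), xty[:, t])

    coef = coef / x_scale               # map back to the original feature scale
    intercept = y_mean - coef @ x_mean  # all zeros when fit_intercept is False
    if single_target:
        return coef[0], float(intercept[0])
    return coef, intercept
```

Because the intercept is recovered from the means after centering, the penalty never shrinks the intercept itself.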
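To illustrate why an iterative solver such as ‘sparse_cg’ needs only matrix-vector products with X (and therefore suits large or sparse data), here is a textbook conjugate gradient loop on the regularized normal equations, with tol and max_iter playing the roles described above. It is a NumPy sketch, not the routine the solver actually calls; the name conjugate_gradient_ridge and its default max_iter are hypothetical, and no intercept is fitted.

```python
import numpy as np

def conjugate_gradient_ridge(X, y, alpha=1.0, tol=1e-3, max_iter=None):
    """Hypothetical sketch: solve (X^T X + alpha I) w = X^T y by conjugate gradient.

    Only products with X and X^T are used; X^T X is never formed.
    Returns (w, number_of_iterations).
    """
    n_features = X.shape[1]
    if max_iter is None:
        max_iter = 10 * n_features  # arbitrary cap chosen for this sketch

    def matvec(v):
        return X.T @ (X @ v) + alpha * v

    b = X.T @ y
    w = np.zeros(n_features)
    b_norm = np.linalg.norm(b)
    if b_norm == 0:
        return w, 0  # w = 0 already solves the system
    r = b - matvec(w)  # residual
    p = r.copy()       # search direction
    rs_old = r @ r
    for it in range(1, max_iter + 1):
        Ap = matvec(p)
        step = rs_old / (p @ Ap)
        w += step * p
        r -= step * Ap
        rs_new = r @ r
        # Stop when the relative residual drops below tol
        if np.sqrt(rs_new) <= tol * b_norm:
            return w, it
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return w, max_iter
```

In exact arithmetic, conjugate gradient terminates in at most n_features iterations, so tol and max_iter mainly matter when the number of features is large.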
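The privacy, eta, batch_size and grad_clip parameters suggest a mini-batch training loop with gradient clipping, but the exact algorithm pai4sk uses is not described here. The sketch below therefore shows only a generic DP-SGD-style step on a ridge loss (per-example squared error plus an alpha penalty, a per-sample scaling chosen for this sketch): each example's gradient is clipped to l2-norm grad_clip, Gaussian noise is added to the sum, and a step of size eta is taken. The noise_multiplier argument is a hypothetical stand-in; mapping privacy_epsilon to a noise level requires a privacy accountant, which this sketch does not implement, so it carries no formal privacy guarantee.

```python
import numpy as np

def dp_sgd_step(w, X_batch, y_batch, alpha, eta, grad_clip, noise_multiplier, rng):
    """Hypothetical DP-SGD-style step for ridge (not pai4sk's algorithm).

    noise_multiplier is an assumed input; it is NOT derived from privacy_epsilon here.
    """
    residuals = X_batch @ w - y_batch                  # shape (batch,)
    per_example = 2.0 * residuals[:, None] * X_batch   # gradient of (x_i . w - y_i)^2
    # Clip each example's gradient to an l2-norm of at most grad_clip
    norms = np.linalg.norm(per_example, axis=1, keepdims=True)
    clipped = per_example / np.maximum(1.0, norms / grad_clip)
    # Sum, add Gaussian noise scaled to the clipping bound, then average
    noise = rng.normal(0.0, noise_multiplier * grad_clip, size=w.shape)
    grad = (clipped.sum(axis=0) + noise) / len(y_batch)
    grad += 2.0 * alpha * w                            # penalty term touches no data
    return w - eta * grad

def dp_sgd_ridge(X, y, alpha=1.0, eta=0.3, batch_size=100, grad_clip=1.0,
                 noise_multiplier=1.0, epochs=10, seed=0):
    """Hypothetical driver: shuffled mini-batches for a fixed number of epochs."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        order = rng.permutation(len(y))
        for start in range(0, len(y), batch_size):
            idx = order[start:start + batch_size]
            w = dp_sgd_step(w, X[idx], y[idx], alpha, eta, grad_clip,
                            noise_multiplier, rng)
    return w
```

With noise_multiplier=0 and a very large grad_clip, each step reduces to an ordinary mini-batch gradient step on the same loss.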

See also

Ridge classifier
Ridge regression with built-in cross validation
Kernel ridge regression combines ridge regression with the kernel trick

Examples

>>> from pai4sk.linear_model import Ridge
>>> import numpy as np
>>> n_samples, n_features = 10, 5
>>> np.random.seed(0)
>>> y = np.random.randn(n_samples)
>>> X = np.random.randn(n_samples, n_features)
>>> clf = Ridge(alpha=1.0)
>>> clf.fit(X, y)  # doctest: +NORMALIZE_WHITESPACE
Ridge(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=None,
      normalize=False, random_state=None, solver='auto', tol=0.001)
fit(X, y, sample_weight=None)

Fit the Ridge regression model.

  • X ({array-like, sparse matrix}, shape = [n_samples, n_features]) – Training data. For the snapml solver, inputs of type SnapML data partition and DeviceNDArray are also supported.
  • y (array-like, shape = [n_samples] or [n_samples, n_targets]) – Target values.
  • sample_weight (float or numpy array of shape [n_samples]) – Individual weights for each sample.


Return type:

returns an instance of self.
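One common way to honor sample_weight in a least-squares fit (used, for instance, in scikit-learn's linear models) is to multiply each row of X and y by the square root of its weight, which turns the weighted objective sum_i s_i (y_i - x_i w)^2 + alpha ||w||^2 into an ordinary ridge problem. Whether pai4sk's solvers do exactly this is an assumption; the sketch below (no intercept, hypothetical name weighted_ridge) only demonstrates the equivalence.

```python
import numpy as np

def weighted_ridge(X, y, alpha=1.0, sample_weight=None):
    """Hypothetical sketch: minimize sum_i s_i (y_i - x_i.w)^2 + alpha * ||w||^2 (no intercept)."""
    n_samples, n_features = X.shape
    if sample_weight is None:
        s = np.ones(n_samples)
    else:
        # A scalar weight is broadcast to every sample
        s = np.broadcast_to(np.asarray(sample_weight, dtype=float), (n_samples,))
    sqrt_s = np.sqrt(s)
    # Rescale rows so that the unweighted squared error equals the weighted one
    Xw, yw = X * sqrt_s[:, None], y * sqrt_s
    return np.linalg.solve(Xw.T @ Xw + alpha * np.eye(n_features), Xw.T @ yw)
```

With integer weights, the result matches fitting on a dataset in which each sample is repeated s_i times.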

predict(X, num_threads=0)

Predict target values for the samples in X.

  • X (sparse matrix (csr_matrix) or dense matrix (ndarray)) – Dataset used for prediction. For the snapml solver, input of type SnapML data partition is also supported.
  • num_threads (int, default : 0) – Number of threads used to run inference. By default (0), inference runs with the maximum number of available threads.

Return type:

array-like, shape = (n_samples,) – The predicted value for each sample.
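For a fitted linear model, prediction reduces to multiplying X by the learned weights and adding the intercept (the coef and intercept attributes listed above). The NumPy sketch below mirrors that arithmetic for single- and multi-target models with dense input only; it does not model num_threads or SnapML data-partition inputs, and linear_predict is a hypothetical name.

```python
import numpy as np

def linear_predict(X, coef, intercept):
    """Hypothetical sketch of a linear model's prediction (dense X only).

    coef has shape (n_features,) or (n_targets, n_features), as documented for coef.
    """
    X = np.asarray(X, dtype=float)
    coef = np.asarray(coef, dtype=float)
    # (n_samples, n_features) @ coef.T, then add the intercept (broadcast per target)
    return X @ coef.T + intercept

X = np.array([[1.0, 2.0], [3.0, 4.0]])
print(linear_predict(X, coef=np.array([0.5, -1.0]), intercept=2.0))  # [ 0.5 -0.5]
```

For a multi-target model, coef.T has shape (n_features, n_targets), so the result has shape (n_samples, n_targets).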