Fitting an Elastic Net with a precomputed Gram Matrix and Weighted Samples
The following example shows how to precompute the gram matrix while using weighted samples with an ElasticNet.
If weighted samples are used, the design matrix must be centered and then rescaled by the square root of the weight vector before the gram matrix is computed.
Note: the sample_weight vector is also rescaled to sum to n_samples; see the documentation for the sample_weight parameter to fit.
Authors: The scikit-learn developers
SPDX-License-Identifier: BSD-3-Clause
Let’s start by loading the dataset and creating some sample weights.
import numpy as np
from sklearn.datasets import make_regression
rng = np.random.RandomState(0)
n_samples = int(1e5)
X, y = make_regression(n_samples=n_samples, noise=0.5, random_state=rng)
sample_weight = rng.lognormal(size=n_samples)
# normalize the sample weights so that they sum to n_samples
normalized_weights = sample_weight * (n_samples / (sample_weight.sum()))
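Because the estimator rescales sample_weight to sum to n_samples, multiplying every weight by the same positive constant should not change the fitted model. The following is a minimal sketch of that property on a small synthetic problem; the dataset size and the factor of 3 are arbitrary choices made for illustration and are not part of the original example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

rng = np.random.RandomState(0)
n_samples = 200  # small illustrative size, not the 1e5 used in the example
X, y = make_regression(
    n_samples=n_samples, n_features=10, noise=0.5, random_state=rng
)
sample_weight = rng.lognormal(size=n_samples)

# Normalizing by hand makes the weights sum to n_samples.
normalized_weights = sample_weight * (n_samples / sample_weight.sum())
weights_sum_ok = np.isclose(normalized_weights.sum(), n_samples)

# Fit once with the raw weights and once with the same weights scaled by 3.
lm_raw = ElasticNet(alpha=0.01).fit(X, y, sample_weight=sample_weight)
lm_scaled = ElasticNet(alpha=0.01).fit(X, y, sample_weight=3.0 * sample_weight)

# The coefficients agree because only the relative weights matter.
same_coef = np.allclose(lm_raw.coef_, lm_scaled.coef_)
```

Normalizing the weights explicitly, as done above, keeps the manually computed gram matrix on the same scale as the one the estimator expects.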
To fit the elastic net using the precompute option together with the sample weights, we must first center the design matrix using the weighted column means, and then rescale it by the square root of the normalized weights prior to computing the gram matrix.
# weighted column means of the design matrix
X_offset = np.average(X, axis=0, weights=normalized_weights)
X_centered = X - X_offset
# scale each row by the square root of its weight
X_scaled = X_centered * np.sqrt(normalized_weights)[:, np.newaxis]
gram = np.dot(X_scaled.T, X_scaled)
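The reason for scaling rows by the square root of the weights is that the resulting gram matrix equals the weighted product X_centered.T @ diag(w) @ X_centered. A small sketch checking this identity numerically, on random data chosen only for illustration:

```python
import numpy as np

rng = np.random.RandomState(0)
n_samples, n_features = 50, 4  # illustrative sizes
X = rng.normal(size=(n_samples, n_features))
w = rng.lognormal(size=n_samples)
w = w * (n_samples / w.sum())  # normalize to sum to n_samples

# Center with the weighted mean, then scale rows by sqrt(w).
X_offset = np.average(X, axis=0, weights=w)
X_centered = X - X_offset
X_scaled = X_centered * np.sqrt(w)[:, np.newaxis]

# Gram matrix from the scaled matrix versus the explicit weighted product.
gram_scaled = X_scaled.T @ X_scaled
gram_direct = X_centered.T @ (w[:, np.newaxis] * X_centered)
gram_matches = np.allclose(gram_scaled, gram_direct)

# After weighted centering, the weighted column means are numerically zero.
weighted_mean_is_zero = np.allclose(
    np.average(X_centered, axis=0, weights=w), 0.0
)
```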
We can now proceed with fitting. We must pass the centered design matrix to fit; otherwise the elastic net estimator will detect that it is uncentered and discard the gram matrix we passed. However, if we pass the scaled design matrix, the preprocessing code will incorrectly rescale it a second time.
from sklearn.linear_model import ElasticNet
lm = ElasticNet(alpha=0.01, precompute=gram)
lm.fit(X_centered, y, sample_weight=normalized_weights)
ElasticNet(alpha=0.01, precompute=array([[ 9.98809919e+04, -4.48938813e+02, -1.03237920e+03, ..., -2.25349312e+02, -3.53959628e+02, -1.67451144e+02], [-4.48938813e+02, 1.00768662e+05, 1.19112072e+02, ..., -1.07963978e+03, 7.47987268e+01, -5.76195467e+02], [-1.03237920e+03, 1.19112072e+02, 1.00393284e+05, ..., -3.07582983e+02, 6.66670169e+02, 2.65799352e+02], ..., [-2.25349312e+02, -1.07963978e+03, -3.07582983e+02, ..., 9.99891212e+04, -4.58195950e+02, -1.58667835e+02], [-3.53959628e+02, 7.47987268e+01, 6.66670169e+02, ..., -4.58195950e+02, 9.98350372e+04, 5.60836363e+02], [-1.67451144e+02, -5.76195467e+02, 2.65799352e+02, ..., -1.58667835e+02, 5.60836363e+02, 1.00911944e+05]]))
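As a sanity check, the coefficients obtained with the precomputed gram matrix should agree with those from an ordinary weighted fit on the raw design matrix. The sketch below assumes a small dataset and a tight solver tolerance (tol=1e-10, max_iter=10_000) so that both solvers converge to the same solution; these settings are illustrative choices and not part of the original example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

rng = np.random.RandomState(0)
n_samples = 200  # small illustrative size
X, y = make_regression(
    n_samples=n_samples, n_features=10, noise=0.5, random_state=rng
)
w = rng.lognormal(size=n_samples)
w = w * (n_samples / w.sum())

# Weighted centering and sqrt-weight scaling, as in the example above.
X_centered = X - np.average(X, axis=0, weights=w)
X_scaled = X_centered * np.sqrt(w)[:, np.newaxis]
gram = X_scaled.T @ X_scaled

# Tight tolerance so both code paths reach the same optimum.
params = dict(alpha=0.01, tol=1e-10, max_iter=10_000)

# Fit with the precomputed gram matrix on the centered (unscaled) data.
lm_gram = ElasticNet(precompute=gram, **params).fit(
    X_centered, y, sample_weight=w
)
# Reference fit on the raw data, letting the estimator do the preprocessing.
lm_plain = ElasticNet(**params).fit(X, y, sample_weight=w)

coef_match = np.allclose(lm_gram.coef_, lm_plain.coef_, rtol=1e-6, atol=1e-6)
```

The intercepts of the two models are not expected to match, since the first model was fitted on centered data; only the coefficients are compared.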