Questions tagged [lasso]

Least Absolute Shrinkage and Selection Operator (LASSO) regression is a regularization technique used in regression cases where the model overfits or there is high multicollinearity. It has one tuning parameter, $\lambda$, and as this value is increased the coefficient estimates are shrunk closer and closer to zero. It differs from Ridge Regression in that coefficients can be shrunk exactly to zero, which makes Lasso regression useful for feature selection.

It is defined by:

$$SSE_{L_1\text{ norm}} = \sum_{i=1}^{n}(y_i-\hat{y}_i)^2 + \lambda \sum_{j=1}^{P} \lvert \beta_j \rvert$$

where the goal is to reduce model complexity by adding a penalty term to the Sum of Squared Errors (SSE).
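
As an illustration, here is a minimal sketch (assuming scikit-learn, where $\lambda$ is exposed as the `alpha` parameter) of how increasing the penalty drives more coefficients to exactly zero; the dataset and alpha values are only illustrative:

```python
# Minimal sketch: as alpha (the lambda above) increases, more Lasso coefficients
# become exactly zero, which is what enables feature selection.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data in which only 5 of the 20 features actually matter
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

for alpha in [0.1, 1.0, 10.0, 100.0]:
    coef = Lasso(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha:>5}: {np.sum(coef == 0)} of {coef.size} coefficients are exactly 0")
```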

46 questions
4 votes · 1 answer

For a square matrix of data, I achieve $R^2=1$ for Linear Regression and $R^2=0$ for Lasso. What's the intuition behind this?

For a square matrix of random data with N columns and N rows, I am fitting two models: linear regression and Lasso. For linear regression I achieve a perfect score on the training set, while with Lasso I achieve a score of 0. import pandas as pd import…
Carlos Mougan
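
The question's own code is truncated above; the following is a rough, hypothetical reconstruction of the setup (random square data, scikit-learn defaults) that typically reproduces the two scores it describes:

```python
# Hypothetical reconstruction (the asker's code is truncated above).
# With as many random features as samples, ordinary least squares can interpolate
# the training data (R^2 = 1), while Lasso's penalty shrinks the coefficients
# towards (or exactly to) zero and can leave a training R^2 of 0.
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso

rng = np.random.default_rng(0)
N = 50
X = rng.normal(size=(N, N))   # square data matrix: N rows, N columns
y = rng.normal(size=N)        # random target, unrelated to X

print("Linear regression train R^2:", LinearRegression().fit(X, y).score(X, y))  # ~1.0
print("Lasso train R^2:            ", Lasso(alpha=1.0).fit(X, y).score(X, y))    # ~0.0
```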
4 votes · 2 answers

LASSO remaining features for different penalisation

I am using the sklearn LassoCV function and I am changing the penalisation parameter in order to adjust the number of features killed off. For example, for $\alpha = 0.01$ I have 55 features remaining and for $\alpha=0.5$ I have 6 remaining features.…
prax1telis
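
For reference, a small sketch (scikit-learn assumed; the dataset and alpha values are made up, not the asker's) of how the number of surviving features shrinks as the penalisation grows:

```python
# Sketch: count the features "remaining" (non-zero coefficients) for two penalties.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=300, n_features=80, n_informative=10,
                       noise=5.0, random_state=0)
X = StandardScaler().fit_transform(X)   # put all features on the same scale before penalising

for alpha in [0.01, 0.5]:
    coef = Lasso(alpha=alpha, max_iter=10000).fit(X, y).coef_
    print(f"alpha={alpha}: {np.sum(coef != 0)} features remaining")
```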
4 votes · 2 answers

Why does Lasso behave "erratically" when the number of features is greater than the number of training instances?

From the book "Hands-on Machine Learning with Scikit-Learn and TensorFlow 2nd edition" chapter 4: In general, Elastic Net is preferred over Lasso since Lasso may behave erratically when the number of features is greater than the number of …
Moaz Ashraf
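
One concrete way to see the book's point is a sketch like the following (scikit-learn assumed, synthetic data): when two features are nearly identical and there are more features than samples, Lasso tends to keep one of the pair somewhat arbitrarily, while Elastic Net's added L2 term tends to spread the weight across both:

```python
# Sketch: p > n with two nearly identical features.
import numpy as np
from sklearn.linear_model import Lasso, ElasticNet

rng = np.random.default_rng(0)
n_samples, n_features = 30, 100                          # more features than samples
X = rng.normal(size=(n_samples, n_features))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=n_samples)    # feature 1 ~ feature 0
y = X[:, 0] + rng.normal(scale=0.1, size=n_samples)

lasso = Lasso(alpha=0.05, max_iter=10000).fit(X, y)
enet = ElasticNet(alpha=0.05, l1_ratio=0.5, max_iter=10000).fit(X, y)
print("Lasso coefficients on the twin features:      ", lasso.coef_[:2])
print("Elastic Net coefficients on the twin features:", enet.coef_[:2])
```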
3 votes · 1 answer

Difference between PCA and regularisation

Currently, I am confused about PCA and regularisation. I wonder what the difference is between PCA and regularisation, particularly lasso (L1) regression? It seems both of them can do feature selection. I have to admit, I am not quite familiar…
Crazy
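
A short sketch of the distinction (scikit-learn assumed, synthetic data): PCA is unsupervised and builds new components that mix all of the original features, whereas lasso is fit against a target and keeps or discards the original features themselves:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

# PCA: each component generally has non-zero loadings on every original feature
pca = PCA(n_components=2).fit(X)
print("non-zero loadings in the first component:", np.sum(pca.components_[0] != 0), "of 10")

# Lasso: a subset of the original features is kept, the rest are set exactly to zero
lasso = Lasso(alpha=1.0).fit(X, y)
print("original features kept by lasso:", np.flatnonzero(lasso.coef_))
```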
3 votes · 1 answer

Normalisation results in R^2 score of 0 - Lasso regression

I am running a regression analysis on a 7000 row dataset with a train/test split of 70%/30%. I am using one variable X to predict a variable Y. X ranges between 300 and 810 (mean 712). Y is an integer (number of occurrences) ranging between 0 and…
atom
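
One common cause of this symptom (a guess, not necessarily the asker's situation) is that after min-max scaling a single predictor into [0, 1], scikit-learn's default penalty `alpha=1.0` dominates, the lone coefficient is shrunk exactly to zero, and the model just predicts the mean, giving $R^2 = 0$. A sketch with made-up numbers in the ranges stated above:

```python
# Sketch of one possible cause: scaling X into [0, 1] lets the default penalty dominate.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X = rng.uniform(300, 810, size=(1000, 1))               # predictor on its original scale
y = 0.01 * X[:, 0] + rng.normal(scale=2.0, size=1000)   # small, count-like target

print("raw X:    R^2 =", Lasso(alpha=1.0).fit(X, y).score(X, y))

X_scaled = MinMaxScaler().fit_transform(X)              # X now lies in [0, 1]
model = Lasso(alpha=1.0).fit(X_scaled, y)
print("scaled X: coef =", model.coef_[0], ", R^2 =", model.score(X_scaled, y))  # 0.0 and 0.0
```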
3 votes · 1 answer

Need advice regarding cross-validation to obtain the optimal lambda in Lasso

I am comparatively new to machine learning, and any suggestions and coding corrections will be a great help. I am using Lasso for feature selection and want to select the lambda that produces the lowest error. The data set I am using has 500 samples…
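
A minimal sketch of the usual approach with scikit-learn (the data below are a synthetic stand-in for the 500-sample set described above): `LassoCV` evaluates a path of candidate alphas by cross-validation and keeps the one with the lowest validation error:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=500, n_features=50, n_informative=8,
                       noise=5.0, random_state=0)
X = StandardScaler().fit_transform(X)   # standardise so the penalty treats features equally

model = LassoCV(cv=5, n_alphas=100, random_state=0).fit(X, y)
print("alpha with the lowest cross-validated error:", model.alpha_)
print("features selected at that alpha:", np.flatnonzero(model.coef_))
```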
2 votes · 1 answer

Interpreting machine learning coefficients

My dog show predictive tool is having some trouble with its neural net. Broadly, I start with a couple of factors--age, weight, height, breed (which is a set of dummy variables), a subjective cuteness score--and predict whether the animal will win…
2 votes · 3 answers

How does Lasso regression shrink coefficients to zero, and why does ridge regression not shrink coefficients to zero?

How does Lasso regression help with feature selection by shrinking coefficients to zero? I have seen a few explanations using the diagram below. Can anyone please explain in simple terms how to relate the diagram below to how Lasso shrinks the…
star
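
The diagram in question is presumably the usual constraint-region picture: the L1 budget $\sum_j \lvert\beta_j\rvert \le t$ is a diamond with corners on the coordinate axes, so the elliptical error contours usually first touch it at a corner, where some coefficients are exactly zero; the L2 ball has no corners, so ridge only shrinks coefficients without zeroing them. The effect is easy to see numerically (scikit-learn assumed, synthetic data):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                       noise=10.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)
print("Lasso coefficients set exactly to 0:", np.sum(lasso.coef_ == 0), "of 30")
print("Ridge coefficients set exactly to 0:", np.sum(ridge.coef_ == 0), "of 30")  # usually 0
```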
2 votes · 2 answers

How do standardization and normalization impact the coefficients of linear models?

One benefit of creating a linear model is that you can look at the coefficients the model learns and interpret them. For example, you can see which features have the most predictive power and which do not. How, if at all, does feature…
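
A small sketch of the effect (scikit-learn assumed, toy data): standardizing a feature rescales its coefficient by that feature's standard deviation, which is what makes coefficients of standardized features comparable as rough importance measures:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(scale=1.0, size=500),      # feature measured in "small" units
                     rng.normal(scale=100.0, size=500)])   # equally useful, "large" units
y = 2.0 * X[:, 0] + 0.02 * X[:, 1] + rng.normal(size=500)

print("raw coefficients:         ", LinearRegression().fit(X, y).coef_)      # ~[2.0, 0.02]
X_std = StandardScaler().fit_transform(X)
print("standardized coefficients:", LinearRegression().fit(X_std, y).coef_)  # both ~2.0
```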
2 votes · 1 answer

When should we start using stacking of models?

I am solving a Kaggle contest and my single model has reached a score of 0.121. I'd like to know when to start using ensembling/stacking to improve the score. I used lasso and xgboost, and there obviously must be variance associated with those two…
2 votes · 0 answers

Why do you need to use group lasso with categorical variables?

From what I've read, you should use group lasso to either discard all of a category's dummy-encoded variables or keep all of them. If you use normal lasso then some of the variables in the group can be discarded (set to zero) and some might not,…
Ferus
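
To make the premise concrete, here is a sketch (scikit-learn and pandas assumed, toy category) showing that plain lasso can zero out only some of a category's dummy columns, which is exactly the behaviour that group lasso, by penalising the whole group together, is meant to avoid:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
cat = rng.choice(["a", "b", "c", "d"], size=300)
effect = {"a": 0.0, "b": 0.3, "c": 3.0, "d": 3.2}               # "b" has only a weak effect
y = np.array([effect[c] for c in cat]) + rng.normal(scale=0.5, size=300)

X = pd.get_dummies(pd.Series(cat), prefix="cat").astype(float)  # 4 dummies for one category
coef = Lasso(alpha=0.1).fit(X, y).coef_
print(dict(zip(X.columns, np.round(coef, 3))))   # typically some dummies zeroed, others kept
```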
2 votes · 0 answers

Reverse engineering what stocks are in a dummy ETF using regression (lasso, ridge, etc) in Python

I'm trying to reverse engineer what stocks are in an ETF using Python. In my code, I create a fake ETF that is an equal-weighted mix of 20 random stocks. I then try to reverse engineer what's in my ETF using price data for a universe of 200+ stocks. No matter…
Mac
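
A rough sketch of the general idea (scikit-learn assumed; the "stocks" here are just simulated return series, not the asker's data): regress the ETF's returns on the universe of candidate returns with a positive lasso and read the members off the non-zero coefficients:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_days, n_stocks = 500, 200
returns = rng.normal(scale=0.01, size=(n_days, n_stocks))      # daily returns for the universe
members = np.sort(rng.choice(n_stocks, size=20, replace=False))
etf_returns = returns[:, members].mean(axis=1)                 # equal-weighted fake ETF

model = Lasso(alpha=1e-5, positive=True, max_iter=50000).fit(returns, etf_returns)
recovered = np.flatnonzero(model.coef_ > 1e-4)
print("true members:     ", members.tolist())
print("recovered members:", recovered.tolist())
```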
2 votes · 0 answers

Can I rescale a TF matrix or TF-IDF matrix using StandardScaler prior to Logistic Lasso regression?

I am trying to use Logistic Lasso to classify documents as 1 or 0. I've tried using both the TF matrix and TF-IDF matrix representations of the documents as my predictors. I've found that if I use the StandardScaler function in python (standardizing…
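
For reference, a sketch of one way to wire this up in scikit-learn (toy documents; whether the scaling step actually helps is the asker's open question): an L1-penalised logistic regression on TF-IDF features, with `StandardScaler(with_mean=False)` so the sparse matrix is rescaled without being densified:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

docs = ["cheap meds now", "meeting at noon", "win cash now", "project update attached"]
labels = [1, 0, 1, 0]                       # toy 1/0 document labels

pipe = make_pipeline(
    TfidfVectorizer(),
    StandardScaler(with_mean=False),        # rescale columns; centring would destroy sparsity
    LogisticRegression(penalty="l1", solver="liblinear", C=1.0),
)
pipe.fit(docs, labels)
print(pipe.predict(["cash now"]))
```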
1 vote · 1 answer

Lasso regression not getting better without random features

First of all, I'm new to lasso regression, so sorry if this feels stupid. I'm trying to build a regression model and wanted to use lasso regression for feature selection as I have quite a few features to start with. I started by standardizing all…
Onur Ece
1 vote · 1 answer

Lasso Regression for Feature Importance saying almost every feature is unimportant?

I have a metric (RevenueSoFar) that is a great predictor of my target FinalRevenue, as you'd expect - it is a metric where we tend to get 90-95% of revenue so far on day 1 and then it can increase over the next 6 days. Therefore I'm also using…