
Can the number of features used in a linear regression be regarded as a hyperparameter? Perhaps the choice of features?

Vykta Wakandigara

3 Answers


I like the way Wikipedia generally defines it:

In machine learning, a hyperparameter is a parameter whose value is set before the learning process begins. By contrast, the values of other parameters are derived via training.

On top of what Wikipedia says I would add:

A hyperparameter is a parameter that concerns the numerical optimization problem at hand. It will not appear in the machine learning model you build at the end; rather, it controls the process of fitting that model. For example, many machine learning algorithms use gradient descent, whose learning rate (which, as Wikipedia puts it, must be set before the learning process begins) controls how large a step the optimizer takes at each iteration.

Similarly, in linear regression fitted iteratively (e.g. by gradient descent), the learning rate is a hyperparameter. If it is a regularized regression like LASSO or Ridge, the regularization term is a hyperparameter as well.
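A minimal sketch of this distinction (all names illustrative): linear regression fitted by gradient descent, where the learning rate is a hyperparameter chosen before training, while the weights are the parameters learned from the data.

```python
import numpy as np

# The learning rate `lr` is a hyperparameter: we set it before training
# and it never appears in the final model. The weights `w` are the
# parameters: they are learned from the data and ARE the final model.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([3.0, -1.5]) + 0.5            # true weights plus intercept

Xb = np.hstack([X, np.ones((len(X), 1))])       # append an intercept column
w = np.zeros(3)                                 # parameters, learned below
lr = 0.1                                        # hyperparameter, chosen by us
for _ in range(500):
    grad = 2 * Xb.T @ (Xb @ w - y) / len(y)     # gradient of mean squared error
    w -= lr * grad

print(w)                                        # close to [3.0, -1.5, 0.5]
```

A different `lr` changes how the optimization proceeds (too large and it diverges, too small and it crawls), but a successful run always recovers the same parameters.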

Number of features: I would not regard the number of features as a hyperparameter. Ask yourself: is it a parameter you can simply set during the model optimization? How would you fix the number of features beforehand? To me, the number of features belongs to feature selection, i.e. feature engineering, which happens before you run your optimization. Think of image preprocessing before building a deep neural network: whatever preprocessing is done is never considered a hyperparameter; it is a feature-engineering step performed before feeding the data to your model.
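A hedged sketch of that point (the selection rule here is just one illustrative choice): picking which features to use happens before the optimization runs, as a feature-engineering step, not as a knob of the fitting procedure itself.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
# Only features 0 and 3 actually drive the target.
y = 2.0 * X[:, 0] - 3.0 * X[:, 3] + rng.normal(scale=0.1, size=300)

# Feature-selection step, done BEFORE any model fitting:
# keep the two features most correlated with y.
corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(5)])
keep = np.argsort(corr)[-2:]

# Only now is the model fitted, on the already-chosen features.
coef, *_ = np.linalg.lstsq(X[:, keep], y, rcond=None)
print(sorted(int(j) for j in keep))   # the selected feature indices
```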

TwinPenguins

Hyperparameters are, by definition, input parameters that an algorithm requires to be set before it can learn from data.

For standard linear regression, i.e. OLS, there are none. The number/choice of features is not a hyperparameter, but it can be viewed as part of a post-processing or iterative tuning process.

On the other hand, LASSO handles the number/choice of features in the formulation of its loss function itself, so its only hyperparameter is the shrinkage factor, i.e. lambda.
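A hedged sketch of that point, assuming scikit-learn is available: `Lasso`'s `alpha` (the lambda above) is its hyperparameter, and a large enough value drives the coefficients of irrelevant features exactly to zero, so the choice of features is handled inside the loss function.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Only feature 0 actually drives the target.
y = 4.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

model = Lasso(alpha=0.5)     # alpha is the shrinkage hyperparameter (lambda)
model.fit(X, y)
print(model.coef_)           # irrelevant coefficients driven exactly to zero
```

With OLS instead, all five coefficients would typically be small but nonzero; there is no comparable knob to turn.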

Mankind_2000

In linear regression, the coefficients learned for the features in your data set are called parameters. Hyperparameters do not come from your data set; they are settings used to tune the model itself, for example the depth of splits in classification tree models.

For basic straight-line linear regression, there are no hyperparameters.
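A small sketch of this: fitting a straight line y = a*x + b. The slope and intercept are the parameters, learned entirely from the data; with the model form fixed, there is nothing left that must be chosen before fitting.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0                           # data on the line y = 2x + 1
slope, intercept = np.polyfit(x, y, deg=1)  # deg=1 means a straight line
print(slope, intercept)                     # recovers the line's parameters
```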

Stephen Rauch
Anthony