Questions tagged [feature-scaling]

Feature scaling is a data pre-processing step where the range of variable values is standardized. Standardization of datasets is a common requirement for many machine learning algorithms. Popular feature scaling types include scaling the data to have zero mean and unit variance, and scaling the data between a given minimum and maximum value.
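For concreteness, here is a minimal sketch of the two scaling types named above, assuming scikit-learn and a small made-up feature matrix:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Made-up feature matrix: two columns on very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Standardization: each column ends up with zero mean and unit variance.
X_std = StandardScaler().fit_transform(X)

# Min-max scaling: each column is rescaled to the [0, 1] range.
X_minmax = MinMaxScaler().fit_transform(X)

print(X_std.mean(axis=0), X_std.std(axis=0))       # ~[0 0] and [1 1]
print(X_minmax.min(axis=0), X_minmax.max(axis=0))  # [0 0] and [1 1]
```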

264 questions
36 votes • 4 answers

What is a good way to transform Cyclic Ordinal attributes?

I have an 'hour' field as my attribute, but it takes cyclic values. How can I transform the feature to preserve the information that hour '23' and hour '0' are close rather than far apart? One way I can think of is the transformation min(h, 23-h). Input: [0 1…
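One encoding that commonly comes up for questions like this (not necessarily the accepted answer here) maps the cyclic value onto the unit circle with sine and cosine; a minimal sketch, assuming a 24-hour clock and made-up data:

```python
import numpy as np

hours = np.array([0, 1, 6, 12, 18, 23])

# Two features on the unit circle instead of the raw hour.
hour_sin = np.sin(2 * np.pi * hours / 24)
hour_cos = np.cos(2 * np.pi * hours / 24)

# Hours 23 and 0 now map to nearby points, so distance-based
# models treat them as close rather than 23 units apart.
```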
33 votes • 1 answer

Ways to deal with longitude/latitude feature

I am working on a fictional dataset with 25 features. Two of the features are the latitude and longitude of a place, and the others are pH, elevation, windSpeed, etc., with varying ranges. I can perform normalization on the other features, but how do I…
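One option that is often suggested for latitude/longitude (again, not necessarily the accepted answer to this question) is to convert the angles to 3D Cartesian coordinates on a unit sphere, which keeps nearby places numerically close across the ±180° seam; a sketch with made-up coordinates:

```python
import numpy as np

# Made-up sample points (degrees).
lat_deg = np.array([40.7, 51.5, -33.9])
lon_deg = np.array([-74.0, -0.1, 151.2])

lat, lon = np.radians(lat_deg), np.radians(lon_deg)

# Three bounded features in [-1, 1] replacing the two angles.
x = np.cos(lat) * np.cos(lon)
y = np.cos(lat) * np.sin(lon)
z = np.sin(lat)
```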
31 votes • 1 answer

Should one-hot vectors be scaled with numerical attributes?

When I have a combination of categorical and numerical attributes, I usually convert the categorical attributes to one-hot vectors. My question is: do I leave those vectors as they are and scale the numerical attributes through…
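A minimal sketch of one way to do this with scikit-learn's ColumnTransformer, scaling only the numeric columns and one-hot encoding the categorical ones; the column names are hypothetical:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical mixed-type data.
df = pd.DataFrame({
    "color": ["red", "blue", "red"],   # categorical
    "height": [150.0, 180.0, 165.0],   # numeric
})

pre = ColumnTransformer([
    ("onehot", OneHotEncoder(), ["color"]),   # left as 0/1 indicators
    ("scale", StandardScaler(), ["height"]),  # standardized
])

X = pre.fit_transform(df)
```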
30 votes • 3 answers

Why do we convert skewed data into a normal distribution

I was going through a solution to the Housing Prices competition on Kaggle (Human Analog's kernel on House Prices: Advanced Regression Techniques) and came across this part: # Transform the skewed numeric features by taking log(feature + 1). # This…
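The quoted step is essentially np.log1p applied to skewed numeric columns. A minimal sketch under the assumption that, as in many House Prices kernels, columns with skewness above some cutoff (0.75 here, an arbitrary choice) are transformed; the data values are made up:

```python
import numpy as np
import pandas as pd
from scipy.stats import skew

# Made-up values for two numeric columns.
df = pd.DataFrame({"LotArea": [8450, 9600, 11250, 215000],
                   "YearBuilt": [2003, 1976, 2001, 1998]})

numeric_cols = df.dtypes[df.dtypes != "object"].index
skewness = df[numeric_cols].apply(lambda s: skew(s.dropna()))
skewed_cols = skewness[skewness > 0.75].index   # arbitrary cutoff

# log(feature + 1), which tolerates zeros and compresses the right tail.
df[skewed_cols] = np.log1p(df[skewed_cols])
```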
25 votes • 2 answers

Feature Transformation on Input data

I was reading about the solution to this OTTO Kaggle challenge, and the first-place solution seems to use several transforms of the input data X, for example log(X + 1), sqrt(X + 3/8), etc. Is there a general guideline on when to apply which kind…
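Written out, the two transforms quoted in the excerpt look like the sketch below; sqrt(x + 3/8) is related to the Anscombe variance-stabilizing transform for count data (classically 2·sqrt(x + 3/8)), while log(x + 1) compresses heavy right tails. The matrix is made up:

```python
import numpy as np

# Made-up non-negative count-like features.
X = np.array([[0.0, 3.0, 120.0],
              [1.0, 8.0, 4500.0]])

X_log = np.log1p(X)            # log(X + 1)
X_sqrt = np.sqrt(X + 3.0 / 8)  # sqrt(X + 3/8)
```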
18 votes • 3 answers

When should I use StandardScaler and when MinMaxScaler?

I have a feature vector with one-hot-encoded features and with continuous features. How can I decide which data I should scale with StandardScaler and which with MinMaxScaler? I think I do not have to scale the one-hot-encoded features anyway…
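A small sketch contrasting the two scalers on a single continuous feature; the one-hot columns would typically be passed through unchanged (for example via ColumnTransformer). Values are made up:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

age = np.array([[18.0], [35.0], [60.0], [90.0]])

print(StandardScaler().fit_transform(age).ravel())  # centered, unit variance
print(MinMaxScaler().fit_transform(age).ravel())    # squeezed into [0, 1]
```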
17 votes • 4 answers

How to scale an array of signed integers to range from 0 to 1?

I'm using Brain to train a neural network on a feature set that includes both positive and negative values. But Brain requires input values between 0 and 1. What's the best way to normalize my data?
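A minimal sketch of min-max scaling a signed array into [0, 1], which is the usual way to meet this kind of range requirement: x' = (x − min) / (max − min). The data is made up:

```python
import numpy as np

x = np.array([-40.0, -5.0, 0.0, 7.0, 120.0])

# The most negative value maps to 0.0, the largest value maps to 1.0.
x_scaled = (x - x.min()) / (x.max() - x.min())
```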
16 votes • 3 answers

Zero Mean and Unit Variance

I'm studying data scaling, and in particular the standardization method. I understand the math behind it, but it's not clear to me why it's important to give the features zero mean and unit variance. Can you explain?
Qwerto • 705 • 1 • 8 • 15
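For reference, standardization written out by hand: z = (x − mean) / std. The data is made up, and np.std's default ddof=0 matches what StandardScaler uses:

```python
import numpy as np

x = np.array([10.0, 12.0, 14.0, 20.0])
z = (x - x.mean()) / x.std()

print(z.mean(), z.std())  # ~0.0 and 1.0
```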
11 votes • 6 answers

When should I NOT scale features

Feature scaling can be crucial when using distance-, variance- or gradient-based methods (KNN, PCA, neural networks...), because, depending on the case, it can improve the quality of the results or reduce the computational effort. In some cases…
Romain Reboulleau • 1,387 • 9 • 26
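One case where scaling is typically unnecessary is tree-based models, since trees split on thresholds and are invariant to monotonic rescaling of a feature; a small sketch with random made-up data:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] + rng.normal(scale=0.1, size=200)

tree_raw = DecisionTreeRegressor(random_state=0).fit(X, y)
tree_scaled = DecisionTreeRegressor(random_state=0).fit(X * 1000.0, y)

# The predictions should agree up to floating-point effects.
print(np.allclose(tree_raw.predict(X), tree_scaled.predict(X * 1000.0)))
```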
11 votes • 3 answers

Data scaling before or after PCA

I have seen senior data scientists scale the data either before or after applying PCA. Which is the right approach, and why?
Outcast • 1,117 • 3 • 14 • 29
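The common ordering is to standardize first and then apply PCA, so that high-variance features do not dominate the components; a sketch with random made-up data:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Five features with wildly different variances.
X = np.random.default_rng(0).normal(size=(100, 5)) * [1, 10, 100, 1000, 10000]

pca = make_pipeline(StandardScaler(), PCA(n_components=2))
X_2d = pca.fit_transform(X)
```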
11 votes • 2 answers

Linear Regression and scaling of data

The following plot shows the coefficients obtained with linear regression (with mpg as the target variable and all others as predictors) for the mtcars dataset (here and here), both with and without scaling the data. How do I interpret these results? The…
rnso • 1,608 • 3 • 19 • 35
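A sketch of the relationship that usually explains such plots: scaling changes the coefficients but not the fitted model, and each coefficient on standardized data equals the raw coefficient times that feature's standard deviation. The data below is random and made up, not the mtcars dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) * [1.0, 10.0, 100.0]
y = X @ [3.0, 0.5, 0.02] + rng.normal(size=100)

raw = LinearRegression().fit(X, y)
scaled = LinearRegression().fit(StandardScaler().fit_transform(X), y)

print(raw.coef_)                   # depends on each feature's units
print(scaled.coef_)                # comparable across features
print(raw.coef_ * X.std(axis=0))   # ≈ scaled.coef_
```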
11 votes • 2 answers

Consequence of Feature Scaling

I am currently using SVM and scaling my training features to the range of [0,1]. I first fit/transform my training set and then apply the same transformation to my testing set. For example: ### Configure transformation and apply to training set …
mike1886 • 933 • 9 • 17
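The pattern described in the excerpt, in sketch form: fit the scaler on the training set only and reuse the fitted transformation on the test set. The data and shapes are made up:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train, X_test = rng.normal(size=(80, 4)), rng.normal(size=(20, 4))
y_train = rng.integers(0, 2, size=80)

scaler = MinMaxScaler(feature_range=(0, 1))
X_train_s = scaler.fit_transform(X_train)  # learn min/max from training data only
X_test_s = scaler.transform(X_test)        # reuse the same min/max on test data

clf = SVC().fit(X_train_s, y_train)
```

Note that the scaled test values can fall slightly outside [0, 1]; that is expected and usually harmless.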
10 votes • 1 answer

Should I rescale tfidf features?

I have a dataset which contains both text and numeric features. I have encoded the text ones using the TfidfVectorizer from sklearn. I would now like to apply logistic regression to the resulting dataframe. My issue is that the numeric features…
ignoring_gravity • 793 • 4 • 15
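A sketch of one way to combine TF-IDF text features with scaled numeric features before logistic regression, assuming a pandas DataFrame with hypothetical columns "text" and "price":

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data.
df = pd.DataFrame({"text": ["cheap red shoes", "luxury watch", "red watch"],
                   "price": [10.0, 900.0, 250.0]})
y = [0, 1, 1]

pre = ColumnTransformer([
    ("tfidf", TfidfVectorizer(), "text"),    # a single text column
    ("scale", StandardScaler(), ["price"]),  # numeric columns
])

model = make_pipeline(pre, LogisticRegression()).fit(df, y)
```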
8 votes • 1 answer

How to handle preprocessing (StandardScaler, LabelEncoder) when using data generator to train?

So, I have a dataset that is too big to load into memory all at once. Therefore I want to use a generator to load batches of data to train on. In this scenario, how do I go about performing scaling of the features using LabelEncoder +…
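One piece of that puzzle is that StandardScaler supports partial_fit, so its statistics can be accumulated batch by batch before training starts; the LabelEncoder part is omitted here, and batch_iterator() is a hypothetical stand-in for reading chunks from disk:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

def batch_iterator():
    # Hypothetical stand-in for streaming batches from disk.
    rng = np.random.default_rng(0)
    for _ in range(10):
        yield rng.normal(size=(32, 5))

scaler = StandardScaler()
for batch in batch_iterator():
    scaler.partial_fit(batch)  # accumulate running mean and variance

# Later, inside the training generator, each batch would be transformed:
# yield scaler.transform(batch), labels
```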
8 votes • 6 answers

Do Clustering algorithms need feature scaling in the pre-processing stage?

Is feature scaling useful for clustering algorithms? And which types of features (numeric, categorical, etc.) work best for clustering?
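For distance-based clustering the usual precaution is to standardize numeric features first, so that no single feature dominates the Euclidean distances; a sketch with random made-up data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# The third feature would dominate the distances without scaling.
X = np.random.default_rng(0).normal(size=(200, 3)) * [1.0, 1.0, 1000.0]

km = make_pipeline(StandardScaler(),
                   KMeans(n_clusters=3, n_init=10, random_state=0))
labels = km.fit_predict(X)
```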
Page 1 of 18