Questions tagged [oversampling]
21 questions
4 votes, 3 answers
Timing of applying random oversampling on the dataset
I tried to learn classification using machine learning algorithms. I went through the notebook "Breast Cancer - EDA, Balancing and ML". In this notebook, random oversampling was implemented. However, when the person did the oversampling, he did it…
Encipher
- 381
- 1
- 11
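The usual answer to the timing question is to split first and oversample only the training portion, so duplicated minority rows can never appear in the test set. Below is a minimal stdlib-only sketch, assuming the data is a list of (features, label) pairs; `split_train_test` and `random_oversample` are hypothetical helpers, not code from the notebook or from imbalanced-learn.

```python
import random
from collections import Counter

def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def random_oversample(rows, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    by_label = {}
    for features, label in rows:
        by_label.setdefault(label, []).append((features, label))
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced

# Toy data: 90 rows of class 0 and 10 rows of class 1.
data = [([i], 0) for i in range(90)] + [([i], 1) for i in range(90, 100)]
train, test = split_train_test(data)          # 1. split first
train_balanced = random_oversample(train)     # 2. oversample the training split only
print(Counter(label for _, label in train_balanced))
print(Counter(label for _, label in test))    # the test split keeps its original imbalance
```

Oversampling before the split would let copies of the same minority row land in both splits, which tends to inflate test scores.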
2 votes, 1 answer
Techniques for solving the problem with an unbalanced data set
I am trying to solve a problem with an unbalanced data set. I have two classes, one is for patients with risk (1), the other for patients without risk (0). I have a larger number of patients without risk.
For analysis, I used methods such…
Naty
- 21
- 2
2 votes, 0 answers
Using SMOTE Train Model and Optimal Cutoff on Unbalanced Test Data
My original dataset has a binary dependent variable with 3% of the values being one. First, I split the original dataset into training and testing sets using an 80-20 split. Since it includes both numeric and binary independent variables, I am using…
CraigS
- 21
- 2
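One common way to pick a cutoff without touching the test set is to score a held-out validation split and keep the threshold that maximizes a chosen metric; F1 is used here only as an example. This is a minimal sketch, assuming you already have predicted probabilities and true 0/1 labels for a validation split; `f1_at` and `best_threshold` are hypothetical helpers and the scores are illustrative numbers.

```python
def f1_at(threshold, probs, labels):
    """F1 of the positive class when predicting 1 for prob >= threshold."""
    tp = fp = fn = 0
    for p, y in zip(probs, labels):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)

def best_threshold(probs, labels):
    """Try every distinct predicted probability as a cutoff; return the best."""
    candidates = sorted(set(probs))
    return max(candidates, key=lambda t: f1_at(t, probs, labels))

# Validation-set scores from a model trained on the SMOTE-resampled training data.
val_probs = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.70, 0.90]
val_labels = [0, 0, 0, 0, 1, 0, 1, 1]
cutoff = best_threshold(val_probs, val_labels)
print(cutoff, f1_at(cutoff, val_probs, val_labels))  # 0.4 0.857...
```

The chosen cutoff would then be applied once to the untouched, still-imbalanced test set.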
1 vote, 2 answers
How to properly use oversampling without inflating results?
I am working with a tiny private dataset (over 192 samples) with 4 classes. A preprocessing step is trivial in order to do any classification. Among feature selection and extraction techniques, I decided to apply oversampling (SMOTE). Here is what I…
heresthebuzz
- 395
- 3
- 11
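To avoid inflated results with SMOTE, the resampling usually has to happen inside each cross-validation fold, on the training part only. This is a simplified stdlib sketch of the core SMOTE step (interpolating between a minority point and one of its nearest minority neighbours) run per fold; `smote` and `k_fold_indices` are hypothetical helpers, the folds are contiguous rather than stratified for brevity, and imbalanced-learn's actual SMOTE differs in detail.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each on the segment between a random
    minority point and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority points")
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: sq_dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic

def k_fold_indices(n, n_folds):
    """Yield (train_idx, test_idx) for simple contiguous folds."""
    fold_size = n // n_folds
    for f in range(n_folds):
        start = f * fold_size
        stop = n if f == n_folds - 1 else start + fold_size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n) if i < start or i >= stop]
        yield train_idx, test_idx

# Toy data: 40 rows, 5 of them in the minority class.
X = [[float(i), float(i % 7)] for i in range(40)]
y = [1 if i % 8 == 0 else 0 for i in range(40)]

fold_sizes = []
for train_idx, test_idx in k_fold_indices(len(X), 5):
    X_tr = [X[i] for i in train_idx]
    y_tr = [y[i] for i in train_idx]
    minority = [x for x, label in zip(X_tr, y_tr) if label == 1]
    n_new = y_tr.count(0) - len(minority)
    X_tr = X_tr + smote(minority, n_new)   # oversample the training fold only
    y_tr = y_tr + [1] * n_new
    X_te = [X[i] for i in test_idx]        # the test fold stays untouched
    y_te = [y[i] for i in test_idx]
    # ... fit the model on (X_tr, y_tr) and score it on (X_te, y_te) here ...
    fold_sizes.append((y_tr.count(0), y_tr.count(1), len(y_te)))
print(fold_sizes)
```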
1 vote, 1 answer
Using a regression model, is it possible to precisely predict "outlier" results based on a highly imbalanced dataset?
Title.
I have a highly imbalanced dataset. The output variable I want to predict is restricted to the range 0 to 1, but almost all of the data points sit around 0.7 to 0.9, while the values in my prediction set are mostly between 0 and 0.4.
I…
Yuuya
- 51
- 5
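For regression, one option (not something the question mentions) is to weight training samples by the inverse frequency of their target bin, so rare target ranges count more in the loss. This is a sketch under that assumption; `inverse_frequency_weights` and the bin count are hypothetical choices, and weighting cannot supply information about a target range that is almost absent from the training data.

```python
from collections import Counter

def inverse_frequency_weights(targets, n_bins=10, low=0.0, high=1.0):
    """Weight each sample by 1 / (size of its target bin), rescaled so the
    weights average to 1. Targets are assumed to lie in [low, high]."""
    width = (high - low) / n_bins

    def bin_of(t):
        return min(int((t - low) / width), n_bins - 1)  # t == high goes in the last bin

    bins = [bin_of(t) for t in targets]
    counts = Counter(bins)
    raw = [1.0 / counts[b] for b in bins]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]

# Most targets around 0.7 to 0.9, one rare target at 0.35.
targets = [0.72, 0.75, 0.77, 0.81, 0.84, 0.86, 0.35]
weights = inverse_frequency_weights(targets)
print([round(w, 2) for w in weights])
```

Many regressors accept per-sample weights at fit time, so such weights can usually be passed in without resampling the data.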
1 vote, 3 answers
unbalanced data on train set and test set
I already have 2 datasets. One to use for training and one for testing.
Both datasets are unbalanced (with similar percentages), with around 90% of samples labeled 1.
Will it be useful to balance the data if the test set is very unbalanced anyway?
Instances…
mikeman
- 19
- 2
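With about 90% of the test labels equal to 1, a model that always predicts 1 already reaches about 90% accuracy, which is why plain accuracy says little on such a test set. A tiny sketch of that baseline; `majority_baseline_accuracy` is a hypothetical helper and the labels are illustrative.

```python
def majority_baseline_accuracy(labels):
    """Accuracy of a classifier that always predicts the most common label."""
    most_common = max(set(labels), key=labels.count)
    return labels.count(most_common) / len(labels)

test_labels = [1] * 90 + [0] * 10   # illustrative: 90% label 1
print(majority_baseline_accuracy(test_labels))  # 0.9
```

Any trained model would need to beat this number (or be judged on a metric that does not reward the majority class) before the balancing question matters.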
1 vote, 1 answer
Should synthetic data be oversampled as well?
I'm building a binary text classifier, the ratio between the positives and negatives is 1:100 (100 / 10000).
By using back translation as an augmentation, I was able to get 400 more positives.
Then I decided to do upsampling to balance the data. Do…
guestmember123456790
- 11
- 1
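Working through the counts in the question (100 original positives, 400 back-translated positives, 10,000 negatives) shows how much duplication is still needed after augmentation. The variable names are only for illustration, and treating augmented rows as ordinary positives is an assumption.

```python
# Counts taken from the question; augmented positives are treated like original ones.
original_positives = 100
augmented_positives = 400
negatives = 10_000

positives = original_positives + augmented_positives
print(f"ratio after augmentation: 1:{negatives // positives}")        # 1:20
print(f"copies needed for a 1:1 balance: {negatives - positives}")   # 9500
print(f"average uses of each positive after balancing: {negatives / positives:.0f}")  # 20
```

Whether the augmented rows should also be duplicated is a design choice; duplicating them repeats the same paraphrases many times, which may encourage overfitting to them.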
1 vote, 0 answers
oversampling multivariate time series data
For a classification task, I have multivariate time series data built from 4 satellite images, in the form (145521 pixels, 4 dates, 2 bands). I classified the data into 5 classes with tempCNN. However, there is a big gap…
ala
- 11
- 2
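For data shaped (pixels, dates, bands), one simple approach is to oversample along the pixel axis only, so each duplicated sample keeps its whole dates-by-bands sequence intact. A stdlib sketch with nested lists; `oversample_sequences` is a hypothetical helper and the toy data uses 3 classes instead of the question's 5.

```python
import random
from collections import Counter

def oversample_sequences(samples, labels, seed=0):
    """Randomly duplicate whole samples (each a dates x bands sequence) of
    smaller classes until all classes match the largest class count."""
    rng = random.Random(seed)
    idx_by_class = {}
    for i, label in enumerate(labels):
        idx_by_class.setdefault(label, []).append(i)
    target = max(len(ids) for ids in idx_by_class.values())
    out_samples, out_labels = list(samples), list(labels)
    for label, ids in idx_by_class.items():
        for _ in range(target - len(ids)):
            i = rng.choice(ids)
            out_samples.append(samples[i])  # the full dates x bands sequence is added as one unit
            out_labels.append(label)
    return out_samples, out_labels

# Toy data in the (pixels, dates, bands) layout: 6 pixels, 4 dates, 2 bands.
samples = [[[p + d * 0.1, p - d * 0.1] for d in range(4)] for p in range(6)]
labels = [0, 0, 0, 0, 1, 2]
new_samples, new_labels = oversample_sequences(samples, labels)
print(Counter(new_labels))
```

SMOTE-style interpolation is also possible by flattening each sample to a vector of length dates times bands and reshaping afterwards, though interpolated sequences may not be physically plausible.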
0 votes, 2 answers
How to increase the Accuracy after Oversampling?
The accuracy before oversampling:
On training: 98.54%
On testing: 98.21%
The accuracy after oversampling:
On training: 77.92%
On testing: 90.44%
What does this mean, and how can I increase the accuracy?
Edit:
Classes before…
Mimi
- 65
- 8
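A very high accuracy before oversampling may largely reflect the majority class; balanced accuracy (the mean of per-class recalls) shows this directly. The sketch below uses illustrative labels, not the asker's data; `per_class_recall` and `balanced_accuracy` are hypothetical helpers.

```python
def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    recalls = {}
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls[c] = sum(y_pred[i] == c for i in idx) / len(idx)
    return recalls

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls; a majority-only predictor scores 1 / n_classes."""
    r = per_class_recall(y_true, y_pred)
    return sum(r.values()) / len(r)

y_true = [0] * 98 + [1] * 2
always_zero = [0] * 100
acc = sum(t == p for t, p in zip(y_true, always_zero)) / len(y_true)
print(acc, balanced_accuracy(y_true, always_zero))  # 0.98 0.5
```

A drop in plain accuracy after oversampling can therefore coincide with better minority-class recall, so comparing per-class metrics is usually more informative than chasing the original accuracy figure.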
0 votes, 0 answers
How to use SMOTE to rebalance multiclass dataset when the target is one hot encoded with pd.get_dummies?
I'm using a multiclass dataset (cic-ids-2017), which is very imbalanced. I have already encoded the categorical feature (which is the target) using OneHotEncoder.
I tried to use the SMOTE oversampling method in a pipeline to balance the data:
X =…
Mimi
- 65
- 8
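A common workaround, assuming the resampler expects a 1-D label vector, is to turn the one-hot rows back into class labels, resample, and re-encode afterwards. Stdlib sketch; `one_hot_to_labels` and `labels_to_one_hot` are hypothetical helpers, and the class names are illustrative placeholders for the dummy columns.

```python
def one_hot_to_labels(one_hot_rows, class_names):
    """Map each one-hot row (exactly one 1) back to its class name."""
    labels = []
    for row in one_hot_rows:
        if sum(row) != 1:
            raise ValueError(f"row is not one-hot: {row}")
        labels.append(class_names[list(row).index(1)])
    return labels

def labels_to_one_hot(labels, class_names):
    """Re-encode class names as one-hot rows in the same column order."""
    position = {name: i for i, name in enumerate(class_names)}
    rows = []
    for label in labels:
        row = [0] * len(class_names)
        row[position[label]] = 1
        rows.append(row)
    return rows

class_names = ["BENIGN", "DDoS", "PortScan"]   # column order of the dummy columns
y_onehot = [[1, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
y = one_hot_to_labels(y_onehot, class_names)
print(y)
# ... resample (X, y) here with a 1-D label vector, e.g. with SMOTE ...
y_back = labels_to_one_hot(y, class_names)
assert y_back == y_onehot
```

With pandas, `idxmax(axis=1)` on a `pd.get_dummies` frame performs the same one-hot to label conversion.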
0 votes, 2 answers
Is it good practice to use SMOTE when you have a data set that has imbalanced classes when using BERT model for text classification?
I had a question related to SMOTE. If you have a data set that is imbalanced, is it correct to use SMOTE when you are using BERT? I believe I read somewhere that you do not need to do this since BERT takes this into account, but I'm unable to find…
QMan5
- 133
- 5
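Since SMOTE interpolates numeric feature vectors rather than raw text, a frequently used alternative for fine-tuning text models is a class-weighted loss. This sketch computes weights with the formula n_samples / (n_classes * count), which is what scikit-learn documents for `class_weight="balanced"`; `balanced_class_weights` is a hypothetical helper, and the resulting weights could, for example, be passed as the `weight` argument of PyTorch's `CrossEntropyLoss`.

```python
from collections import Counter

def balanced_class_weights(labels):
    """n_samples / (n_classes * count_of_class) for every class."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * counts[c]) for c in sorted(counts)}

labels = ["neg"] * 900 + ["pos"] * 100
print(balanced_class_weights(labels))
```

Here the minority class gets weight 5.0 and the majority class about 0.56, so each minority error costs roughly nine times more in the loss.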
0 votes, 0 answers
Question on Optimized Threshold in Predictive Modeling
I'm trying to build a predictive model, but I haven't found a method that consistently delivers high performance.
Is it acceptable to use an optimized classification threshold of 0.996?
waleed almutairi
- 11
- 1
0 votes, 2 answers
Imbalanced class in my dataset
I’m working with an imbalanced dataset to predict strokes, where the positive class (stroke occurrence) is significantly underrepresented. Initially, I used logistic regression, but due to the class imbalance, I switched to a Random Forest model.…
Akingba Gladys
- 107
- 2
0 votes, 0 answers
How to handle imbalanced edge weights in a graph for node embedding and edge weight prediction?
I have an undirected weighted graph where the edge weights represent probabilities. The majority of the edge weights are 1 (7 times more frequent than the second-largest group of weights). I'm using this graph to learn node embeddings for an…
0 votes, 1 answer
SVC labels entire sample majority class, even after using ADASYN
I have an imbalanced sample (850 in group X vs 100 in group Y). I am trying to predict group membership using support vector classification. I am using 'Adaptive Synthetic' (ADASYN) to oversample the minority class. Nevertheless, the best model just…
Vincent
- 103
- 4