Highest Voted 'oversampling' Questions - Data Science Stack Exchange

4

votes

3 answers

Timing of applying random oversampling on the dataset

I am trying to learn classification using machine learning algorithms. I went through the notebook Breast Cancer - EDA, Balancing and ML. In this notebook, random oversampling was implemented. However, when the author did the oversampling, he did it…

asked Sep 05 '22 at 04:19

Encipher

381
1
11

2

votes

1 answer

Techniques for solving the problem with an unbalanced data set

I am trying to solve a problem with an unbalanced data set. I have two classes, one is for patients with risk (1), the other for patients without risk (0). I have a larger number of patients without risk. For analysis, I used methods such…

python class-imbalance prediction oversampling

asked May 13 '24 at 06:12

Naty

21
2

2

votes

0 answers

Using SMOTE Train Model and Optimal Cutoff on Unbalanced Test Data

My original dataset has a binary dependent variable with 3% of the values being one. First, I split the original dataset into training and testing sets using an 80-20 split. Since it includes both numeric and binary independent variables, I am using…

machine-learning class-imbalance smote oversampling smotenc

asked May 12 '24 at 13:07

CraigS

21
2

1

vote

2 answers

How to properly use oversampling without inflating results?

I am working with a tiny private dataset (over 192 samples) with 4 classes. A preprocessing step is trivial in order to do any classification. Among feature selection and extraction techniques, I decided to apply oversampling (SMOTE). Here is what I…

python classification preprocessing smote oversampling

asked Apr 07 '21 at 19:24

heresthebuzz

395
3
11

1

vote

1 answer

Using a regression model, is it possible to precisely predict "outlier" results based on a highly imbalanced dataset?

Title. I have a dataset that's highly imbalanced, say the output variable I want to predict is restricted within the range from 0 to 1, but almost all of the datapoints sit around 0.7-0.9, while my prediction set is mostly values from 0 to 0.4. I…

machine-learning machine-learning-model data-cleaning class-imbalance oversampling

asked Jun 06 '24 at 10:25

Yuuya

51
5

1

vote

3 answers

unbalanced data on train set and test set

I already have 2 datasets: one to use for training and one for testing. Both datasets are unbalanced (with similar percentages), with around 90% of label 1. Will it be useful to balance the data if the test set is very unbalanced anyway? Instances…

machine-learning training sentiment-analysis oversampling

asked Mar 08 '23 at 01:53

mikeman

19
2

1

vote

1 answer

Should synthetic data be oversampled as well?

I'm building a binary text classifier. The ratio between the positives and negatives is 1:100 (100 / 10000). By using back translation as an augmentation, I was able to get 400 more positives. Then I decided to do upsampling to balance the data. Do…

classification class-imbalance data-augmentation oversampling

asked Mar 30 '22 at 15:09

guestmember123456790

11
1

1

vote

0 answers

oversampling multivariate time series data

For a classification task, I have multivariate time series data composed of 4 satellite images in the form (145521 pixels, 4 dates, 2 bands). I made a classification with tempCNN to classify the data into 5 classes. However there is a big gap…

python classification time-series multiclass-classification oversampling

asked Oct 08 '21 at 22:02

ala

11
2

0

votes

2 answers

How to increase the Accuracy after Oversampling?

The accuracy before oversampling: on training: 98.54%, on testing: 98.21%. The accuracy after oversampling: on training: 77.92%, on testing: 90.44%. What does this mean, and how can I increase the accuracy? Edit: Classes before…

accuracy oversampling

asked Jun 20 '21 at 16:31

Mimi

65
8

0

votes

0 answers

How to use SMOTE to rebalance multiclass dataset when the target is one hot encoded with pd.get_dummies?

I'm using a multiclass dataset (cic-ids-2017), which is very imbalanced. I have already encoded the categorical feature (which is the target) using OneHotEncoder. I tried to use SMOTE oversampling method to balance the data with pipeline: X =…

class-imbalance smote one-hot-encoding oversampling

asked Jun 02 '21 at 17:17

Mimi

65
8

0

votes

2 answers

Is it good practice to use SMOTE when you have a data set that has imbalanced classes when using BERT model for text classification?

I have a question related to SMOTE. If you have a data set that is imbalanced, is it correct to use SMOTE when you are using BERT? I believe I read somewhere that you do not need to do this since BERT takes this into account, but I'm unable to find…

bert smote oversampling

asked Mar 25 '21 at 16:32

QMan5

133
5

0

votes

0 answers

Question on Optimized Threshold in Predictive Modeling

I'm trying to build a predictive model, but I haven't found a method that consistently delivers high performance. Is it acceptable to use an optimized classification threshold of 0.996?

machine-learning predictive-modeling class-imbalance optimization oversampling

asked Feb 23 '25 at 17:42

waleed almutairi

11
1

0

votes

2 answers

Imbalanced class in my dataset

I’m working with an imbalanced dataset to predict strokes, where the positive class (stroke occurrence) is significantly underrepresented. Initially, I used logistic regression, but due to the class imbalance, I switched to a Random Forest model.…

predictive-modeling random-forest class-imbalance confusion-matrix oversampling

asked Oct 29 '24 at 12:07

Akingba Gladys

107
2

0

votes

0 answers

How to handle imbalanced edge weights in a graph for node embedding and edge weight prediction?

I have an undirected weighted graph where the edge weights represent probabilities. The majority of the edge weights are 1 (which are 7 times more frequent than the second major group of weights). I'm using this graph to learn node embeddings for an…

deep-learning class-imbalance overfitting graph-neural-network oversampling

asked Oct 15 '24 at 13:25

ToTheMoon

1

0

votes

1 answer

SVC labels entire sample majority class, even after using ADASYN

I have an imbalanced sample (850 in group X vs 100 in group Y). I am trying to predict group membership using support vector classification. I am using 'Adaptive Synthetic' (ADASYN) to oversample the minority class. Nevertheless, the best model just…

classification class-imbalance supervised-learning imbalanced-learn oversampling

asked Aug 20 '24 at 16:36

Vincent

103
4
