Questions tagged [missing-data]

Missing data is a problem that arises in data science when some values in the rows or columns of a dataset are missing or unavailable for some samples. This can result from non-response, input errors, or lack of information. Remedies include dropping the affected rows or columns (i.e., using df.dropna() in pandas) or some form of imputation. A popular imputation method is mean imputation, which replaces each missing value with the mean of the observed values in its column.
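As a minimal sketch of the two remedies named above, dropping rows with df.dropna() versus mean imputation, assuming pandas and a small made-up DataFrame (the column names height and weight are hypothetical):

```python
import numpy as np
import pandas as pd

# A tiny hypothetical dataset with two missing entries.
df = pd.DataFrame({
    "height": [180.0, 175.0, np.nan, 168.0],
    "weight": [70.0, np.nan, 65.0, 60.0],
})

# Remedy 1: drop every row that contains at least one missing value.
dropped = df.dropna()

# Remedy 2: mean imputation, replacing each missing value with its column mean.
# DataFrame.mean() skips NaN by default, so the means use observed values only.
imputed = df.fillna(df.mean())

print(dropped)
print(imputed)
```

Here dropna() keeps only the two complete rows, while fillna(df.mean()) fills the missing weight with 65.0, the mean of 70, 65, and 60. Dropping loses half of this toy dataset, which is why imputation is often preferred when samples are scarce.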

215 questions
15 votes, 4 answers

How to use the SimpleImputer class to replace missing values with mean values in Python?

This is my code import numpy as np import matplotlib.pyplot as plt import pandas as pd #Importing Dataset dataset = pd.read_csv('C:/Users/Rupali Singh/Desktop/ML A-Z/Machine Learning A-Z Template Folder/Part 1 - Data…
Rupali Singh
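For reference, a minimal sketch of mean imputation with scikit-learn's SimpleImputer, assuming scikit-learn 0.20 or later (where the class lives in sklearn.impute) and small hypothetical arrays standing in for the dataset from the question:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical training and test matrices (rows = samples, columns = features).
X_train = np.array([[1.0, 2.0],
                    [np.nan, 4.0],
                    [7.0, np.nan]])
X_test = np.array([[np.nan, np.nan]])

# strategy="mean" replaces each NaN with the mean of the observed values in that column.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")

# Learn the column means on the training data only...
X_train_filled = imputer.fit_transform(X_train)
# ...then reuse those same means for unseen data, so no test information leaks in.
X_test_filled = imputer.transform(X_test)

print(imputer.statistics_)
print(X_test_filled)
```

Fitting on the training data and only calling transform on the test data keeps test-set statistics out of the imputed values; in this toy example the learned column means are 4.0 and 3.0.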
14 votes, 2 answers

What to do when testing data has fewer features than training data?

Let's say we are predicting the sales of a shop, and my training data has two sets of features: one about the store sales with the dates (the field "Store" is not unique), and one about the store types (the field "Store" is unique here). So the matrix…
11 votes, 4 answers

How to impute missing values in a way other than the usual one?

I have a dataset of 4712 records for a binary classification task. Label 1 is 33% and label 0 is 67%. I can't drop records because my sample is already small, and a few columns have around 250-350 missing records. How do I know…
10 votes, 1 answer

How do GBM algorithms handle missing data?

How do GBM algorithms, such as XGBoost or LightGBM, handle NaN values? I know that they learn how to replace NaN values with other values, but my question is: how exactly do they do it?
user10296606
10 votes, 1 answer

Naive Bayes should generate predictions given missing features (scikit-learn)

Since Naive Bayes uses probabilities to make a prediction and treats features as conditionally independent of each other, it makes sense that the model can still make a prediction when some features are missing in the…
gbhrea
8 votes, 5 answers

Filling missing data with values other than the mean

What are all the options available for filling in missing data? One obvious choice is the mean, but if the percentage of missing data is large, it will decrease the accuracy. So how do we deal with missing values if there are a lot of them?
mach
8 votes, 2 answers

Fill missing values AND normalise

I have two columns of training data for a neural net that are missing values. (There are many other columns that aren't missing values.) For example:

Height | Weight
180    | 70
175    | N/A
N/A    | N/A

I want to fill missing values and…
joel
7 votes, 5 answers

How to handle missing values if imputation doesn't make sense

I have a column/feature in my dataset, "years_married", showing the number of years a person has been married. Since not every person is married, there are NaN fields. It does not make sense to fillna(0) "years_married", since 0 would mean the person just married. A…
methus
6 votes, 2 answers

When should missing-data imputation be used in a data analysis problem?

I want to run a statistical analysis of a dataset and build a logistic regression model and a multinomial linear model in R according to the research question. But I was wondering at which step I should use missing value imputation to complete the…
Eileen
6 votes, 2 answers

How to deal with missing data for Bernoulli Naive Bayes?

I am dealing with a dataset of categorical data that looks like this:

   content_1  content_2  content_4  content_5  content_6
0        NaN        0.0        0.0        0.0        NaN
1        NaN        0.0        0.0        0.0  …
6 votes, 1 answer

Dealing with NaN (missing) values for logistic regression: best practices?

I am working with a dataset of patient information and trying to calculate the propensity score from the data using MATLAB. After removing features with many missing values, I am still left with several missing (NaN) values. I get errors due to…
6 votes, 4 answers

Imputation of missing values based on target variable

I want to impute missing values in the German Credit Risk dataset.

df['Saving accounts'].value_counts(dropna=False)

Output:

little        603
NaN           183
moderate      103
quite rich     63
rich           48

Almost 20% of the data is missing,…
Ars ML
5 votes, 3 answers

What predictive model to use to impute Gender?

My data looks like this:

birth_date has 634,990 missing values
gender has 328,849 missing values

Both of these are substantial amounts, since I have 900k entries, so I can't discard empty rows. For birth_date, someone recommended using Multivariate…
Bn.F76
5 votes, 4 answers

Missing Values in Data

In my experience, most datasets contain missing values, which makes the task a bit challenging. How can those missing values be filled in efficiently? And are there any specific techniques for handling missing values?
5 votes, 1 answer

I wrote code in R to download PDF files from a website automatically, but the code didn't find the PDF file links, although there are links

I want to download PDF files from this website "https://register.awmf.org/de/start", but the code didn't find any PDF links, although there are links to PDF files, but indirectly. I want to download all available PDF files and organize them into a folder, and…
Ward Khedr