31

How can I get the number of missing values in each row of a pandas DataFrame? I would like to split the DataFrame into separate DataFrames, each containing the rows that have the same number of missing values.

Any suggestion?

Kaggle

8 Answers

50

When using pandas, try to avoid element-by-element or row-by-row operations, including explicit loops, apply, map and applymap. They are slow compared with the built-in vectorized methods.

A DataFrame object has two axes: "axis 0" runs down the rows and "axis 1" runs across the columns. Summing along axis 0 therefore gives one result per column, and summing along axis 1 gives one result per row.

If you want to count the missing values in each column, try:

df.isnull().sum() (axis=0 is the default), or explicitly df.isnull().sum(axis=0)

To count the missing values in each row (which is what you asked), use:

df.isnull().sum(axis=1)

It's roughly 10 times faster than Jan van der Vegt's solution (note that his solution counts valid values rather than missing values):

In [18]: %timeit -n 1000 df.apply(lambda x: x.count(), axis=1)
1000 loops, best of 3: 3.31 ms per loop

In [19]: %timeit -n 1000 df.isnull().sum(axis=1)
1000 loops, best of 3: 329 µs per loop
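
The second half of the question, splitting the DataFrame by the number of missing values per row, could be handled by grouping on the row-wise count. This is only a minimal sketch: the example data and the names missing_per_row and parts are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical example data, not taken from the question
df = pd.DataFrame({
    "A": [1, 2, np.nan, 4],
    "B": [1, np.nan, np.nan, 4],
    "C": [3, np.nan, np.nan, np.nan],
})

# Number of missing values in each row (vectorized, no apply)
missing_per_row = df.isnull().sum(axis=1)

# Group the rows by that count: one sub-DataFrame per distinct
# number of missing values, keyed by the count itself
parts = {n: group for n, group in df.groupby(missing_per_row)}

for n, group in parts.items():
    print(f"rows with {n} missing value(s):")
    print(group)
```

Here parts[0] would hold the complete rows, parts[1] the rows with exactly one missing value, and so on. A dictionary is just one convenient container; iterating over df.groupby(missing_per_row) directly works as well.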
Alex
Icyblade
26

You can apply a count over the rows like this:

test_df.apply(lambda x: x.count(), axis=1)

test_df:

    A   B   C
0:  1   1   3
1:  2   nan nan
2:  nan nan nan

output:

0:  3
1:  1
2:  0

You can add the result as a column like this:

test_df['full_count'] = test_df.apply(lambda x: x.count(), axis=1)

Result:

    A   B   C   full_count
0:  1   1   3   3
1:  2   nan nan 1
2:  nan nan nan 0
Jan van der Vegt
6

The simplest way:

df.isnull().sum(axis=1)
Yuan JI
4

Or, you could simply make use of the info method for dataframe objects:

df.info()

which provides counts of non-null values for each column.

Chris Ivan
4

Null values in each column:

df.isnull().sum(axis=0)

Blank (empty-string) values in each column:

c = (df == '').sum(axis=0)

Null values in each row:

df.isnull().sum(axis=1)

Blank (empty-string) values in each row:

c = (df == '').sum(axis=1)
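
If both nulls and empty strings should count as missing, the two boolean masks can be combined before summing. A small sketch, assuming a made-up DataFrame with text columns (the column names and the variable null_or_blank are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical example data with both nulls and empty strings
df = pd.DataFrame({
    "name": ["a", "", None],
    "city": ["x", np.nan, ""],
})

# True where a cell is either null or an empty string
null_or_blank = df.isnull() | (df == "")

per_row = null_or_blank.sum(axis=1)     # one count per row
per_column = null_or_blank.sum(axis=0)  # one count per column

print(per_row)
print(per_column)
```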
1
>>> df = pd.DataFrame([[1, 2, np.nan],
...                    [np.nan, 3, 4],
...                    [1, 2,      3]])

>>> df
    0  1   2
0   1  2 NaN
1 NaN  3   4
2   1  2   3

>>> df.count(axis=1)
0    2
1    2
2    3
dtype: int64
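
Note that count(axis=1) returns the number of non-null values in each row. To turn that into the number of missing values, one option is to subtract it from the number of columns. A short sketch using the same example frame (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, np.nan],
                   [np.nan, 3, 4],
                   [1, 2, 3]])

# Non-null values per row, as in the example above
non_null_per_row = df.count(axis=1)

# Missing values per row = total columns - non-null values
missing_per_row = df.shape[1] - non_null_per_row

print(missing_per_row)
```

This should give the same counts as df.isnull().sum(axis=1).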
K3---rnc
0

This snippet returns, as an integer, the total number of columns that contain at least one missing value:

(df.isnull().sum() > 0).astype(np.int64).sum()
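
A shorter way to express the same idea may be any(), which flags each column that contains at least one null. This is a sketch on made-up data; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: columns A and C contain nulls, B does not
df = pd.DataFrame({
    "A": [1, np.nan, 3],
    "B": [1, 2, 3],
    "C": [np.nan, np.nan, 3],
})

# The expression from the answer
cols_with_missing = (df.isnull().sum() > 0).astype(np.int64).sum()

# any() marks columns with at least one null; summing counts them
same_count = df.isnull().any().sum()

# The same mask can also select the affected column names
names = df.columns[df.isnull().any()].tolist()

print(cols_with_missing, same_count, names)
```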
-1

If you want the count of non-missing values in each column (np.logical_not inverts the null mask, so valid values are counted):

np.logical_not(df.isnull()).sum()
Itachi