Remove columns with a certain number of consecutive zeros

Question

I am using pandas and am dealing with time series of sales.

What I would like to do is to remove the columns where a certain number of consecutive zeros appear, since forecasting for sparse series or repeated zero values tend to be unreliable. My dataframe has a bit more than 4,400 time series for a time span of 5 years.

I have tried applying this strategy from a previous SO question, but I cannot drop the columns fulfilling the condition.

Below is an example where the threshold would be 12. Any column having 12 or more consecutive zeros would be discarded.

Input:

      A   B   C
0     1   1   8
1     2   0  14
2     3   0  20
3     0   0  15
4     0   0  23
5     0   0  25
6     0   0  22
7     0   0  18
8     0   0  16
9     0   0  14
10    0   0   8
11    1   0  10
12   18   0   7
13   34   0   4
14  110   1  14
15   11  30   2
16    0  24   8
17    0  11   7
18    1  22  11
19    0  33   3
20    0  90  12
21   12  32  19
22   11  90  17
23   77  13   2

Desired output:

      A   C
0     1   8
1     2  14
2     3  20
3     0  15
4     0  23
5     0  25
6     0  22
7     0  18
8     0  16
9     0  14
10    0   8
11    1  10
12   18   7
13   34   4
14  110  14
15   11   2
16    0   8
17    0   7
18    1  11
19    0   3
20    0  12
21   12  19
22   11  17
23   77   2

Thanks.

score 0 · Answer 1 · edited Jun 08 '20 at 13:53

0

def max0(sr):
    return (sr != 0).cumsum().value_counts().max() - (0 if (sr != 0).cumsum().value_counts().idxmax()==0 else 1)

max0(pd.Series([0,0,0,0]))

Output:
> 4

max0(pd.Series([1,0,0,0,0,2,3]))
Output:
> 4

max0(pd.Series([0]))
Output:

> 1

max0(pd.Series([1]))
Output:

> 0

edited Jun 08 '20 at 13:53

fuwiak

1,373
8
14
26

answered Jun 08 '20 at 09:10

RELW

1

Quang Hoang · Accepted Answer · 2021-11-29T18:04:28.863

You can try rolling().sum:

thresh = 12
to_drop = df.eq(0).rolling(thresh).sum().eq(thresh).any()
df.loc[:, ~to_drop]

Output:

      A   C
0     1   8
1     2  14
2     3  20
3     0  15
4     0  23
5     0  25
6     0  22
7     0  18
8     0  16
9     0  14
10    0   8
11    1  10
12   18   7
13   34   4
14  110  14
15   11   2
16    0   8
17    0   7
18    1  11
19    0   3
20    0  12
21   12  19
22   11  17
23   77   2

Multivac · Answer 3 · 2021-08-09T20:39:39.903

An alternative to archived what you are looking for is:

threshold = 12
drops = [l[0] for l in list(filter(lambda x: x[1] > threshold,[(col, (df.groupby((df[col] != 0).cumsum()).cumcount()).max()) 
 for col in df.columns]))]
df.drop(drops, axis = 1)

Outputs:

qmeeus · Answer 4 · 2019-04-19T14:35:23.730

Short answer:

# Max number of zeros in a row
threshold = 12

# 1. transform the column to boolean is_zero
# 2. calculate the cumulative sum to get the number of cumulative 0
# 3. Check how much of each count you get and remove 0 counts
# 4. Get the maximum number of cumulative zeros
# 6. Drop the columns with more than your threshold

to_remove = []
for col in df.columns:
    count_zeros = (df[col] == 0).cumsum().value_counts().drop(0)
    if len(count_zeros) and count_zeros.idxmax() > threshold:
        to_remove.append(col)

df = df.drop(to_remove, axis=1)

You're welcome

Remove columns with a certain number of consecutive zeros

4 Answers4