In pandas, can I avoid a loop when assigning value based on specific row values?

Question

I have three columns: ['date'], which holds the date; ['id'], which holds product IDs; and ['rating'], which holds each product's rating on each date. I want to create a dummy variable ['threshold'] that equals 1 when, within the same value of ['id'], the rating went from anywhere above 5 in the previous row to anywhere below 6 in the current row. My code would use a for loop as follows:

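import numpy as np

df['threshold'] = np.zeros(df.shape[0])
# Start at 1: row 0 has no previous row (i-1 would wrap around to the last row).
for i in range(1, df.shape[0]):
    if df.iloc[i]['id'] == df.iloc[i-1]['id'] and df.iloc[i-1]['rating'] > 5 and df.iloc[i]['rating'] < 6:
        # Use a single indexer; df.iloc[i]['threshold'] = 1 would write to a temporary copy.
        df.iloc[i, df.columns.get_loc('threshold')] = 1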

Is there a way to perform this without using a for loop?

Please include a small sample of your data along with your desired results. Take a look at [how-to-make-good-reproducible-pandas-examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) — Shubham Sharma, Jun 21 '20 at 07:27

jezrael · Accepted Answer · 2020-06-21T07:38:37.383


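Use Series.shift to get the previous row's values, compare them with Series.eq (equal), Series.gt (greater than) and Series.lt (less than), and convert the resulting boolean mask to integers 0/1 with Series.view. On recent pandas versions, where Series.view is deprecated, .astype('int8') in place of .view('i1') gives the same result: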

df['threshold']= (df['id'].eq(df['id'].shift()) & 
                  df['rating'].shift().gt(5) & 
                  df['rating'].lt(6)).view('i1')
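
To see how the mask behaves at the boundary between two ids, here is a minimal sketch on made-up data. The sample values are hypothetical (the question did not include any), the rows are assumed to be ordered by date within each id, and .astype('int8') is swapped in for .view('i1'). The groupby variant is an alternative way to get the previous rating per id, not part of the answer above.

```python
import pandas as pd

# Hypothetical sample data: two products, rows already ordered by date per id.
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03",
                            "2020-01-01", "2020-01-02", "2020-01-03"]),
    "id": ["a", "a", "a", "b", "b", "b"],
    "rating": [7, 4, 8, 5, 8, 5],
})

# Same logic as the answer: previous row has the same id, previous rating > 5,
# current rating < 6. astype('int8') replaces view('i1') here.
same_id = df["id"].eq(df["id"].shift())
df["threshold"] = (same_id
                   & df["rating"].shift().gt(5)
                   & df["rating"].lt(6)).astype("int8")

# Alternative: shift within each id, so the first row of every id gets NaN
# as its previous rating, and NaN > 5 evaluates to False.
prev_rating = df.groupby("id")["rating"].shift()
df["threshold_grouped"] = (prev_rating.gt(5) & df["rating"].lt(6)).astype("int8")

print(df)
```

The first row of id "b" has a rating of 5 and directly follows an 8 from id "a", but it stays 0 in both columns because the previous row belongs to a different product.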

edited Jun 21 '20 at 07:38

answered Jun 21 '20 at 07:24

