Pandas merge column duplicate and sum value

Question

How to merge duplicate column and sum their value?

What I have

A   30
A   40
B   50

What I need

A   70
B   50

DF for this example

d = {'address': ["A", "A", "B"], 'balances': [30, 40, 50]}
df = pd.DataFrame(data=d)
df

Esmailian · Answer 1 · 2019-03-10T07:42:11.443

You may use

df2 = df.groupby(['address']).sum()

or

df2 = df.groupby(['address']).agg('sum')

If there are columns other than balances that you want to peak only the first or max value, or do mean instead of sum, you can go as follows:

d = {'address': ["A", "A", "B"], 'balances': [30, 40, 50], 'sessions': [2, 3, 4]} 
df = pd.DataFrame(data=d) 
df2 = df.groupby(['address']).agg({'balances': 'sum', 'sessions': 'mean'})

That outputs

         balances   sessions
address       
A              70       2.5  
B              50       4.0

You may add as_index = False to groupby arguments to have:

  address  balances  sessions
0       A        70       2.5
1       B        50       4.0

score 7 · Accepted Answer · answered Feb 27 '20 at 12:53

7

In another case when you have a dataset with several duplicated columns and you wouldn't want to select them separately use:

df.groupby(by=df.columns, axis=1).sum()

answered Feb 27 '20 at 12:53

Denisa

86
1
2

Pandas merge column duplicate and sum value

2 Answers2

Linked