11

How to merge duplicate column and sum their value?

What I have

A   30
A   40
B   50

What I need

A   70
B   50

DF for this example

d = {'address': ["A", "A", "B"], 'balances': [30, 40, 50]}
df = pd.DataFrame(data=d)
df

2 Answers2

16

You may use

df2 = df.groupby(['address']).sum()

or

df2 = df.groupby(['address']).agg('sum')

If there are columns other than balances that you want to peak only the first or max value, or do mean instead of sum, you can go as follows:

d = {'address': ["A", "A", "B"], 'balances': [30, 40, 50], 'sessions': [2, 3, 4]} 
df = pd.DataFrame(data=d) 
df2 = df.groupby(['address']).agg({'balances': 'sum', 'sessions': 'mean'})

That outputs

         balances   sessions
address       
A              70       2.5  
B              50       4.0

You may add as_index = False to groupby arguments to have:

  address  balances  sessions
0       A        70       2.5
1       B        50       4.0
Esmailian
  • 9,553
  • 2
  • 34
  • 49
7

In another case when you have a dataset with several duplicated columns and you wouldn't want to select them separately use:

df.groupby(by=df.columns, axis=1).sum()
Denisa
  • 86
  • 1
  • 2