4

I get a csv that if you read it, looks like:

import pandas as pd
df = pd.DataFrame([['de,ch,fr', '1,2,3'],['fr,ch,dk', '3,4,5']], columns=['countries', 'numbers'], index=['abc', 'bcd'])

I want to make it look like this:

df = pd.DataFrame([[1,2,3,0], [0,4,3,5]], columns=['de_number', 'ch_number', 'fr_number', 'dk_number'], index=['abc', 'bcd'])

Meaning exploding the countries column and getting for every value in index the number for every country in a separate column. I have the list of all the countries for this dataframe beforehand (meaning that I knew beforehand that I'm going to have the values ['de', 'ch', 'fr', 'dk'])

Is there a nice clean way of doing it? Everything that comes into my mind is quite messy.

1 Answers1

5

First we use DataFrame.explode to unnest your lists to rows.

Then we use DataFrame.pivot_table to pivot your dataframe from rows to column to get your desired result:

dfn = df.assign(countries=df['countries'].str.split(',')).explode('countries')
dfn['numbers'] = df.assign(numbers=df['numbers'].str.split(',')).explode('numbers')['numbers']

dfn = (
    dfn.pivot_table(index=dfn.index, 
                    columns='countries', 
                    values='numbers', 
                    aggfunc=lambda x: x, 
                    fill_value=0)
       .rename_axis(None, axis='columns')
       .add_suffix('_number')
)

Output

    ch_number de_number dk_number fr_number
abc         2         1         0         3
bcd         4         0         5         3
Erfan
  • 241
  • 1
  • 3