We have 2 datasets and the goal is to add columns from one dataset to another. Below is example of code:
import pandas as pd
d1 = {'city_id': [116,1,1,1,116,1,1,116,1], 'key': [14,14,22,21,22,14,13,80,99]}
d2={'key':[14,22,80],'population':[2000,7500,11000],'median_income':[30000,50000,44000]}
df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)
print(df1)
print()
print(df2)
city_id key
0 116 14
1 1 14
2 1 22
3 1 21
4 116 22
5 1 14
6 1 13
7 116 80
8 1 99
key population median_income
0 14 2000 30000
1 22 7500 50000
2 80 11000 44000
On next step I do:
print(df1.loc[df1['city_id']==116].assign(
population=lambda x: x['key'].map(dict(zip(df2['key'],df2['population'])))
,median_income=lambda x:x['key'].map(dict(zip(df2['key'],df2['median_income'])))
))
city_id key population median_income
0 116 14 2000 30000
4 116 22 7500 50000
7 116 80 11000 44000
Than, when I try to assign it into original dataframe, I get an error:
df1.loc[df1['city_id']==116]=df1.loc[df1['city_id']==116].assign(
population=lambda x: x['key'].map(dict(zip(df2['key'],df2['population'])))
,median_income=lambda x:x['key'].map(dict(zip(df2['key'],df2['median_income'])))
)
ValueError: shape mismatch: value array of shape (3,4) could not be broadcast to indexing result of shape (3,2)
While the expected result is:
city_id key population median_income
0 116 14 2000 30000
1 1 14 NaN NaN
2 1 22 NaN NaN
3 1 21 NaN NaN
4 116 22 7500 50000
5 1 14 NaN NaN
6 1 13 NaN NaN
7 116 80 11000 44000
8 1 99 NaN NaN
What can possibly solve the issue?
Note! We cannot use merge, because in reality there are 20+ different 'city_id' and it will create numerous suffixes like'population_x','population_y','population_z'...'median_income_x', 'median_income_y','median_income_z', which seems to be way less handy. The idea is to create function and using assing for each city_id.