Pandas assigning with loc

Question

We have 2 datasets and the goal is to add columns from one dataset to another. Below is example of code:

import pandas as pd
d1 = {'city_id': [116,1,1,1,116,1,1,116,1], 'key': [14,14,22,21,22,14,13,80,99]}
d2={'key':[14,22,80],'population':[2000,7500,11000],'median_income':[30000,50000,44000]}
df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)
print(df1)
print()
print(df2)

   city_id  key
0      116   14
1        1   14
2        1   22
3        1   21
4      116   22
5        1   14
6        1   13
7      116   80
8        1   99


   key  population  median_income
0   14        2000          30000
1   22        7500          50000
2   80       11000          44000

On next step I do:

print(df1.loc[df1['city_id']==116].assign(
population=lambda x: x['key'].map(dict(zip(df2['key'],df2['population'])))
,median_income=lambda x:x['key'].map(dict(zip(df2['key'],df2['median_income'])))
))

   city_id  key  population  median_income
0      116   14        2000          30000
4      116   22        7500          50000
7      116   80       11000          44000

Than, when I try to assign it into original dataframe, I get an error:

df1.loc[df1['city_id']==116]=df1.loc[df1['city_id']==116].assign(
population=lambda x: x['key'].map(dict(zip(df2['key'],df2['population'])))
,median_income=lambda x:x['key'].map(dict(zip(df2['key'],df2['median_income'])))
)
ValueError: shape mismatch: value array of shape (3,4)  could not be broadcast to indexing result of shape (3,2)

While the expected result is:

   city_id  key  population  median_income
0      116   14        2000         30000
1      1     14         NaN          NaN
2      1     22         NaN          NaN
3      1     21         NaN          NaN
4      116   22        7500         50000
5      1     14         NaN          NaN
6      1     13         NaN          NaN
7      116   80       11000        44000
8      1     99         NaN          NaN

What can possibly solve the issue?

Note! We cannot use merge, because in reality there are 20+ different 'city_id' and it will create numerous suffixes like'population_x','population_y','population_z'...'median_income_x', 'median_income_y','median_income_z', which seems to be way less handy. The idea is to create function and using assing for each city_id.

Does [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101) help? — Ynjxsjmh, Apr 28 '22 at 16:35
Seems helpful, merging will definitely solve the issue after fixing suffixes. — Nikita Tsekhanovich, Apr 28 '22 at 16:52
@NikitaTsekhanovich given your example - you don't need to worry about suffixes... as per my answer? — Jon Clements, Apr 28 '22 at 16:57

score 1 · Accepted Answer · answered Apr 28 '22 at 16:36

I think it is very unlikely this is the correct thing to do in this setting. You should almost surely use merge and melt as necessary. That being said, instead you can simply add two new, empty columns to the original dataset.

import pandas as pd
d1 = {'city_id': [116,1,1,1,116,1,1,116,1], 'key': [14,14,22,21,22,14,13,80,99]}
d2={'key':[14,22,80],'population':[2000,7500,11000],'median_income':[30000,50000,44000]}
df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)

import numpy as np 
df1[["population", "median_income"]] = np.nan

df1.loc[df1['city_id']==116]=df1.loc[df1['city_id']==116].assign(
population=lambda x: x['key'].map(dict(zip(df2['key'],df2['population'])))
,median_income=lambda x:x['key'].map(dict(zip(df2['key'],df2['median_income'])))
)

This works as you want it to.

score 1 · Answer 2 · answered Apr 28 '22 at 16:38

1

It looks like you're better off merging the two then blanking out the columns where the city_id isn't 116.

df3 = df1.merge(df2, on='key', how='left')
df3.loc[df3['city_id'] != 116, ['population', 'median_income']] = np.nan

answered Apr 28 '22 at 16:38

Jon Clements

138,671
33
247
280

Pandas assigning with loc

2 Answers2