Assigning values based on reference column using pandas without running a loop

Question

I have 2 complex dataframes of serial numbers (with a couple dozen columns of other information). The serial numbers can appear multiple times each in the first data frame. The second datafram is a subset of the first with all duplicates removed. In the second DF, I generated a unique code for each serial number but now need to assign that unique code to each instance of the serial number in the first dataframe.

I know how to do this using a for loop, but I'm trying to do it more efficiently since the datasets are massive. I have tried various types merge() but can't get the desired output maybe because the dataframe lengths are significantly different. Any ideas on how to do this with existing libraries? Thank you in advance.

This is an oversimplified view of the dataframes and the intended output.

Reference Dataframe

SERIAL1	SERIALCODE
666661	C12345
666662	C12346
666663	C12347
666664	C12348
666665	C12349

Active Dataframe

SERIAL1	SERIALCODE
666663	NaN
666664	NaN
666665	NaN
666661	NaN
666663	NaN
666665	NaN
666662	NaN
666663	NaN

Desired Output

SERIAL1	SERIALCODE
666661	C12345
666662	C12346
666663	C12347
666661	C12345
666664	C12348
666665	C12349
666661	C12345
666663	C12346
666665	C12349
666662	C12346
666663	C12347

I tried it but SERIALCODE came up blank. I'll try to add more context: This is how I create the reduced file: df5 = df2.drop_duplicates(subset=['SERIAL1']) I entered: df5.merge(df2, how='left', on='SERIAL1' , suffixes=(None, '_y') ).drop(columns=['SERIALCODE']).rename(columns={'SERIALCODE_y':'SERIALCODE}) Also tried df2 = df5 — Tac147, Jul 30 '22 at 00:43

score 0 · Answer 1 · answered Jul 30 '22 at 00:32

0

df.merge(df2, how='left',
        on='SERIAL1' ,
        suffixes=(None, '_y') ).drop(columns=['SERIALCODE']).rename(columns={'SERIALCODE_y':'SERIALCODE'})

    SERIAL1     SERIALCODE
0    666663     C12347
1    666664     C12348
2    666665     C12349
3    666661     C12345
4    666663     C12347
5    666665     C12349
6    666662     C12346
7    666663     C12347

answered Jul 30 '22 at 00:32

Naveed

11,495
2
14
21

In such case using `map` on a crafted Series is more efficient (see duplicate). – mozway Jul 30 '22 at 00:39
yes. thanks for the feedback @mozway. df['SERIALCODE']=df['SERIAL1'].map(df2.set_index(['SERIAL1'])['SERIALCODE'] ) – Naveed Jul 30 '22 at 01:15
That worked! You two are my literal heroes of the day. Thank you – Tac147 Jul 30 '22 at 02:01

Assigning values based on reference column using pandas without running a loop

1 Answers1