0

I have 2 complex dataframes of serial numbers (with a couple dozen columns of other information). The serial numbers can appear multiple times each in the first data frame. The second datafram is a subset of the first with all duplicates removed. In the second DF, I generated a unique code for each serial number but now need to assign that unique code to each instance of the serial number in the first dataframe.

I know how to do this using a for loop, but I'm trying to do it more efficiently since the datasets are massive. I have tried various types merge() but can't get the desired output maybe because the dataframe lengths are significantly different. Any ideas on how to do this with existing libraries? Thank you in advance.

This is an oversimplified view of the dataframes and the intended output.

Reference Dataframe

SERIAL1 SERIALCODE
666661 C12345
666662 C12346
666663 C12347
666664 C12348
666665 C12349

Active Dataframe

SERIAL1 SERIALCODE
666663 NaN
666664 NaN
666665 NaN
666661 NaN
666663 NaN
666665 NaN
666662 NaN
666663 NaN

Desired Output

SERIAL1 SERIALCODE
666661 C12345
666662 C12346
666663 C12347
666661 C12345
666664 C12348
666665 C12349
666661 C12345
666663 C12346
666665 C12349
666662 C12346
666663 C12347
Tac147
  • 33
  • 4
  • I tried it but SERIALCODE came up blank. I'll try to add more context: This is how I create the reduced file: df5 = df2.drop_duplicates(subset=['SERIAL1']) I entered: df5.merge(df2, how='left', on='SERIAL1' , suffixes=(None, '_y') ).drop(columns=['SERIALCODE']).rename(columns={'SERIALCODE_y':'SERIALCODE}) Also tried df2 = df5 – Tac147 Jul 30 '22 at 00:43

1 Answers1

0
df.merge(df2, how='left',
        on='SERIAL1' ,
        suffixes=(None, '_y') ).drop(columns=['SERIALCODE']).rename(columns={'SERIALCODE_y':'SERIALCODE'})

    SERIAL1     SERIALCODE
0    666663     C12347
1    666664     C12348
2    666665     C12349
3    666661     C12345
4    666663     C12347
5    666665     C12349
6    666662     C12346
7    666663     C12347
Naveed
  • 11,495
  • 2
  • 14
  • 21