I have the following df:

df
      Site        Process     parameter       value    unit   
0      Mid  Biomass plant        cap-up     5000.00      
1      Mid  Biomass plant      inv-cost   875000.00    
2      Mid  Biomass plant  depreciation       25.00      
3      Mid     Coal plant        cap-up        0.00      
4      Mid     Coal plant      inv-cost   600000.00    
5      Mid     Coal plant  depreciation       40.00    

I would like to have a df for the units, something like the following:

df_unit
      unit
0       MW    
1    €/MWh  
2        a

I would then like to merge df and df_unit. The best way to do this is probably pandas join(), but I don't know how to use it. Can you help me get the following df_merged from df and df_unit?

The following is the expected output.

df_merged
      Site        Process     parameter       value    unit   
0      Mid  Biomass plant        cap-up     5000.00      MW
1      Mid  Biomass plant      inv-cost   875000.00   €/MWh
2      Mid  Biomass plant  depreciation       25.00       a
3      Mid     Coal plant        cap-up        0.00      MW
4      Mid     Coal plant      inv-cost   600000.00   €/MWh
5      Mid     Coal plant  depreciation       40.00       a
oakca
  • Are you mapping `parameter` to `unit` here? – Jon Clements Feb 27 '19 at 16:42
  • kinda that is what I am trying to do "assigning unit values with specific parameters" – oakca Feb 27 '19 at 16:43
  • I think you forgot to mention you want to merge on `parameter` – yatu Feb 27 '19 at 16:43
  • yes, df_unit can be anything I just need to assign the units to the right parameters. – oakca Feb 27 '19 at 16:44
  • You can use a `dict` as a mapping, eg: `df.parameter.map({'cap-up': 'a', 'inv-cost': 'b', 'depreciation': 'c'})` - although, if you've got a couple of unique keys, it does look like you should probably pivot instead though? – Jon Clements Feb 27 '19 at 16:46
  • Does the `parameter` value `cap-up` always have a unit of `MW`? – Mortz Feb 27 '19 at 16:46
  • @Mortz yes, all of them are specific and will not change. So cap-up is always MW, inv-cost is always €/MWh, so on.. The only problem is I don't wanna go with a for loop over the df, because the df itself can be long – oakca Feb 27 '19 at 16:48
  • @oakca if you can consider those columns then why not something like: `df.set_index(['Site', 'Process']).pivot(columns='parameter')` ? (setting the index to whatever is unique to group those "value"s... ? – Jon Clements Feb 27 '19 at 16:49
  • @JonClements how will it know to put them under unit? – oakca Feb 27 '19 at 16:49
  • Its a replace/map problem. See [this](https://stackoverflow.com/questions/20250771/remap-values-in-pandas-column-with-a-dict). You can create mapping dictionary like {'cap-up':'MW'} so on – Vaishali Feb 27 '19 at 16:49
  • It won't... but it'll put em as separate columns whereby you know what the units are for 'em, and you can use those columns separately etc... depends what you're trying to do, but it makes more sense that adding a unit columns which you'll already know just by the column name.... would also make rows more useful for further work you wish to do on it – Jon Clements Feb 27 '19 at 16:50
  • @oakca try the pivot - I have a feeling that's more what you actually want... - then you can just rename the columns with the units at the end if you really want... makes the data more useful and means you're not creating an extra column that's basically redundant... – Jon Clements Feb 27 '19 at 16:53

1 Answer

In your comments, you mention that you want to assign each unit to the right parameter. For that, the units dataframe needs a parameter column alongside the unit column.

Assuming it has one:

Step 1: Build a dictionary from the units dataframe.

dictitems = dict(zip(df_unit["parameter"], df_unit["unit"]))

This zips the parameter column with the unit column of the units dataframe and turns the pairs into a dictionary that maps each parameter to its unit. Here "parameter" and "unit" stand for whatever those two columns are called in your units dataframe.
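As a minimal sketch of Step 1: the `df_unit` in the question only shows a `unit` column, so the `parameter` column and both column names below are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical units table: one row per parameter.
# The "parameter" column is assumed; the question's df_unit does not show it.
df_unit = pd.DataFrame({
    "parameter": ["cap-up", "inv-cost", "depreciation"],
    "unit": ["MW", "€/MWh", "a"],
})

# Pair each parameter with its unit and collect the pairs into a dict.
dictitems = dict(zip(df_unit["parameter"], df_unit["unit"]))
print(dictitems)
```

Each parameter should appear only once in the units table. If one is repeated, `dict` silently keeps the last unit it sees for that parameter.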

Step 2: Map the dictionary onto the parameter column of the main dataframe.

df["unit"] = df["parameter"].map(dictitems).fillna("Unit Not Found")

This looks up each value of the parameter column in the dictionary and writes the result to the unit column of your dataframe. Any parameter that is missing from the dictionary gets "Unit Not Found" instead of NaN.

Let me know if you have any questions about this.
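Put together, the mapping might look like the sketch below. The data is copied from the question. The dictionary is written out by hand here, and the `parameter` column of the rebuilt `df_unit` is an assumption. Since the question asks about joining, a left `merge` on `parameter` is shown next to it for comparison.

```python
import pandas as pd

# Data from the question, without the empty unit column.
df = pd.DataFrame({
    "Site": ["Mid"] * 6,
    "Process": ["Biomass plant"] * 3 + ["Coal plant"] * 3,
    "parameter": ["cap-up", "inv-cost", "depreciation"] * 2,
    "value": [5000.0, 875000.0, 25.0, 0.0, 600000.0, 40.0],
})

# Parameter-to-unit lookup (written out by hand for this sketch).
dictitems = {"cap-up": "MW", "inv-cost": "€/MWh", "depreciation": "a"}

# Option 1: vectorised lookup with Series.map; no Python-level loop over rows.
df_merged = df.copy()
df_merged["unit"] = df_merged["parameter"].map(dictitems).fillna("Unit Not Found")

# Option 2: the same result via a left merge on "parameter".
# df_unit with a "parameter" column is an assumption, as in Step 1.
df_unit = pd.DataFrame({
    "parameter": list(dictitems.keys()),
    "unit": list(dictitems.values()),
})
df_joined = df.merge(df_unit, on="parameter", how="left")

print(df_merged)
```

For a single lookup column, `map` is usually the simpler choice. `merge` becomes more useful once the units table carries several columns to attach at once. Note that `merge` leaves NaN for unmatched parameters unless you fill them yourself.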