
I have a dictionary `d` with 10 keys whose values are PySpark DataFrames.

 >>> d.keys()
 dict_keys(['Py1', 'Py2', 'Py3', 'Py4', 'Py7', 'Py8', 'Py15', 'Py20', 'Py21', 'Py22'])

I am currently taking each key and its value, then assigning it to a variable like so:

   df1 = d['Py1']
   df2 = d['Py2']
   df3 = d['Py3']
   .
   .
   .
   df10 = d['Py22']

I then do various manipulations using PySpark. What is the best way to achieve this without the redundancy? Here is what I attempted:

 newname = "df"
 counter = 1
 for key in d.keys():
     key = newname + str(counter)
     counter += 1
     print(key)

But when I run `print(df1)`, I get a "name 'df1' is not defined" error.

wjandrea
user12501784
  • Shouldn't it be df1 = d['Py1']? – Harshith Bolar Feb 24 '20 at 15:33
  • @HarshithBolar correct, error on the question. I'm not assigning values that way in my actual script. – user12501784 Feb 24 '20 at 16:10
  • Why do you *want* individual variables, instead of just using the values in the `dict` directly? A bunch of sequentially named variables is [almost always the wrong thing to do](https://stackoverflow.com/q/6181935/364696)... – ShadowRanger Feb 24 '20 at 17:11
  • @ShadowRanger I want to work on the dataframes individually, and it's easier for me to access them and do data manipulations this way. – user12501784 Feb 24 '20 at 17:17
  • @user12501784: It really, really isn't though. `d['Py1']` is not meaningfully more difficult than `df1`, while making sure the latter (and its 22-odd siblings) even exist is a royal pain. If you think this is a problem you need solved, you probably have [an XY problem](https://meta.stackexchange.com/q/66377/322040). – ShadowRanger Feb 24 '20 at 19:45

2 Answers


Yes, you can use `globals()`, provided all the DataFrames are defined as global variables.

newname = "df"
# Looks up the existing global variables df1, df2, ... and stores them under d's keys
d = {k: globals()[newname + str(counter)] for counter, k in enumerate(d, start=1)}
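
The comprehension above reads existing globals into the dictionary. For the direction the question asks about (creating `df1`, `df2`, ... from the dictionary), a minimal sketch could assign into `globals()` instead. The string values below are stand-ins for the PySpark DataFrames, and this is only an illustration, not necessarily what the answer intended; as the comments on the question point out, indexing the dictionary directly is usually simpler.

```python
# Stand-in values (assumption); in the question these would be PySpark DataFrames.
d = {"Py1": "frame for Py1", "Py2": "frame for Py2", "Py3": "frame for Py3"}

newname = "df"
for counter, key in enumerate(d, start=1):
    # Writing into the globals() dict creates module-level names df1, df2, ...
    globals()[newname + str(counter)] = d[key]

print(df1)  # frame for Py1
```

This relies on dictionaries preserving insertion order (guaranteed since Python 3.7), so `df1` corresponds to the first key, `df2` to the second, and so on.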
Sayandip Dutta

Let's assume you have your DataFrames in a list called `dfs`. I would use a combination of a dict comprehension and the `enumerate` function.

newname = "df"
out = {newname + str(i): df for i, df in enumerate(dfs, 1)}

The `enumerate` function wraps an iterable and yields `(index, value)` tuples; the second argument, 1, makes the index start at 1 instead of 0. It is very convenient when you need both the value and the position of each element in a list. Also note the tuple unpacking (`for i, df in ...`), which gives a name to each of the two items that `enumerate` yields.
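
To make the behaviour of `enumerate` and the tuple unpacking concrete, here is a small sketch. The list contents are placeholder strings standing in for DataFrames, and `pairs` is a name introduced only for illustration.

```python
# Stand-in values (assumption); in the answer these would be PySpark DataFrames.
dfs = ["frame A", "frame B", "frame C"]
newname = "df"

# enumerate(dfs, 1) yields (1, "frame A"), (2, "frame B"), (3, "frame C")
pairs = list(enumerate(dfs, 1))

# Tuple unpacking binds each pair to i and df inside the comprehension.
out = {newname + str(i): df for i, df in pairs}

print(out["df2"])  # frame B
```

The result is still a dictionary, so each item is accessed as `out["df1"]` rather than as a standalone variable `df1`.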

ShadowRanger
yardsale8