
I am using three different off-the-shelf classifiers on a three-class classification task. I want to calculate the optimal weights (c1weight, c2weight, c3weight) for each classifier (in the real task there are more classifiers, and also weights for each class).

Maybe a simple grid search approach or an sklearn ensemble classifier could do that.

from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, VotingClassifier
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Soft-voting ensemble over the three classifiers
vc = VotingClassifier(estimators=[('gbc', GradientBoostingClassifier()),
                                  ('rf', RandomForestClassifier()),
                                  ('svc', SVC(probability=True))],
                      voting='soft', n_jobs=-1)

# Each candidate is one weight per classifier
params = {'weights': [[1, 2, 3], [2, 1, 3], [3, 2, 1]]}
grid_search = GridSearchCV(estimator=vc, param_grid=params)
grid_search.fit(X_new, y)
print(grid_search.best_score_)

I don't understand how to implement this for the following code.

def get_classification(text, c1weight, c2weight, c3weight):
    # classifier1/2/3 are assumed to be already-fitted models;
    # accumulate one weighted vote per classifier for each class
    class1 = class2 = class3 = 0

    prediction1 = classifier1.predict(text)
    if prediction1 == 1:
        class1 += c1weight
    elif prediction1 == 2:
        class2 += c1weight
    else:
        class3 += c1weight

    prediction2 = classifier2.predict(text)
    if prediction2 == 1:
        class1 += c2weight
    elif prediction2 == 2:
        class2 += c2weight
    else:
        class3 += c2weight

    prediction3 = classifier3.predict(text)
    if prediction3 == 1:
        class1 += c3weight
    elif prediction3 == 2:
        class2 += c3weight
    else:
        class3 += c3weight

    # Return the label with the highest weighted vote
    if class1 > class2 and class1 > class3:
        return ("class1", class1)
    elif class2 > class1 and class2 > class3:
        return ("class2", class2)
    else:
        return ("class3", class3)

c1weight = 0.5
c2weight = 0.7
c3weight = 0.4

for i, row in df_raw.iterrows():
    classification = get_classification(df_raw.at[i, 'text'], c1weight, c2weight, c3weight)
    df_raw.at[i, 'classification'] = classification

score = get_accuracy(df_raw['classification'], df_raw['label'])


2 Answers


GridSearchCV finds those optimal weights for you.

You can access these weights through the best_params_ attribute of the fitted GridSearchCV object, which returns all of the optimal parameters (including the weights):

optimal_weights = grid_search.best_params_
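
If you want to try more combinations than the few hand-picked lists in the question, you can generate the weights grid programmatically. A minimal sketch, where the candidate weight values are placeholder assumptions and vc, X_new and y come from the question:

import itertools
from sklearn.model_selection import GridSearchCV

# Every combination of candidate weights for the three classifiers (3^3 = 27)
candidate_weights = [1, 2, 3]
params = {'weights': [list(w) for w in itertools.product(candidate_weights, repeat=3)]}

grid_search = GridSearchCV(estimator=vc, param_grid=params)
grid_search.fit(X_new, y)
print(grid_search.best_params_['weights'])  # the best-scoring weight triple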

(This is the asker's solution, moved from question and comment to answer)


This sample code helped me to understand it:

from sklearn.model_selection import ParameterGrid

def your_function(number):
    print(number)

param_grid = {'param1': [1, 2, 3]}
grid = ParameterGrid(param_grid)

for params in grid:
    your_function(params['param1'])
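
Applied to the weight search from the question, the same pattern could look like the following sketch. It reuses the get_classification and get_accuracy helpers from the question, and the candidate weight values are just placeholder assumptions:

from sklearn.model_selection import ParameterGrid

param_grid = {'c1weight': [0.25, 0.5, 0.75, 1.0],
              'c2weight': [0.25, 0.5, 0.75, 1.0],
              'c3weight': [0.25, 0.5, 0.75, 1.0]}

best_score, best_params = 0, None
for params in ParameterGrid(param_grid):
    for i, row in df_raw.iterrows():
        # get_classification returns (label, vote total); keep only the label
        label, _ = get_classification(df_raw.at[i, 'text'],
                                      params['c1weight'], params['c2weight'], params['c3weight'])
        df_raw.at[i, 'classification'] = label
    score = get_accuracy(df_raw['classification'], df_raw['label'])
    if best_params is None or score > best_score:
        best_score, best_params = score, params

print(best_score, best_params)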

I had too many parameters for grid search; in my case it would have taken months to calculate all combinations. In the end I used hyperopt for the hyperparameter optimization. There are some nice basic tutorials out there; this one helped me a lot, and you can also find a Python notebook there: https://towardsdatascience.com/an-introductory-example-of-bayesian-optimization-in-python-with-hyperopt-aae40fff4ff0
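
For reference, a minimal sketch of what that could look like with hyperopt. The search-space bounds and max_evals are arbitrary assumptions, and the objective again reuses the question's get_classification and get_accuracy helpers; since hyperopt minimizes, the objective returns 1 minus the accuracy:

from hyperopt import fmin, tpe, hp

# One continuous weight per classifier (bounds are an assumption)
space = {'c1weight': hp.uniform('c1weight', 0, 1),
         'c2weight': hp.uniform('c2weight', 0, 1),
         'c3weight': hp.uniform('c3weight', 0, 1)}

def objective(params):
    for i, row in df_raw.iterrows():
        label, _ = get_classification(df_raw.at[i, 'text'],
                                      params['c1weight'], params['c2weight'], params['c3weight'])
        df_raw.at[i, 'classification'] = label
    # hyperopt minimizes, so turn accuracy into a loss
    return 1 - get_accuracy(df_raw['classification'], df_raw['label'])

best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=100)
print(best)  # dict of the best weights found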
