2

I have run a logistic regression using scikit-learn in Python. I now want to output the results to a CSV file and then load it into Tableau. To do that I need to combine the y_test, y_actual, and X_test data. Is there a way to output the y_test, y_actual, and X_test data together? I know I can use to_csv to extract each of these and concat them together, but I'm afraid the rows are not matching correctly since there's no identifier to join on.

Alternatively, is there a way to keep a unique id (uid) with the logistic regression so that it's easy to see what the regression predicts for a specific person?

user57759

1 Answer

2

Each row is treated independently at prediction time, and predict returns its outputs in the same order as the input rows, so you can be sure that the predictions line up with X_test and y_test row by row.
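If you want to convince yourself of this, here is a minimal sketch (the synthetic data, the `random_state` values, and the variable names are assumptions for illustration only). It predicts on the rows in their original order and again in a shuffled order, then checks that the predictions move with the rows:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data, only for illustration
X, y = make_classification(n_samples=100, n_features=10, random_state=0)
clf = LogisticRegression().fit(X, y)

# Shuffle the row order with a fixed seed
rng = np.random.default_rng(0)
perm = rng.permutation(len(X))

pred_original = clf.predict(X)        # predictions in the original row order
pred_shuffled = clf.predict(X[perm])  # predictions for the shuffled rows

# If each row is predicted independently, reordering the rows
# simply reorders the predictions in the same way.
same_order = np.array_equal(pred_original[perm], pred_shuffled)
print(same_order)
```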

For simplicity, you can keep your data in a pandas dataframe. Here's a short working example:

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic data: 100 samples with 10 features each
X, y = make_classification(n_samples=100, n_features=10)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

clf = LogisticRegression()
clf.fit(X_train, y_train)

# One row per test sample: features, true label, predicted label
data_test = pd.DataFrame(data=X_test, columns=['f{}'.format(i) for i in range(1, 11)])
data_test['y_test'] = y_test
data_test['y_pred'] = clf.predict(X_test)

# Write everything to a single CSV (the filename is just an example)
data_test.to_csv('results.csv', index=False)
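
For the unique id part of the question, one possible approach (a sketch, not the only way) is to put the id in the DataFrame index before splitting. train_test_split keeps the index of pandas objects, so each test row carries its id through to the output. The `uid_000`-style ids below are hypothetical stand-ins for real person identifiers, and the `random_state` values are only there to make the run repeatable:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=100, n_features=10, random_state=0)

# Hypothetical identifiers standing in for real person ids
uids = pd.Index(['uid_{:03d}'.format(i) for i in range(100)], name='uid')
features = pd.DataFrame(X, index=uids, columns=['f{}'.format(i) for i in range(1, 11)])
target = pd.Series(y, index=uids, name='y_test')

# The uid index is preserved in every split
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=0
)

clf = LogisticRegression()
clf.fit(X_train, y_train)

results = X_test.copy()
results['y_test'] = y_test               # aligned on the uid index
results['y_pred'] = clf.predict(X_test)  # array in the row order of X_test

# With no path, to_csv returns the CSV text; pass a path to write a file instead
csv_text = results.to_csv()
print(csv_text.splitlines()[0])
```

The uid ends up as the first column of the CSV, so in Tableau each prediction can be joined back to the person it belongs to.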
amyrit