
I have a dataset with mixed feature types; I have already converted the categorical features to discrete and binary ones. Because the dataset is high-dimensional, I want to reduce it with PCA. I standardized the data with StandardScaler, then used PCA to reduce it to the number of components that explain 80% of the variance. Finally, I ran KMeans on the PCA-transformed data.

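import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Standardize the numeric features (df_num is the numeric DataFrame)
X_std = StandardScaler().fit_transform(df_num)
# Fit a PCA with all components (assumed; pca was not defined in the snippet)
pca = PCA().fit(X_std)
cs = pca.explained_variance_ratio_.cumsum()
# Smallest number of components whose cumulative variance reaches 80%
n = np.argmax(cs >= 0.80) + 1
X_pca = PCA(n_components=n).fit_transform(X_std)
print(X_pca.shape)
y = KMeans(n_clusters=3, init='k-means++', n_init=100, max_iter=300, random_state=42).fit_predict(X_pca)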
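
To make the component-count step concrete (the cumulative sum followed by `np.argmax(cs >= 0.80) + 1`), here is a minimal plain-Python sketch of the same rule. The helper name `components_for_variance` and the example ratios are hypothetical and made up for illustration; they are not taken from my dataset.

```python
from itertools import accumulate


def components_for_variance(ratios, threshold=0.80):
    """Return the smallest number of leading components whose cumulative
    explained-variance ratio reaches `threshold` (hypothetical helper)."""
    for count, cumulative in enumerate(accumulate(ratios), start=1):
        if cumulative >= threshold:
            return count
    # Threshold never reached: keep every component.
    return len(ratios)


# Made-up ratios for illustration only; cumulative sums are 0.5, 0.75, 0.875, 1.0.
example_ratios = [0.5, 0.25, 0.125, 0.125]
n = components_for_variance(example_ratios)
print(n)  # 3: the third component is the first to push the total past 0.80
```

As far as I know, scikit-learn's `PCA` also accepts a fraction such as `n_components=0.80` (with `svd_solver='full'`), which picks the number of components from the explained variance in a similar way and would avoid fitting PCA twice.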
Oxbowerce
