
I'm trying to implement gradient descent in Python, following Andrew Ng's course so I can follow the math. However, my implementation isn't working as I expected, and it would be great if the community could help me identify my mistake.

When I increase the range from 3 to a higher number, it does not converge; instead, the thetas swing from very positive to very negative and eventually become nan as their magnitudes blow up.

Code is given below:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression

X = pd.DataFrame(load_boston().data, columns=load_boston().feature_names)
X['theta0'] = 1  # add a bias column of ones
y = load_boston().target
y = pd.DataFrame(y, columns=['target'])
theta = pd.DataFrame(np.random.randn(X.shape[1]), columns=['target'], index=X.columns.values)

print('theta shape', theta.shape)
print('X shape', X.shape)
print('y shape', y.shape)
print(theta)

def predict(X, theta, ycol='target'):
    return X.dot(theta)

mse_values = []
alpha = 0.01
for i in range(10000):
    error = predict(X, theta) - y
    theta = theta - (alpha * (1 / len(X)) * X.T.dot(error))  # gradient step
    mse = np.sum(error ** 2) / len(X)
    print('mse: ', mse.values)
    mse_values.append(mse)
    print('+' * 5)

plt.plot(mse_values)
plt.show()

Shoaibkhanz

2 Answers


I kept doubting my implementation, but the problem was the learning rate. After a lot of experimentation I found one that works, though I'm surprised at how small it had to be for the method to converge: alpha = 0.000001.
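For what it's worth, here is a minimal sketch of the same update loop with that smaller learning rate; the only change from the code in the question is alpha, and it assumes a scikit-learn version old enough to still ship load_boston:

# Minimal sketch: the question's loop, with alpha shrunk to 1e-6.
# Assumes scikit-learn < 1.2, where load_boston is still available.
import numpy as np
import pandas as pd
from sklearn.datasets import load_boston

X = pd.DataFrame(load_boston().data, columns=load_boston().feature_names)
X['theta0'] = 1
y = pd.DataFrame(load_boston().target, columns=['target'])
theta = pd.DataFrame(np.random.randn(X.shape[1]), columns=['target'], index=X.columns.values)

alpha = 0.000001  # the only change from the code in the question
mse_values = []
for i in range(10000):
    error = X.dot(theta) - y
    theta = theta - alpha * (1 / len(X)) * X.T.dot(error)
    mse_values.append(float(np.sum(error.values ** 2)) / len(X))

print(mse_values[-1])  # the loss now decreases instead of blowing up

The raw Boston features sit on very different scales, which makes the gradient large and forces such a tiny step; scaling the features first should let a larger alpha work.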

Shoaibkhanz

If you use the backtracking method (details are in my answer at this link: Does gradient descent always converge to an optimum?), then you can avoid spending time manually searching for the "right" learning rate, as you had to do here.
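For the curious, here is a minimal sketch of backtracking (Armijo) line search applied to the least-squares objective from the question, written with plain NumPy arrays; the function names and the constants alpha0, beta and c are illustrative choices, not taken from the linked answer:

import numpy as np

def loss(X, y, theta):
    # Mean squared error, as in the question.
    error = X @ theta - y
    return np.sum(error ** 2) / len(X)

def grad(X, y, theta):
    # Gradient of the MSE above.
    error = X @ theta - y
    return 2.0 / len(X) * X.T @ error

def backtracking_step(X, y, theta, alpha0=1.0, beta=0.5, c=1e-4):
    """Shrink the step size until the Armijo sufficient-decrease condition holds."""
    g = grad(X, y, theta)
    alpha = alpha0
    while loss(X, y, theta - alpha * g) > loss(X, y, theta) - c * alpha * np.dot(g, g):
        alpha *= beta
    return theta - alpha * g

# Usage sketch: X is an (n, d) feature matrix with a bias column, y an (n,) target vector.
# theta = np.random.randn(d)
# for _ in range(1000):
#     theta = backtracking_step(X, y, theta)

Each iteration starts from a generous step and halves it until the loss actually decreases by enough, so no single global learning rate has to be tuned by hand.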

Tuyen