
I'm working through the textbook *Learning From Data*, and one of the problems in the first chapter (Problem 1.5) has the reader implement the Adaline algorithm from scratch; I chose to do so in Python. The issue I'm running into is that my weight vector $\textbf{w}$ immediately blows up to infinity before the algorithm converges. Am I doing something incorrect here? As far as I can tell, I am implementing it exactly as the text describes. Below is my Python code. Here $\textbf{y}$ takes on the values -1 and 1, so it is a classification problem.

import numpy as np
import pandas as pd

# Generate w* vector, the true weights
dim = 2
wstar = 2000*np.random.rand(dim+1) - 1000

# Generate the random sample of size 100
trainSize = 100
train = pd.DataFrame(2000*np.random.rand(trainSize, dim) - 1000)
train['intercept'] = np.ones(trainSize)
cols = train.columns.tolist()
cols = cols[-1:] + cols[:-1]
train = train[cols]

# Classify the points
train['y'] = np.sign(np.dot(train.iloc[:, 0:3], wstar))

# Now we run the ADALINE algorithm on the training data
# Declare w vector
w = np.zeros(dim+1)

# Column of guesses
train['guess'] = np.ones(trainSize)

# s column
train['s'] = np.dot(train.iloc[:, 0:3], w)

# Set eta
eta = 5
iterations = 0
while not all(train['y']*train['s'] > 1):
    if iterations >= 1000:
        break
    # Picking a random point
    randInt = np.random.randint(len(train))
    # Temporary values for calculating new w
    temp_s = train['s'].iloc[randInt]
    temp_x = train.iloc[randInt, 0:3]
    temp_y = train['y'].iloc[randInt]
    # Calculating new w
    if temp_y*temp_s <= 1:
        w = w + eta*(temp_y - temp_s)*temp_x
    # Calculating new guesses and s values
    train['s'] = np.dot(train.iloc[:, 0:3], w)
    train['guess'] = np.sign(train['s'])
    iterations += 1


1 Answer


First of all, let me add this schema, which I think illustrates nicely the transition and improvement from Rosenblatt's original perceptron to the Adaline algorithm:

[Schema comparing Rosenblatt's perceptron with the Adaline algorithm]

In Adaline, the activation is linear, so the cost function built from the error $y(t) - s(t)$ is differentiable and the weights can be updated by gradient descent. There is no restriction that $y$ and $s$ have the same sign: the objective is simply to minimize the squared error between $y$ and $s$.
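To make that concrete, here is a minimal sketch of a single stochastic Adaline update, $\mathbf{w} \leftarrow \mathbf{w} + \eta\,(y - s)\,\mathbf{x}$ with $s = \mathbf{w}^{\top}\mathbf{x}$ (the function and variable names here are just for illustration). Since the step is scaled by both $\eta$ and $\mathbf{x}$, a learning rate as large as your $\eta = 5$ combined with inputs of magnitude up to 1000 makes every step overshoot, which is consistent with the weights blowing up:

import numpy as np

def adaline_step(w, x_i, y_i, eta=0.01):
    """One stochastic Adaline update: a gradient step on the
    squared error 0.5 * (y_i - s)**2 for a single sample."""
    s = np.dot(w, x_i)            # linear activation, s = w . x
    error = y_i - s               # signed error; no sign restriction on y*s
    return w + eta * error * x_i  # step scaled by eta and by x_i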

Below you can find the code provided in the excellent book *Python Machine Learning* by Sebastian Raschka:

import numpy as np

class AdalineSGD(object):
    """ADAptive LInear NEuron classifier.

    Parameters
    ------------
    eta : float
        Learning rate (between 0.0 and 1.0)
    n_iter : int
        Passes over the training dataset.
    shuffle : bool (default: True)
        Shuffles training data every epoch if True
        to prevent cycles.
    random_state : int
        Random number generator seed for random weight
        initialization.

    Attributes
    -----------
    w_ : 1d-array
        Weights after fitting.
    cost_ : list
        Sum-of-squares cost function value averaged over all
        training samples in each epoch.
    """
    def __init__(self, eta=0.01, n_iter=10,
                 shuffle=True, random_state=None):
        self.eta = eta
        self.n_iter = n_iter
        self.w_initialized = False
        self.shuffle = shuffle
        self.random_state = random_state

    def fit(self, X, y):
        """Fit training data.

        Parameters
        ----------
        X : {array-like}, shape = [n_samples, n_features]
            Training vectors, where n_samples is the number of
            samples and n_features is the number of features.
        y : array-like, shape = [n_samples]
            Target values.

        Returns
        -------
        self : object
        """
        self._initialize_weights(X.shape[1])
        self.cost_ = []
        for i in range(self.n_iter):
            if self.shuffle:
                X, y = self._shuffle(X, y)
            cost = []
            for xi, target in zip(X, y):
                cost.append(self._update_weights(xi, target))
            avg_cost = sum(cost) / len(y)
            self.cost_.append(avg_cost)
        return self

    def partial_fit(self, X, y):
        """Fit training data without reinitializing the weights."""
        if not self.w_initialized:
            self._initialize_weights(X.shape[1])
        if y.ravel().shape[0] > 1:  # if we have more than one sample
            for xi, target in zip(X, y):
                self._update_weights(xi, target)
        else:
            self._update_weights(X, y)
        return self

    def _shuffle(self, X, y):
        """Shuffle training data."""
        r = self.rgen.permutation(len(y))
        return X[r], y[r]

    def _initialize_weights(self, m):
        """Initialize weights to small random numbers."""
        self.rgen = np.random.RandomState(self.random_state)
        self.w_ = self.rgen.normal(loc=0.0, scale=0.01,
                                   size=1 + m)
        self.w_initialized = True

    def _update_weights(self, xi, target):
        """Apply the Adaline learning rule to update the weights."""
        output = self.activation(self.net_input(xi))
        error = (target - output)
        self.w_[1:] += self.eta * xi.dot(error)
        self.w_[0] += self.eta * error
        cost = 0.5 * error**2
        return cost

    def net_input(self, X):
        """Calculate net input."""
        return np.dot(X, self.w_[1:]) + self.w_[0]

    def activation(self, X):
        """Compute linear activation."""
        return X

    def predict(self, X):
        """Return class label after unit step."""
        return np.where(self.activation(self.net_input(X))
                        >= 0.0, 1, -1)
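For completeness, here is a quick usage sketch on synthetic data in the style of the question's setup (the data generation and the chosen eta and n_iter are illustrative assumptions, not from the book). As in the book, the features are standardized first, which keeps a learning rate like 0.01 stable even though the raw inputs range over $[-1000, 1000]$:

import numpy as np

# Illustrative data: 100 points in [-1000, 1000]^2, labeled by a
# random true weight vector (intercept first), as in the question.
rng = np.random.RandomState(1)
X = 2000 * rng.rand(100, 2) - 1000
wstar = 2000 * rng.rand(3) - 1000
y = np.sign(wstar[0] + X.dot(wstar[1:]))

# Standardize the features so a small eta converges reliably.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

ada = AdalineSGD(eta=0.01, n_iter=15, random_state=1)
ada.fit(X_std, y)
print(ada.cost_[-1])                     # average cost over the final epoch
print((ada.predict(X_std) == y).mean())  # training accuracy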
