I'm trying to understand the gradient derivation for the back-propagation algorithm.

I'm having trouble computing the explicit derivative of the Mean Squared Error loss function with respect to the output value in a regression setting. I have only one output neuron.

Let,

  • $n$ be the number of training examples
  • $ y_i $ be the predicted target for training example $x_i$
  • $ t_i $ be the actual target value (from train data) for training example $x_i$
  • $ L_i $ be the loss for sample $i$

I'm using the following definition of the loss function,

$$ E = \frac{1}{n} \sum_{i=1}^{n} L_i = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{2} (y_i - t_i)^2 = \frac{1}{2n} \sum_{i=1}^{n} (y_i - t_i)^2 $$

How do I compute $\frac{\partial E}{\partial y}$?

This is in a neural network setting, so $E$ is a function of the weights $w$. Equation (5.11) in the Bishop book is, as far as I can see, the same expression, except that it is not divided by $n$:

$$ E(w) = \frac{1}{2} \sum_{i=1}^n \big( y(x_i, w) - t_i \big)^2 $$

So here $y$ is a function which depends on $x_i$ and $w$, so writing

$$ \frac{\partial E}{\partial y} $$ means differentiating with respect to a function?

And yet Bishop does this at equation (5.19),

$$ \frac{\partial E}{\partial y_k} = y_k - t_k $$

where $y_k$ is the output of the $k$th neuron and $t_k$ the actual target value. But where have the instances gone? They've disappeared from the equation! $y_k$ is predicted for an input $x$!

I don't understand the nature of $y$ and why it's legal to differentiate $E$ with respect to it.

Thanks for any help.

1 Answer

Using this preprint I found on arXiv, I was able to address the problem as follows.

Using the preprint, and a bit of what you have described in your question, I believe Bishop is referring to a single instance or data point in that statement: the formulas on both Wikipedia and arXiv make it clear that you are iterating over the training sample space. So when you restrict attention to one instance, the $k$th one, you can begin to see where the derivation comes from.

So, to write the cost function for a single instance in this case, I would begin with the following:

$$ E = \frac{1}{2} (y_k - t_k)^2 $$

Now we can take the partial derivative with respect to $y_k$:

\begin{align} \frac{\partial E}{\partial y_k} &= \frac{1}{2} \frac{\partial}{\partial y_k} (y_k - t_k)^2 \\ &= \frac{1}{2} \cdot 2\,(y_k - t_k) \\ &= y_k - t_k \end{align}
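This matches Bishop's equation (5.19). If you instead keep the averaged definition from your question, $E = \frac{1}{2n} \sum_{i=1}^{n} (y_i - t_i)^2$, only one term of the sum involves a given $y_i$, so

$$ \frac{\partial E}{\partial y_i} = \frac{1}{n} (y_i - t_i), $$

which is the same gradient up to the constant factor $\frac{1}{n}$.

As a quick sanity check (a sketch of my own, not from the preprint; the prediction and target values below are arbitrary), you can compare the analytic gradient $y_k - t_k$ against a central-difference numerical derivative:

```python
# Per-sample squared-error loss E = 0.5 * (y - t)^2.
def loss(y, t):
    return 0.5 * (y - t) ** 2

y, t = 1.7, 1.0  # arbitrary prediction and target
eps = 1e-6

# Central-difference approximation of dE/dy.
numeric = (loss(y + eps, t) - loss(y - eps, t)) / (2 * eps)
analytic = y - t  # the result derived above

print(numeric, analytic)  # both approximately 0.7
```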

Jose M Serra