Thinking in terms of directional derivatives might be more instructive in this case, as we can arrive at the chain rule formulation in a constructive fashion.
Let us consider the directional derivative of a multivariate function $f : \mathbb{R^n} \rightarrow \mathbb{R}$, in an arbitrary direction $u \in \mathbb{R^n}$.
Since $u$ is a direction, we shall assume $||\textbf{u}|| = 1$. The directional derivative in the direction of $\textbf{u}$, is then defined as
$$\nabla_u f = \nabla f \cdot \textbf{u}$$
This can easily be proved by noting, that under the usual limit definition of a derivative, we have
$$\nabla_u f (\textbf{x}) = \lim_{h\rightarrow0} \frac{f(\textbf{x} + h\textbf{u}) - f(\textbf{x})}{h }$$
Since we assume differentiability of the objective function $f$, we can find a linear approximant of $f$ around any arbitrary point $\textbf{a}$ that is close to the true value of $f(\textbf{x})$ in any $\epsilon$-neighborhood of $\textbf{a}$
$$f(\textbf{x}) = f(\textbf{a}) + \nabla f(\textbf{a})^T \cdot (\textbf{x} - \textbf{a})$$
Plugging this approximant into our previous limit formulation we get, for any $\textbf{a}$
$$\nabla_u f (\textbf{a}) = \nabla f(\textbf{a}) \cdot \textbf{u}$$
Going back to the original problem, when differentiating the cost function $E_n$ by an arbitrary parameter $a_k$ we also need to take into account the perturbations induced by a change in $a_k$ in any other parameters it interacts with. Specifically, when altering $a_k$ we shall also move along the $\textbf{direction}$ of the perturbed parameters.
This direction essentially encapsulates all the changes caused in intermediate variables $\{a_k\}_{k=1}^{n}$ that appear when altering a certain $a_j$. As such, for every $a_k$ that is directly influenced by $a_j$, we can measure the actual change by evaluating $\frac{\partial a_k}{\partial a_j}$.
Let $\textbf{p}$ be the aforementioned vector, therefore we can write
$$\textbf{p}_k = \bigg(\frac{\partial a_k}{\partial a_j}\bigg)$$
where $k$ runs through all parameters $a_k$ such that $a_k $ is influenced by $a_j$.
Coupling the directional derivative with the previously described concept, we arrive at the desired result, namely
$$\frac{\partial E_n}{\partial a_j} = \nabla E_n ^T \cdot \textbf p = \sum_{k \in \mathcal{S}} \frac{\partial E_n}{\partial a_k} \frac{\partial a_k}{\partial a_j}$$
where $\mathcal{S}$ is the set of all indices corresponding to variables directly influenced by $a_j$.
\partial. There's an edit link underneath the question. It might also be a good idea to link to the book in the question so people can look up the context. – joriki Feb 20 '13 at 18:22