Here's my attempt to explain a proof that I learned from the book Fundamentals of Convex Analysis by Hiriart-Urruty and Lemarechal.
Some background knowledge and notation:
I'll first review some relevant background knowledge from convex analysis.
Let $K \subset \mathbb R^n$ be a convex cone. The "polar" of $K$ is the cone $K^\circ$ defined by
$$
K^\circ = \{ w \in \mathbb R^n \mid \langle v, w \rangle \leq 0 \text{ for all } v \in K\}.
$$
It can be shown that if $K$ is a closed convex cone, then
$$
(K^\circ)^\circ = K.
$$
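As a quick illustration (a small example of my own, not taken from the book), take $K = \{ v \in \mathbb R^2 \mid v_1 \geq 0, \, v_2 \geq 0 \}$. Testing the defining inequality against $v = (1,0)$ and $v = (0,1)$ forces $w_1 \leq 0$ and $w_2 \leq 0$, and any such $w$ works, so
$$
K^\circ = \{ w \in \mathbb R^2 \mid w_1 \leq 0, \, w_2 \leq 0 \}, \qquad (K^\circ)^\circ = \{ u \in \mathbb R^2 \mid u_1 \geq 0, \, u_2 \geq 0 \} = K.
$$
Closedness matters here: the convex cone $\tilde K = \{ v \in \mathbb R^2 \mid v_1 > 0, \, v_2 > 0 \} \cup \{0\}$ has the same polar as $K$, so $(\tilde K^\circ)^\circ = K \neq \tilde K$.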
Suppose that $C \subset \mathbb R^n$ is a convex set and $\bar x \in C$. A "feasible vector" or "feasible direction" for $C$ at $\bar x$ is a vector $v \in \mathbb R^n$ such that $\bar x + t v \in C$ for some $t > 0$. (Because $C$ is convex and $\bar x \in C$, it then follows that $\bar x + sv \in C$ for all $s \in [0,t]$. So if you start at the location $\bar x$ and move a short distance in the direction $v$, you will not leave $C$.) The set of all feasible vectors for $C$ at $\bar x$ is denoted $F_C(\bar x)$. The tangent cone to $C$ at $\bar x$, denoted $T_C(\bar x)$, is by definition the closure of $F_C(\bar x)$:
$$
T_C(\bar x) = \overline{F_C(\bar x)}.
$$
(This definition of the tangent cone is only correct when $C$ is convex.)
The normal cone to $C$ at $\bar x$ is the set $N_C(\bar x)$ defined by
$$
N_C(\bar x) = \{ w \in \mathbb R^n \mid \langle x - \bar x, w \rangle \leq 0 \text{ for all } x \in C \}.
$$
It can be shown that $T_C(\bar x)$ and $N_C(\bar x)$ are both closed convex cones and that
$$
N_C(\bar x) = T_C(\bar x)^\circ.
$$
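Here is a small example of my own (not from the book) that shows why the closure is needed. Let $C = \{ x \in \mathbb R^2 \mid x_2 \geq x_1^2 \}$ and $\bar x = (0,0)$. The point $tv$ belongs to $C$ exactly when $v_2 \geq t v_1^2$, and this holds for some $t > 0$ exactly when $v_2 > 0$ or $v = 0$. Hence
$$
F_C(\bar x) = \{ v \in \mathbb R^2 \mid v_2 > 0 \} \cup \{ 0 \}, \qquad T_C(\bar x) = \{ v \in \mathbb R^2 \mid v_2 \geq 0 \}, \qquad N_C(\bar x) = \{ (0, s) \mid s \leq 0 \}.
$$
The direction $v = (1,0)$ is tangent to $C$ at $\bar x$ but is not feasible, so $F_C(\bar x)$ is not closed. One can also check directly that the polar of the half-plane $T_C(\bar x)$ is the ray $N_C(\bar x)$.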
Suppose that $f:\mathbb R^n \to \mathbb R \cup \{\infty\}$ is a convex function and $\bar x \in \mathbb R^n$. A "subgradient" of $f$ at $\bar x$ is a vector $g \in \mathbb R^n$ such that
$$
f(x) \geq f(\bar x) + \langle g, x - \bar x \rangle \quad \text{for all } x \in \mathbb R^n.
$$
The set of all subgradients of $f$ at $\bar x$ is called the "subdifferential" of $f$ at $\bar x$ and is denoted $\partial f(\bar x)$.
The "effective domain" of a function $f:\mathbb R^n \to \mathbb R \cup \{\infty\}$ is the set $\textbf{dom} \, f$ defined by
$$
\textbf{dom} \, f = \{ x \in \mathbb R^n \mid f(x) < \infty \}.
$$
It can be shown that if $f$ is convex and $\bar x$ is in the interior of the effective domain of $f$, then $\partial f(\bar x)$ is a non-empty, convex, compact set.
It can also be shown that if $f:\mathbb R^n \to \mathbb R \cup \{\infty \}$ is convex and $\bar x$ is in the interior of the effective domain of $f$, then
$$
D_v f(\bar x) = \max_{g \in \partial f(\bar x)} \langle g, v \rangle.
$$
Here
$$
D_v f(\bar x) = \lim_{t \to 0^+} \frac{f(\bar x + tv) - f(\bar x)}{t}
$$
is the directional derivative of $f$ at $\bar x$ in the direction $v$.
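For a concrete instance (my own illustration), take $n = 1$, $f(x) = |x|$ and $\bar x = 0$. The subgradient inequality $|x| \geq g x$ for all $x$ holds exactly when $|g| \leq 1$, so $\partial f(0) = [-1, 1]$, and
$$
D_v f(0) = \lim_{t \to 0^+} \frac{|tv| - 0}{t} = |v| = \max_{g \in [-1,1]} g v.
$$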
If $f:\mathbb R^n \to \mathbb R \cup \{\infty\}$ is convex and $\bar x$ is in the interior of the effective domain of $f$, then the function
$$
h(v) = D_v f(\bar x)
$$
is a finite sublinear function. It follows that $h$ is continuous.
Statement of theorem:
Let $f:\mathbb R^n \to \mathbb R \cup \{ \infty \}$ be a proper convex function. Suppose that $\bar x$ belongs to the interior of the effective domain of $f$ and that $\inf_{x \in \mathbb R^n} f(x) < f(\bar x)$. If
$$
C = \{ x \in \mathbb R^n \mid f(x) \leq f(\bar x) \}
$$
then
$$
N_C(\bar x) = \mathbb R^+ \partial f(\bar x) = \{ t g \mid t \geq 0, g \in \partial f(\bar x) \}.
$$
Here $N_C(\bar x)$ is the normal cone to $C$ at $\bar x$.
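As a sanity check (an example of my own, not from the book), let $f(x) = x_1^2 + x_2^2$ on $\mathbb R^2$ and $\bar x = (1,0)$. Then $\inf f = 0 < 1 = f(\bar x)$, the set $C$ is the closed unit disc, and $\partial f(\bar x) = \{ \nabla f(\bar x) \} = \{ (2,0) \}$. Directly from the definition, $w \in N_C(\bar x)$ means $\langle x, w \rangle \leq w_1$ for all $x$ in the disc, that is, $\| w \|_2 \leq w_1$. So
$$
N_C(\bar x) = \{ (s, 0) \mid s \geq 0 \} = \mathbb R^+ \partial f(\bar x),
$$
as the theorem predicts. The assumption $\inf_x f(x) < f(\bar x)$ cannot be dropped: at the minimizer $\bar x = (0,0)$ we get $C = \{ 0 \}$ and $N_C(\bar x) = \mathbb R^2$, whereas $\mathbb R^+ \partial f(\bar x) = \{ 0 \}$.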
Proof strategy:
We can prove this theorem in the following steps.
We'll first show that the tangent cone to $C$ at $\bar x$ is
$$
\tag{$\spadesuit$} T_C(\bar x) = \{ v \in \mathbb R^n \mid D_v f(\bar x) \leq 0\}.
$$
Because $\bar x$ is in the interior of the effective domain of $f$, we have that $\partial f(\bar x)$ is nonempty and compact and
$$
D_v f(\bar x) = \max_{g \in \partial f(\bar x)} \langle g, v \rangle \qquad \text{for any } v \in \mathbb R^n.
$$
We notice that
\begin{align}
\{ v \in \mathbb R^n \mid D_v f(\bar x) \leq 0 \} &=
\{ v \in \mathbb R^n \mid \langle g, v \rangle \leq 0 \text{ for all } g \in \partial f(\bar x) \} \\
&= \{ v \in \mathbb R^n \mid \langle tg, v \rangle \leq 0 \text{ for all } g \in \partial f(\bar x), t \geq 0 \} \\
\tag{$\clubsuit$}&= \left( \mathbb R^+ \partial f(\bar x) \right)^\circ.
\end{align}
Here
$$
\mathbb R^+ \partial f(\bar x) = \{ t g \mid t \geq 0, g \in \partial f(\bar x) \}
$$
is the convex cone generated by $\partial f(\bar x)$.
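Because $\partial f(\bar x)$ is compact and does not contain $0$ (the point $\bar x$ is not a minimizer of $f$, since $\inf_x f(x) < f(\bar x)$), the set $\mathbb R^+ \partial f(\bar x)$ is in fact a closed convex cone.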
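A short remark of my own on the closedness claim: compactness by itself is not enough, and the argument also uses the fact that $0 \notin \partial f(\bar x)$, which holds because $0 \in \partial f(\bar x)$ would make $\bar x$ a minimizer of $f$. For instance, the compact convex set $D = \{ g \in \mathbb R^2 \mid (g_1 - 1)^2 + g_2^2 \leq 1 \}$ contains $0$, and
$$
\mathbb R^+ D = \{ g \in \mathbb R^2 \mid g_1 > 0 \} \cup \{ 0 \},
$$
which is not closed. When $0 \notin \partial f(\bar x)$, there is a $\delta > 0$ with $\| g \| \geq \delta$ for all $g \in \partial f(\bar x)$. If $t_k g_k \to w$ with $t_k \geq 0$ and $g_k \in \partial f(\bar x)$, then the $t_k$ are bounded, so along a subsequence $t_k \to t \geq 0$ and $g_k \to g \in \partial f(\bar x)$, which gives $w = tg \in \mathbb R^+ \partial f(\bar x)$.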
Putting the above pieces together, we see that
\begin{align}
N_C(\bar x) &= T_C(\bar x)^\circ \\
&=
\{ v \in \mathbb R^n \mid D_v f(\bar x) \leq 0 \}^\circ \\
&= \left(\mathbb R^+ \partial f(\bar x) \right)^{\circ \circ} \\
&= \mathbb R^+ \partial f(\bar x).
\end{align}
This is what we wanted to show.
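To see the strategy in action, here is a worked example of my own (not from the book). Take $f(x) = |x_1| + |x_2|$ on $\mathbb R^2$ and $\bar x = (1,0)$, so that $\inf f = 0 < 1 = f(\bar x)$ and $C$ is the unit ball of the $\ell_1$ norm. A direct computation gives
$$
\partial f(\bar x) = \{ (1,s) \mid -1 \leq s \leq 1 \}, \qquad D_v f(\bar x) = v_1 + |v_2|.
$$
Then ($\spadesuit$) and ($\clubsuit$) read
$$
T_C(\bar x) = \{ v \in \mathbb R^2 \mid v_1 + |v_2| \leq 0 \} = \left( \mathbb R^+ \partial f(\bar x) \right)^\circ, \qquad \mathbb R^+ \partial f(\bar x) = \{ w \in \mathbb R^2 \mid |w_2| \leq w_1 \},
$$
and the final chain of equalities gives $N_C(\bar x) = \{ w \in \mathbb R^2 \mid |w_2| \leq w_1 \}$. This agrees with the definition of the normal cone: the condition $\langle x - \bar x, w \rangle \leq 0$ for all $x \in C$ means that $\max_{x \in C} \langle x, w \rangle = \max(|w_1|, |w_2|) \leq w_1$, which is the same condition.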
Proof details: The computation ($\clubsuit$) and the final chain of equalities above are already complete. We only need to provide details for the first step, equation ($\spadesuit$).
Equation ($\spadesuit$) seems plausible. Certainly if $v$ is a feasible direction for $C$ at $\bar x$ then $D_vf(\bar x) \leq 0$. Indeed, because $C$ is convex, feasibility of $v$ means that $f(\bar x + tv) \leq f(\bar x)$ for all sufficiently small positive $t$, and then
$$
D_v f(\bar x) = \lim_{t \to 0^+} \frac{f(\bar x + tv) - f(\bar x)}{t} \leq 0.
$$
Because the function $v \mapsto D_v f(\bar x)$ is continuous, it follows that if $v$ is a limit of feasible directions then $D_v f(\bar x) \leq 0$. Thus,
$$
\tag{1} T_C(\bar x) \subset \{ v \in \mathbb R^n \mid D_v f(\bar x) \leq 0\}.
$$
The only challenge is to prove the reverse inclusion.
Let
$$
S = \{ v \in \mathbb R^n \mid D_v f(\bar x) < 0 \}.
$$
It is clear that
$$
S \subset F_C(\bar x)
$$
because if $D_v f(\bar x) < 0$ then $f(\bar x + tv) < f(\bar x)$ for all sufficiently small $t > 0$, so $\bar x + tv \in C$.
Thus,
$$
\bar S \subset \overline{F_C(\bar x)} = T_C(\bar x).
$$
But we will show below that
$$
\tag{2} \bar S = \{ v \in \mathbb R^n \mid D_v f(\bar x) \leq 0 \}.
$$
It follows that
$$
\tag{3} \{ v \in \mathbb R^n \mid D_v f(\bar x) \leq 0 \} \subset T_C(\bar x).
$$
The results (1) and (3) together show that $T_C(\bar x) = \{ v \in \mathbb R^n \mid D_v f(\bar x) \leq 0 \}$, which is what we wanted to show.
The only missing detail in the above proof is to establish equation (2).
Let $h:\mathbb R^n \to \mathbb R$ be the function defined by
$$
h(v) = D_v f(\bar x).
$$
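As mentioned in the "background knowledge" section above, this function $h$ is finite and sublinear, hence convex. We are given that $\inf_{x \in \mathbb R^n} f(x) < f(\bar x)$, so there exists a point $x^* \in \mathbb R^n$ such that $f(x^*) < f(\bar x)$. Let $v^* = x^* - \bar x$. By convexity of $f$, we have $f(\bar x + t v^*) \leq f(\bar x) + t \, (f(x^*) - f(\bar x))$ for $0 \leq t \leq 1$. Dividing by $t$ and letting $t \to 0^+$ gives
$h(v^*) \leq f(x^*) - f(\bar x) < 0$. Equation (2) now follows from the following lemma.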
Lemma: Suppose that $h:\mathbb R^n \to \mathbb R$ is convex and that $h(v^*) < 0$ for some vector $v^* \in \mathbb R^n$. Then
$$
\tag{4} \overline{\{v \in \mathbb R^n \mid h(v) < 0 \}} = \{ v \in \mathbb R^n \mid h(v) \leq 0 \}.
$$
Proof of lemma:
Any convex function $h:\mathbb R^n \to \mathbb R$ is continuous. Hence, the set $\{ v \in \mathbb R^n \mid h(v) \leq 0 \}$ is closed. It follows that
$$
\tag{5} \overline{\{v \in \mathbb R^n \mid h(v) < 0 \}} \subset \{ v \in \mathbb R^n \mid h(v) \leq 0 \}.
$$
Is the reverse inclusion true also? Well, suppose that $v \in \mathbb R^n$ and $h(v) \leq 0$.
We are given that there exists a vector $v^* \in \mathbb R^n$ such that
$$
h(v^*) < 0.
$$
Let
$$
v_k = \frac{1}{k} v^* + \frac{(k-1)}{k} v \quad \text{for each positive integer }k
$$
and note that $v_k \to v$ as $k \to \infty$.
Because $h$ is convex, we have that
$$
h(v_k) \leq \frac{1}{k} \underbrace{h(v^*)}_{\substack{\big \uparrow\\ < \, 0}} + \frac{(k-1)}{k} \underbrace{h(v)}_{\substack{\big \uparrow \\ \leq \, 0}} < 0
$$
for each positive integer $k$. So $v$ is a limit of vectors $v_k$ which satisfy $h(v_k) < 0$. This shows that
$$
\tag{6} \{ v \in \mathbb R^n \mid h(v) \leq 0 \} \subset \overline{\{v \in \mathbb R^n \mid h(v) < 0 \}}.
$$
Equations (5) and (6) together show that (4) holds, so the lemma is proved.
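Two one-dimensional examples of my own show what the lemma says and why its hypothesis is needed. For $h(v) = |v| - 1$ we have $h(0) < 0$, and indeed
$$
\overline{\{ v \in \mathbb R \mid |v| - 1 < 0 \}} = \overline{(-1, 1)} = [-1, 1] = \{ v \in \mathbb R \mid |v| - 1 \leq 0 \}.
$$
With $v = 1$ and $v^* = 0$, the sequence from the proof is $v_k = \frac{k-1}{k}$, which satisfies $h(v_k) = -\frac{1}{k} < 0$ and $v_k \to 1$. On the other hand, for $h(v) = |v|$ there is no $v^*$ with $h(v^*) < 0$, and the conclusion fails:
$$
\overline{\{ v \in \mathbb R \mid |v| < 0 \}} = \emptyset \neq \{ 0 \} = \{ v \in \mathbb R \mid |v| \leq 0 \}.
$$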