I call a function $\mathbb{R}^k \rightarrow \mathbb{R}^n$ *representable* if it is a composition of relus and affine maps.
We want to show that the set of (restrictions to $[-1,1]^k$ of) representable functions contains the set of all continuous piecewise linear functions on $[-1,1]^k$. (The other direction is easy.)
Let us show that the set of representable functions is closed under a few useful operations. Later we try to show that these operations suffice to generate all piecewise linear functions.
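To make the definition concrete, here is a minimal Python sketch, assuming relus are applied coordinatewise. The helper names `affine`, `relu`, `compose` and `abs_net` are hypothetical and only for illustration. It builds $|x| = \mathrm{relu}(x) + \mathrm{relu}(-x)$ as an affine map, then a relu, then another affine map.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def affine(W: Sequence[Sequence[float]], c: Sequence[float]) -> Layer:
    """The affine map x -> W x + c (W given as a list of rows)."""
    def apply(x: Vector) -> Vector:
        return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + c_i
                for row, c_i in zip(W, c)]
    return apply


def relu(x: Vector) -> Vector:
    """Relu applied to every coordinate (an assumption about what 'relu' means here)."""
    return [max(0.0, x_i) for x_i in x]


def compose(layers: Sequence[Layer]) -> Layer:
    """Apply the layers left to right, i.e. layers[-1] o ... o layers[0]."""
    def apply(x: Vector) -> Vector:
        for layer in layers:
            x = layer(x)
        return x
    return apply


# |x| = relu(x) + relu(-x):
# x -> (x, -x) is affine, then relu, then (u, v) -> u + v is affine.
abs_net = compose([
    affine([[1.0], [-1.0]], [0.0, 0.0]),
    relu,
    affine([[1.0, 1.0]], [0.0]),
])
```

For instance `abs_net([-3.0])` gives `[3.0]` and `abs_net([2.5])` gives `[2.5]`.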
- Every affine function is representable
- The copy operation $f:\mathbb{R}^n \rightarrow \mathbb{R}^{2n}$ given by $x \mapsto (x, x)$ is representable
- If $f:\mathbb{R}^{k_1} \rightarrow \mathbb{R}^{n_1}$ and $g:\mathbb{R}^{k_2} \rightarrow \mathbb{R}^{n_2}$ are both representable, so is their Cartesian product $(x,y) \mapsto (f(x), g(y))$
- If $f,g: \mathbb{R}^k \rightarrow \mathbb{R}^n$ are representable so is their sum $f+g: \mathbb{R}^{k} \rightarrow \mathbb{R}^n$
- $f:\mathbb{R}^k \rightarrow \mathbb{R}^n$ is representable if and only if each of its coordinate functions $f_i:\mathbb{R}^k \rightarrow \mathbb{R}$ is representable.
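- Halfspace projections are representable. More precisely, let $A \in \mathbb{R}^{1 \times k}$ with $A \neq 0$ and $b \in \mathbb{R}$, and denote by $H$ the set of all points $x$ that satisfy $Ax \leq b$.
Then $\mathrm{proj}(x) = \operatorname{argmin}_{h \in H} \lVert x-h \rVert$ is representable.
Proof: By composing with rigid motions (an orthogonal map followed by a translation; these are affine and preserve distances, so they carry nearest-point projections to nearest-point projections) and rescaling $(A, b)$ by a positive factor, we can reduce to the case $A = (-1,0,0,\dots)$ and $b=0$, i.e. $H = \{x : x_1 \geq 0\}$.
Then the Cartesian product function $(\mathrm{relu}, \mathrm{id}, \mathrm{id}, \dots, \mathrm{id})$ does the job.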
- If $f,g$ are representable and coincide along a hyperplane, then so is the function obtained
by gluing them together along that hyperplane. More precisely:
Let $f,g:\mathbb{R}^k \rightarrow \mathbb{R}^n$, $A \in \mathbb{R}^{1 \times k}$ with $A \neq 0$, and $b \in \mathbb{R}$.
Assume $f(x) = g(x)$ for all $x$ that satisfy $Ax = b$.
Then the piecewise linear function $r$ given by:
$$
\begin{aligned}
r(x) &= f(x) \text{ for } Ax \leq b \\
r(x) &= g(x) \text{ for } Ax \geq b
\end{aligned}
$$
is representable.
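Proof: As a warm-up, assume that $f$ and $g$ vanish on the hyperplane $\{Ax=b\}$. Let $\mathrm{proj}_-$ and $\mathrm{proj}_+$ denote the projections onto the halfspaces $\{Ax \leq b\}$ and $\{Ax \geq b\}$, respectively.
Then $r(x) = f(\mathrm{proj}_-(x)) + g(\mathrm{proj}_+(x))$: for $Ax \leq b$ the first term is $f(x)$, and the second is $g$ evaluated at a point of the hyperplane, hence $0$; the case $Ax \geq b$ is symmetric. This is a sum of compositions of representable functions.
In case $f,g$ do not vanish on the hyperplane, let $\mathrm{rest}(x) := f(\mathrm{proj}_-(\mathrm{proj}_+(x)))$. Since $\mathrm{proj}_- \circ \mathrm{proj}_+$ is the orthogonal projection onto the hyperplane, where $f = g$, we also have $\mathrm{rest}(x) = g(\mathrm{proj}_-(\mathrm{proj}_+(x)))$. Decompose $f = f_0 + \mathrm{rest}$ and $g = g_0 + \mathrm{rest}$. Now $f_0$ and $g_0$ are representable and vanish on the hyperplane, so the warm-up case applies to them and we get
$r(x) = f_0(\mathrm{proj}_-(x)) + g_0(\mathrm{proj}_+(x)) + \mathrm{rest}(x)$.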
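Both constructions can be checked numerically. The sketch below is a hypothetical Python illustration (the names `proj_halfspace` and `glue` are mine). Rather than reducing to a coordinate halfspace, it uses the explicit formula $\mathrm{proj}(x) = x - \mathrm{relu}(Ax - b)\,A^T / \lVert A \rVert^2$ for the projection onto $\{Ax \leq b\}$ (assuming $A \neq 0$), which is visibly an affine map, then a relu, then an affine map. It then evaluates $r(x) = f_0(\mathrm{proj}_-(x)) + g_0(\mathrm{proj}_+(x)) + \mathrm{rest}(x)$, where $\mathrm{proj}_-$, $\mathrm{proj}_+$ project onto $\{Ax \leq b\}$ and $\{Ax \geq b\}$, $\mathrm{rest} = f \circ \mathrm{proj}_- \circ \mathrm{proj}_+$, $f_0 = f - \mathrm{rest}$ and $g_0 = g - \mathrm{rest}$. Vectors are plain lists and the row $A$ is passed as the list `a`.

```python
from typing import Callable, List

Vector = List[float]
Func = Callable[[Vector], Vector]


def dot(u: Vector, v: Vector) -> float:
    return sum(u_i * v_i for u_i, v_i in zip(u, v))


def relu(t: float) -> float:
    """Scalar relu."""
    return max(0.0, t)


def proj_halfspace(a: Vector, b: float) -> Func:
    """Nearest-point projection onto {x : a.x <= b}, computed as
    x - relu(a.x - b) / |a|^2 * a (assumes a != 0)."""
    norm2 = dot(a, a)

    def apply(x: Vector) -> Vector:
        s = relu(dot(a, x) - b) / norm2
        return [x_i - s * a_i for x_i, a_i in zip(x, a)]
    return apply


def glue(f: Func, g: Func, a: Vector, b: float) -> Func:
    """r = f on {a.x <= b} and r = g on {a.x >= b}, assuming f = g on {a.x = b}."""
    proj_minus = proj_halfspace(a, b)                    # onto {a.x <= b}
    proj_plus = proj_halfspace([-a_i for a_i in a], -b)  # onto {a.x >= b}

    def rest(x: Vector) -> Vector:
        # f composed with the orthogonal projection onto the hyperplane a.x = b
        return f(proj_minus(proj_plus(x)))

    def r(x: Vector) -> Vector:
        xm, xp = proj_minus(x), proj_plus(x)
        # f0(xm) + g0(xp) + rest(x) with f0 = f - rest and g0 = g - rest
        return [f_i - rm_i + g_i - rp_i + rx_i
                for f_i, rm_i, g_i, rp_i, rx_i
                in zip(f(xm), rest(xm), g(xp), rest(xp), rest(x))]
    return r


# Example: these agree on the hyperplane where the first coordinate vanishes.
def f(x: Vector) -> Vector:
    return [x[0] + 2 * x[1]]


def g(x: Vector) -> Vector:
    return [3 * x[0] + 2 * x[1]]


r = glue(f, g, [1.0, 0.0], 0.0)
```

Here `r([2.0, 1.0])` returns `[8.0]`, the value of $g$, and `r([-1.0, 3.0])` returns `[5.0]`, the value of $f$.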
To see that every piecewise linear function can be built from these operations, let $f$ be such a function. Then there exists a finite collection of hyperplanes such that
$[-1,1]^k$ is covered by finitely many intersections of halfspaces bounded by these hyperplanes, on each of which $f$ is affine.
I think we can represent $f$ by repeatedly applying the glue rule in some form of induction. But I have trouble spelling this out rigorously; this is a gap in the proof.
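In dimension one the gluing induction does go through, which may help as a sanity check (it does not close the general gap). The "hyperplanes" are then points $t_1 < \dots < t_{m-1}$, and gluing a representable $h$ with $h + c\,(x - t)$ along $x = t$ gives exactly $h + c\,\mathrm{relu}(x - t)$. So a continuous piecewise linear function on $[t_0, t_m]$ equals $a + b x + \sum_i c_i\,\mathrm{relu}(x - t_i)$, where $c_i$ is the change of slope at $t_i$. The sketch below assumes the function is given by its values `ys` at the knots `ts` (including both endpoints); `relu_form` and `evaluate` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def relu(t: float) -> float:
    return max(0.0, t)


def relu_form(ts: Sequence[float],
              ys: Sequence[float]) -> Tuple[float, float, List[float]]:
    """Given knots ts[0] < ... < ts[m] (m >= 1) and values ys, return (a, b, cs)
    such that the linear interpolant equals
    a + b*x + sum_i cs[i] * relu(x - ts[i + 1]) on [ts[0], ts[-1]]."""
    slopes = [(ys[i + 1] - ys[i]) / (ts[i + 1] - ts[i])
              for i in range(len(ts) - 1)]
    b = slopes[0]             # affine piece on the first interval
    a = ys[0] - b * ts[0]
    # one glue step per interior knot: the slope jumps by slopes[i] - slopes[i-1]
    cs = [slopes[i] - slopes[i - 1] for i in range(1, len(slopes))]
    return a, b, cs


def evaluate(ts: Sequence[float], a: float, b: float,
             cs: Sequence[float], x: float) -> float:
    """Evaluate a + b*x + sum_i cs[i] * relu(x - t_i) over the interior knots."""
    return a + b * x + sum(c * relu(x - t) for c, t in zip(cs, ts[1:-1]))


ts, ys = [-1.0, -0.5, 0.5, 1.0], [0.0, 1.0, -1.0, 2.0]
a, b, cs = relu_form(ts, ys)
```

For these knots and values this gives $a = 2$, $b = 2$ and coefficients $c = (-4, 8)$, and `evaluate` reproduces `ys` at every knot.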