The most natural motivation that I have seen for the companion matrix is that given in Dummit and Foote's Abstract Algebra. I'll try to present it here so that it can be understood with minimal exposure to abstract algebra.
Given a polynomial of the form
$$
p(t) = t^n + a_{n-1}t^{n-1} + \cdots + a_1 t + a_0,
$$
and a field $\Bbb F$ of scalars, we can define the following $\Bbb F$-algebra associated with this polynomial. We define $V$ to be the quotient ring $V = \Bbb F[t]/\langle p(t)\rangle$. That is, $V$ consists of formal polynomials in $t$ subject to the rule that $p(t) = 0$ (as a classic example, $\Bbb R[t]/\langle t^2 + 1\rangle$ is one way of presenting the complex numbers). By polynomial division, we can see that this vector space has dimension $n$. In particular, every element of this vector space can be written in the form $b_{n-1}t^{n-1} + \cdots + b_1 t + b_0$, which means that the set $\mathcal B = \{1,t,\dots,t^{n-1}\}$ forms a basis of this vector space. We now consider the linear map $\mu:V \to V$ given by $\mu(q(t)) = t\,q(t)$, i.e. multiplication by $t$.
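To make this concrete in the smallest interesting case (an illustration of my own, not taken from the book): in $\Bbb R[t]/\langle t^2+1\rangle$ we have $\mathcal B = \{1, t\}$, and multiplication by $t$ acts by
$$
\mu(b_0 + b_1 t) = b_0 t + b_1 t^2 = -b_1 + b_0 t,
$$
which is exactly multiplication by $i$ on the complex numbers, written in the basis $\{1, i\}$.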
What's interesting about this linear map is that it has the desired minimal and characteristic polynomials. For any polynomial $q$ and any element $r(t) \in V$, we see that
$$
q(\mu)(r(t)) = q(t)\,r(t).
$$
In particular, for any nonzero polynomial $q$ of degree $n-1$ or smaller, we have $q(\mu)(1) = q(t) \neq 0$, so no such polynomial annihilates $\mu$. On the other hand, the rule that $p(t) = 0$ means that $p(\mu)$ multiplies elements of $V$ by $p(t)$, which is equal to $0$, and is thus the zero map. So $p$ is the minimal polynomial of $\mu$. Because the minimal polynomial divides the characteristic polynomial, and both are monic of degree $n$ (recall that $\mu$ is an operator on a vector space of dimension $n$), $p$ must also be the characteristic polynomial.
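Returning to the small example (again my own illustration): there,
$$
\mu^2(q(t)) = t^2\, q(t) = -q(t) \quad\text{for every } q(t) \in \Bbb R[t]/\langle t^2+1\rangle,
$$
so $\mu^2 + 1$ is the zero map, while no nonzero polynomial of degree $1$ or less annihilates $\mu$; hence the minimal (and characteristic) polynomial is $t^2 + 1$, as expected.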
With all that said, we want to find the matrix of $\mu$ with respect to $\mathcal B$. It's easy to determine the first $n-1$ columns: for each $i = 0,\dots,n-2$, we have
$$
\mu(t^i) = t^{i+1} \implies [\mu]_{\mathcal B} =
\pmatrix{0 & \cdots & 0 & ? \\
1 & \cdots & 0 & ? \\
\vdots & \ddots & \vdots & \vdots \\
0 & \cdots & 1 & ?}.
$$
However, $\mu(t^{n-1}) = t^{n}$ does not appear in our list of basis vectors. In order to express $t^n$ as a linear combination of our basis, we use the fact that $p(t) = 0$ to get
$$
t^n + a_{n-1}t^{n-1} + \cdots + a_1 t + a_0 = 0 \implies\\
t^n = -a_0 - a_1t - \cdots - a_{n-1}t^{n-1}.
$$
With that, we can see that the matrix of $\mu$ relative to our basis is given by
$$
C_p = [\mu]_{\mathcal B} = \pmatrix{0 & \cdots & 0 & -a_0 \\
1 & \cdots & 0 & -a_1 \\
\vdots & \ddots & \vdots & \vdots \\
0 & \cdots & 1 & -a_{n-1}}.
$$
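For instance (a concrete example of my own), taking $p(t) = t^3 - 2t^2 + 3t - 4$, so that $a_0 = -4$, $a_1 = 3$, $a_2 = -2$, the construction above gives
$$
C_p = \pmatrix{0 & 0 & 4 \\ 1 & 0 & -3 \\ 0 & 1 & 2}.
$$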
So, just like our linear operator $\mu$, the companion matrix $C_p$ will have the desired minimal and characteristic polynomials.
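If you'd like a quick numerical sanity check of this claim, here is a short sketch (my own, not from either book) that builds $C_p$ with NumPy and recovers the characteristic polynomial with `np.poly`:

```python
import numpy as np

def companion(coeffs):
    """Companion matrix of t^n + a_{n-1} t^{n-1} + ... + a_1 t + a_0,
    where coeffs = [a_0, a_1, ..., a_{n-1}]."""
    n = len(coeffs)
    C = np.zeros((n, n))
    C[1:, :-1] = np.eye(n - 1)       # 1's just below the diagonal
    C[:, -1] = -np.asarray(coeffs)   # last column: -a_0, ..., -a_{n-1}
    return C

# p(t) = t^3 - 2t^2 + 3t - 4, i.e. a_0 = -4, a_1 = 3, a_2 = -2
C = companion([-4, 3, -2])

# np.poly returns the characteristic polynomial's coefficients,
# highest degree first; this should print approximately [ 1. -2.  3. -4.]
print(np.poly(C))
```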
If this approach is hard to follow, a similar construction using "cyclic vectors" is presented in Hoffman and Kunze's Linear Algebra.
Suppose that $A$ is an operator on $\Bbb F^n$ whose minimal polynomial is $p$, and that $A$ has an associated cyclic vector. That is, there is a vector $v_0 \in \Bbb F^n$ such that the vectors $\{v_0,Av_0,A^2 v_0,\dots\}$ span $\Bbb F^n$ (as it turns out, every operator on $\Bbb F^n$ with minimal polynomial $p$ has a cyclic vector, precisely because $p$ has degree $n$). As a consequence of the fact that $v_0$ is a cyclic vector, we see that every vector $v \in \Bbb F^n$ can be written in the form
$$
v = b_0 v_0 + b_1 A v_0 + \cdots + b_d A^d v_0
= (b_0 I + b_1 A + \cdots + b_d A^d) v_0
= q(A) v_0,
$$
where $q(t) = b_0 + b_1t + \cdots + b_d t^d$. Consequently, the vectors $v_0,Av_0,\dots,A^{n-1}v_0$ must be linearly independent (this fact is a bit trickier than it seems; briefly, writing $q = sp + r$ with $\deg r < n$ gives $q(A)v_0 = r(A)v_0$ since $p(A) = 0$, so these $n$ vectors already span the $n$-dimensional space $\Bbb F^n$ and therefore form a basis).
Now, we consider the matrix of $A$ relative to the basis $\mathcal B = \{v_0,Av_0,\dots,A^{n-1}v_0\}$. It is easy to see that
$$
A(A^iv_0) = A^{i+1}v_0 \qquad \text{for } i = 0,\dots,n-2,
$$
and that
$$
p(A)v_0 = 0 \implies A^n v_0 = -a_0 v_0 - a_1 Av_0 - \cdots - a_{n-1}A^{n-1}v_0.
$$
With that, we conclude that the matrix of $A$ relative to $\mathcal B$ is, once again, the companion matrix of $p$.
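As a sanity check (my own observation), the companion matrix itself has an obvious cyclic vector: taking $A = C_p$ and $v_0 = e_1$, the first standard basis vector, the columns of $C_p$ give
$$
C_p e_1 = e_2, \quad C_p e_2 = e_3, \quad \dots, \quad C_p e_{n-2} = e_{n-1}, \quad C_p e_{n-1} = e_n,
$$
so $\{v_0, Av_0, \dots, A^{n-1}v_0\}$ is just the standard basis, and the construction above reproduces $C_p$ on the nose.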