Since you said you learnt a lot of math in your undergrad, here is an analogy (just an analogy):
Differential Equations:
$x’’(t)=a x’(t)+bx(t)$
Why are $a,b$ there? What does linking $x’’(t),x’(t),x(t)$ this way mean?
It is the model for a physical dynamical process.
Why do we use the characteristic polynomial below?
$\lambda^2 + a \lambda + b =0 $
Fourier/Laplace theory relates it to the model. Then we can express the solutions as something like
$
x(t)=c_1 e^{\lambda_1 t} + c_2 e^{\lambda_2 t}
$
where $\lambda_i$ are the roots of the polynomial.
In discrete maths either over reals or over finite fields there is a completely analogous theory of difference equations
$x_n= a x_{n-1}+ b x_{n-2}$
and roots of characteristic polynomials are used to represent solutions.
The LFSR implements a difference equation over the binary field $GF(2).$ The solutions as linear combinations of powers of roots of the characteristic polynomial. Plus we build the LFSR pick the coefficients in the binary field.
Edit: For primitive polynomials which give maximum length LFSR sequences, the roots have a special algebraic structure (conjugates) which means you can write and efficiently calculate (say) the 199,999th term of the sequence without computing the previous terms, via the trace representation. You cannot do this otherwise, say with the taps.
Bonus: Everything being finite the solutions are exact which we need in coding, cryptography, etc.