Since you are interested in visualizations (sorry, I have no reference recommendation), I'll try to explain it that way. There is something called the eigenvalue decomposition of a matrix, and it is very much related to the Schur decomposition. Since complex vectors are hard to imagine, I will go here for the real Schur decomposition which tells us that every real matrix $A$ can be decomposed as
$$A = Q T Q^T,$$
where $Q$ is orthogonal and $T$ is quasitriangular (block triangular with the diagonal blocks of order $1$ and $2$).
Now, if we assume that $A$ is also orthogonal, we can show that $T$ is quasidiagonal, i.e., block diagonal with the diagonal blocks of order $1$ and $2$, and also orthogonal.
Since $Q$ is orthogonal, its columns form an orthonormal basis. In other words, this is very much like the standard coordinate system, except that it's rotated in some directions (spanning vectors are still orthogonal and have the length $1$).
So, in that orthonormal basis, orthogonal operator is just a quasidiagonal matrix
$$T = T_1 \oplus T_2 \oplus \cdots \oplus T_k = \operatorname{diag}(T_1, T_2, \dots, T_k),$$
and its diagonal elements can be:
- $T_i = 1$, which is a null-reflector (does nothing);
- $T_i = -1$, which is a simple reflectors along the corresponding coordinate vector (column of $Q$);
a matrix of order $2$ with complex conjugate eigenvalues of modulus $1$, i.e.,
$$T_i = \begin{bmatrix} a & b \\ -b & a \end{bmatrix} \quad \text{or} \quad T_i = \begin{bmatrix} a & b \\ b & -a \end{bmatrix},$$
$|\det T_i| = 1$. The first of these two is a rotation, and the second one is a rotation composed with a reflection along the second of two corresponding coordinate axes.
So, basically, orthogonal matrix is just a combination of one-dimensional reflectors and rotations written in appropriately chosen orthonormal basis (the coordinate system you're used to, but possibly rotated).
Fun fact: All orthogonal matrices (even rotations) of order $n$ can be presented as compositions of at most $n$ reflectors. This is called Cartan–Dieudonné theorem and works for fields more general than $\mathbb{R}$ and all non-degenerate symmetric bilinear forms (read: generalizations of standard scalar product).
Also I heard Convex Optimization – Boyd and Vandenberghe is pretty good too."*
– peterh Feb 03 '18 at 01:18