This question is inspired by this question, which I saw earlier today. I started writing an answer, to share the insight that the matrix logarithm can be defined, via an alternative technique, on matrices that are not within unit distance of the identity.
Now, Sangchul has posted a great answer explaining how we know that the map $X\mapsto\sum_{n=1}^\infty\frac{(-1)^{n-1}}{n}X^n$ defines a logarithm of $I+X$ whenever the sum converges, i.e. whenever $\|X\|\lt1$. By scaling, we can use this map to obtain further logarithms, since there is an easy definition $\log(\lambda I)=(\log\lambda)I$, which satisfies $\exp(\log(\lambda I))=\lambda I$ trivially, and such a matrix commutes with all other matrices.
I prefer an approach that sidesteps the functional calculus entirely, that I first learnt on Wikipedia over a year ago. The argument proceeds like this, paraphrased by me:
For any invertible $n\times n$ complex matrix $X$, there is a basis of $\Bbb C^n$ in which $X=\bigoplus_{m=1}^kJ_m$ is a direct sum of Jordan blocks $J_m$ with associated eigenvalues $\lambda_m$. If we can find a matrix $Y=\bigoplus_{m=1}^kT_m$, where $\exp(T_m)=J_m$ for all $m$, then a simple inspection of the exponential series shows $\exp(Y)=\bigoplus_{m=1}^k\exp(T_m)=\bigoplus_{m=1}^kJ_m=X$, so $Y$ is a logarithm of $X$. Many such logarithms will exist, owing to the branching of the complex logarithm.
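As a quick sanity check of the blockwise claim (only a sketch, with blocks and eigenvalues chosen by me purely for illustration), one can verify in sympy that exponentiating a direct sum block-by-block agrees with exponentiating the whole matrix:

```python
from sympy import Matrix, zeros, simplify

# Two illustrative Jordan blocks, with eigenvalues 2 and 3.
J1 = Matrix([[2, 1],
             [0, 2]])
J2 = Matrix([[3, 1, 0],
             [0, 3, 1],
             [0, 0, 3]])

# Assemble the direct sum X = J1 (+) J2 as a 5x5 matrix.
X = zeros(5, 5)
X[:2, :2] = J1
X[2:, 2:] = J2

# Exponentiate block-by-block ...
E = zeros(5, 5)
E[:2, :2] = J1.exp()
E[2:, 2:] = J2.exp()

# ... and compare with the exponential of the whole matrix.
assert (X.exp() - E).applyfunc(simplify) == zeros(5, 5)
```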
It remains to find a logarithm of an arbitrary Jordan block. For a block $J$ with eigenvalue $\lambda$, we can write $J=\lambda(I+K)$, where $K$ is the matrix with all zero entries except for entries $\lambda^{-1}$ on the first superdiagonal ($\lambda\neq0$ by invertibility). If we suppose the formal power series argument is valid, we can say: $$\begin{align}\log(\lambda(I+K))&=\log(\lambda)I+\log(I+K)\\&=\log(\lambda)I+K-\frac{1}{2}K^2+\frac{1}{3}K^3-\cdots+\frac{(-1)^{j}}{j-1}K^{j-1}\end{align}$$Since $K$ is nilpotent of order $j$, where $j$ is the dimension of the Jordan block, the tail terms of the Mercator series vanish.
Any branch of the complex logarithm is appropriate for $(\log\lambda)I$; by commutativity, re-exponentiation gives $\lambda\exp\left(K-\tfrac{1}{2}K^2+\cdots\right)\overset{?}{=}\lambda(I+K)$.
Then to claim that this process produces a logarithm for all invertible $X$, it suffices to demonstrate the following:
For all $\lambda\in\Bbb C$ and all $n\times n$ matrices $K$ of the form $$K=\begin{pmatrix}0&\lambda&0&\cdots&0\\0&0&\lambda&\cdots&0\\\vdots&\vdots&\ddots&\ddots&\vdots\\0&0&\cdots&0&\lambda\\0&0&\cdots&0&0\end{pmatrix}$$ we have the identity (the first two expressions are equal because the summands commute): $$\exp\left(\sum_{m=1}^{n-1}\frac{(-1)^{m-1}}{m}K^m\right)=\prod_{m=1}^{n-1}\exp\left(\frac{(-1)^{m-1}}{m}K^m\right)=I_n+K$$
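For what it is worth, this identity is easy to check symbolically for any fixed small $n$; it is not the proof I am after, but here is a minimal sympy sketch (with $n=5$ chosen arbitrarily and $\lambda$ kept symbolic), where every exponential is computed by its terminating series, since every matrix involved is nilpotent:

```python
from sympy import Matrix, Rational, eye, zeros, factorial, symbols, expand

lam = symbols('lambda')
n = 5  # illustrative block size; lambda stays symbolic

# K: zero everywhere except lambda on the first superdiagonal.
K = Matrix(n, n, lambda i, j: lam if j == i + 1 else 0)

def nilpotent_exp(A):
    # exp(A) for a nilpotent matrix A: the series terminates at A**(n-1).
    return sum((A**i / factorial(i) for i in range(1, n)), eye(n))

# S = sum_{m=1}^{n-1} (-1)^(m-1)/m * K^m  (the truncated Mercator series).
S = sum((Rational((-1)**(m - 1), m) * K**m for m in range(1, n)), zeros(n, n))

# exp of the sum ...
lhs_sum = nilpotent_exp(S)

# ... and the product of the individual exponentials.
lhs_prod = eye(n)
for m in range(1, n):
    lhs_prod *= nilpotent_exp(Rational((-1)**(m - 1), m) * K**m)

target = eye(n) + K
assert (lhs_sum - target).applyfunc(expand) == zeros(n, n)
assert (lhs_prod - target).applyfunc(expand) == zeros(n, n)
```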
I am looking for an algebraic (or similar) proof of this result. Don't get me wrong: I appreciate indirect proofs, and find them rather magical whenever they arise, but I have never studied any rigorous development of the functional calculus. Analytic functions can be extended to matrix arguments through a variety of methods: matrix Taylor series, Cayley–Hamilton, bizarre Cauchy integral representations, power series convergent in a Banach space... and I accept all of these as extensions, with perhaps some convenient properties, such as derivative relations, carrying over from the complex/real-analytic theory.
However, it seems suspicious that "higher-order" properties should also be preserved in this extension process. Although we can give well-defined and well-motivated analogues of $\exp$ and $\log$ for matrices, I don't see any immediate reason, a priori, to suppose that the extension reflects relations between them such as $\exp\circ\log\equiv\mathrm{Id}$. The main reason for my suspicion is the following observation: analysis tends to work through limiting arguments, and finite sums just won't do, since they yield only polynomials. It is then rather odd that an analytic series maintains its "special" properties despite collapsing into a finite polynomial: $\Bbb C$ has no nonzero nilpotent elements, but the space of matrices certainly does, and nilpotency is a key ingredient in the above construction of the logarithm.
So what am I looking for? I'm looking for a strong explanation of why what I'm calling "higher-order" properties should carry over in this extension process, especially since there are many different ways to extend analytic functions to matrix arguments. Of course, any property that can be deduced from the power series will carry over, e.g. $\exp(A+B)=\exp(A)\exp(B)$ if $A,B$ commute. However, the power series of $\exp\circ\log$ is unclear to me here, since in this context the $\log$ is not actually a power series but a finite polynomial. To reiterate, it is the use of nilpotent elements that concerns me: since they have no counterpart in the algebra of $\Bbb C$, I feel they should also challenge the analytic series which hail from $\Bbb C$; at the very least, some further justification seems necessary.
An algebraic proof (a direct matrix-arithmetic proof, or maybe some clever linear algebra argument) of why this particular nilpotent logarithm should hold would be greatly appreciated. My thoughts on this matter so far:
$$T:=\sum_{m=1}^{n-1}\frac{(-1)^{m-1}}{m}K^m=\begin{pmatrix}0&\lambda&-\frac{1}{2}\lambda^2&\cdots&\frac{(-1)^n}{n-1}\lambda^{n-1}\\0&0&\lambda&\cdots&\frac{(-1)^{n-1}}{n-2}\lambda^{n-2}\\0&0&0&\cdots&\frac{(-1)^{n}}{n-3}\lambda^{n-3}\\\vdots&\vdots&\vdots&\ddots&\vdots\\0&0&0&\cdots&0\end{pmatrix}$$Observing the way this matrix's entries "shift" up a diagonal each time the matrix is squared, cubed, etc. shows that the first superdiagonal receives a nonzero contribution exactly once in $I+T+\frac{1}{2}T^2+\cdots$, and we can easily partially compute the exponential: $$\exp(T)=\begin{pmatrix}1&\lambda&?&?&\cdots&?\\0&1&\lambda&?&\cdots&?\\0&0&1&\lambda&\cdots&?\\0&0&0&1&\cdots&?\\\vdots&\vdots&\vdots&\vdots&\ddots&\vdots\\0&0&0&0&\cdots&1\end{pmatrix}$$So it remains to show that the sum over all the remaining superdiagonals vanishes. Another pertinent point is that $T$ is nilpotent of order $n$, so the exponential series terminates at $\frac{1}{(n-1)!}T^{n-1}$. This re-explains my suspicion, since we are now claiming that these two polynomials are inverse to one another, which is false in the complex world that we began in.
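For small $n$ the vanishing of the remaining superdiagonals can at least be confirmed by brute force; a minimal sympy sketch (the size $n=6$ is picked arbitrarily, and the exponential is again computed via its terminating series, since $T$ is nilpotent):

```python
from sympy import Matrix, Rational, eye, zeros, symbols, expand

lam = symbols('lambda')
n = 6  # illustrative size

K = Matrix(n, n, lambda i, j: lam if j == i + 1 else 0)

# T = sum_{m=1}^{n-1} (-1)^(m-1)/m * K^m, as displayed above.
T = zeros(n, n)
for m in range(1, n):
    T += Rational((-1)**(m - 1), m) * K**m

# exp(T) via the terminating series: E = I + T + T^2/2! + ... + T^(n-1)/(n-1)!
E = eye(n)
P = eye(n)
for i in range(1, n):
    P *= T / i          # P accumulates T^i / i!
    E += P

# First superdiagonal of exp(T) is constant lambda; all higher ones vanish.
assert all(expand(E[i, i + 1] - lam) == 0 for i in range(n - 1))
for d in range(2, n):
    assert all(expand(E[i, i + d]) == 0 for i in range(n - d))
```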
How can we fill in the algebraic gaps here? The matrix powers seem quite intractable to compute symbolically.
N.B. Sangchul's answer in the linked post never once needs the invertibility of $X$, as far as I can see, whereas this Wikipedia-based construction does. How do we reconcile the two?