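If I have an analytic function $f$ of a square matrix $A$ (like $\sin(A)$), then I know that if the matrix is diagonalizable, it is possible to find an invertible matrix $P$ such that $$D = P^{-1}AP \tag{1}$$ is diagonal. Then for a function $f(A)$: $$f(A) = Pf(D)P^{-1} \tag{2}$$ where $f(D)$ is the diagonal matrix obtained by applying $f$ to each diagonal entry of $D$.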
So for example if $f(x) = \cos(x)$: $$\cos(A) = P\cos(D)P^{-1} \tag{3}$$
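As a numerical sanity check of (3) (not a proof), here is a minimal sketch in plain Python. The $2\times 2$ matrix `A`, its eigenvector matrix `P`, and the helper names `matmul` and `cos_series` are my own illustrative choices; the sketch also assumes that $\cos(A)$ is defined by its power series $\sum_k (-1)^k A^{2k}/(2k)!$.

```python
import math

def matmul(X, Y):
    """Multiply two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def cos_series(A, terms=30):
    """Approximate cos(A) by the truncated series sum_k (-1)^k A^(2k) / (2k)!."""
    n = len(A)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    A2 = matmul(A, A)
    term = identity                      # k = 0 term is the identity
    total = [row[:] for row in identity]
    for k in range(1, terms):
        # Build term_k from term_{k-1}: multiply by -A^2 / ((2k-1)(2k)).
        factor = -1.0 / ((2 * k - 1) * (2 * k))
        term = [[factor * x for x in row] for row in matmul(term, A2)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

# Illustrative example (an assumption, not from any reference):
# A is symmetric with eigenvalues 1 and 3, eigenvectors (1, -1) and (1, 1).
A = [[2.0, 1.0], [1.0, 2.0]]
P = [[1.0, 1.0], [-1.0, 1.0]]
P_inv = [[0.5, -0.5], [0.5, 0.5]]

D = matmul(matmul(P_inv, A), P)          # equation (1): D = P^{-1} A P
cos_D = [[math.cos(D[0][0]), 0.0],
         [0.0, math.cos(D[1][1])]]       # apply cos to the diagonal entries
via_diag = matmul(matmul(P, cos_D), P_inv)   # right-hand side of (3)
via_series = cos_series(A)                   # left-hand side of (3), from the series

max_diff = max(abs(via_diag[i][j] - via_series[i][j])
               for i in range(2) for j in range(2))
print(max_diff)  # difference between the two computations of cos(A)
```

The two computations agree up to floating-point rounding for this particular matrix, which is consistent with (3), but it does not explain why the identity holds in general.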
What is the justification for this, and how does it follow from first principles? A similar question has been asked here ("$\sin(A)$, where $A$ is a matrix"), but I don't see any justification or proof for (2).