How can I prove $\operatorname{rank}A^TA=\operatorname{rank}A$ for any $A\in M_{m \times n}$?
This is an exercise in my textbook associated with orthogonal projections and Gram-Schmidt process, but I am unsure how they are relevant.
Let $\mathbf{x} \in N(A)$ where $N(A)$ is the null space of $A$.
So, $$\begin{align} A\mathbf{x} &=\mathbf{0} \\\implies A^TA\mathbf{x} &=\mathbf{0} \\\implies \mathbf{x} &\in N(A^TA) \end{align}$$ Hence $N(A) \subseteq N(A^TA)$.
Conversely, let $\mathbf{x} \in N(A^TA)$.
So, $$\begin{align} A^TA\mathbf{x} &=\mathbf{0} \\\implies \mathbf{x}^TA^TA\mathbf{x} &=0 \\\implies (A\mathbf{x})^T(A\mathbf{x})&=0 \\\implies A\mathbf{x}&=\mathbf{0}\\\implies \mathbf{x} &\in N(A) \end{align}$$ Here $(A\mathbf{x})^T(A\mathbf{x})$ is the sum of the squares of the real entries of $A\mathbf{x}$, so it vanishes only if $A\mathbf{x}=\mathbf{0}$. Hence $N(A^TA) \subseteq N(A)$.
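Therefore $$\begin{align} N(A^TA) &= N(A)\\ \implies \dim(N(A^TA)) &= \dim(N(A))\\ \implies \text{rank}(A^TA) &= \text{rank}(A),\end{align}$$ where the last step uses the rank-nullity theorem: $A$ and $A^TA$ both have $n$ columns, so each rank equals $n$ minus the dimension of the corresponding null space.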
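As a sanity check on the null space argument, here is a minimal sketch that computes both ranks exactly with rational arithmetic. The matrix `A` and the helper names `rank`, `transpose` and `matmul` are made up for illustration; the elimination routine is just ordinary Gaussian elimination, not anything specific to this proof.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (given as a list of rows) via exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        # Look for a row at or below position r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Eliminate column c from every row below the pivot row.
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def transpose(rows):
    return [list(col) for col in zip(*rows)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

# Hypothetical 3x4 example: the third row is the sum of the first two,
# so the rank is 2 and the null space has dimension 4 - 2 = 2.
A = [[1, 2, 0, 1],
     [0, 1, 1, 1],
     [1, 3, 1, 2]]
AtA = matmul(transpose(A), A)  # the 4x4 matrix A^T A

n = len(A[0])
rank_A, rank_AtA = rank(A), rank(AtA)
print(rank_A, rank_AtA)          # the two ranks
print(n - rank_A, n - rank_AtA)  # the two nullities, by rank-nullity
```

Using `Fraction` avoids the floating-point tolerance questions that a numerical rank routine would raise, at the cost of being practical only for small matrices.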
Let $r$ be the rank of $A \in \mathbb{R}^{m \times n}$. We then have the (compact) SVD of $A$ as $$A_{m \times n} = U_{m \times r} \Sigma_{r \times r} V^T_{r \times n}$$ where $U$ and $V$ have orthonormal columns and $\Sigma$ is diagonal with positive entries. Since $U^TU = I_r$, this gives $A^TA$ as $$A^TA = V_{n \times r} \Sigma_{r \times r}^2 V^T_{r \times n}$$ which is nothing but the SVD of $A^TA$. It has $r$ nonzero singular values, so $A^TA$ also has rank $r$. In fact the singular values of $A^TA$ are the squares of the singular values of $A$.
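To make the factorization concrete, here is a small rank-1 sketch. The values of `sigma`, `u` and `v` are hypothetical, chosen only so that the unit vectors have rational entries and everything can be checked exactly.

```python
from fractions import Fraction as F

# Hypothetical rank-1 example with compact SVD A = sigma * u * v^T,
# where u and v are unit vectors with rational entries.
sigma = 15
u = [F(3, 5), F(4, 5)]            # unit vector in R^2
v = [F(2, 3), F(2, 3), F(1, 3)]   # unit vector in R^3

# A is 2x3 and works out to [[6, 6, 3], [8, 8, 4]].
A = [[sigma * ui * vj for vj in v] for ui in u]

# A^T A computed directly from the entries of A ...
AtA = [[sum(A[k][i] * A[k][j] for k in range(2)) for j in range(3)]
       for i in range(3)]
# ... and from the claimed factorization V Sigma^2 V^T = sigma^2 * v * v^T.
VS2Vt = [[sigma**2 * vi * vj for vj in v] for vi in v]

print(AtA == VS2Vt)
```

Here $A^TA$ has the single nonzero singular value $\sigma^2 = 225$, matching the rank of $A$.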
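Since elementary operations do not change the rank of a matrix, we have $\text{rank}(A^TA) = \text{rank}(E^TA^TAE)$, where $E$ is a product of elementary matrices (column operations) chosen so that $AE = [A_1, A_2]$, where $A_1$ has full column rank and $\text{rank}(A_1) = \text{rank}(A)$.
Since every column of $A_2$ lies in the column space of $A_1$, we can find a matrix $P$ such that $A_1P= A_2$, and so $AE = [A_1, A_1P] = A_1[I, P]$.
Thus $\text{rank}(E^TA^TAE) = \text{rank}\left((A_1[I, P])^T(A_1[I, P])\right) = \text{rank}\left([I, P]^TA_1^TA_1[I, P]\right)$. Write $r = \text{rank}(A)$. On a real space, $A_1^TA_1$ is an invertible $r \times r$ matrix, because $\mathbf{y}^TA_1^TA_1\mathbf{y} = \|A_1\mathbf{y}\|^2 > 0$ for every $\mathbf{y} \neq \mathbf{0}$. Moreover, $[I, P]$ has full row rank $r$ and $[I, P]^T$ has full column rank $r$. Hence the product has rank $r$, so $\text{rank}(A^TA) = \text{rank}(A)$, completing the proof.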
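The restriction to a real space matters. As a rough illustration (the $2 \times 1$ complex matrix below is a made-up example, not taken from the exercise), the plain transpose can lose rank over $\mathbb{C}$, while the conjugate transpose does not:

```python
# Hypothetical 2x1 complex matrix A = [[1], [i]], stored as its single column.
col = [1 + 0j, 1j]

# Plain transpose: A^T A is the 1x1 matrix whose entry is sum(a_k * a_k).
plain = sum(a * a for a in col)
# Conjugate transpose: A^* A is the 1x1 matrix whose entry is sum(conj(a_k) * a_k).
hermitian = sum(a.conjugate() * a for a in col)

print(plain)      # zero, so rank(A^T A) = 0 although rank(A) = 1
print(hermitian)  # nonzero, so rank(A^* A) = 1 = rank(A)
```

Over $\mathbb{C}$ the statement that survives is $\text{rank}(A^*A) = \text{rank}(A)$, with $A^*$ the conjugate transpose.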
The question mentions the Gram-Schmidt process, so here's an answer using it.
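Write $B$ for the matrix and let $r = \operatorname{rank} B$. Pick an orthonormal basis of $\operatorname{im} B$: $\{B v_1, \dots, B v_r\}$, using the Gram-Schmidt process. We claim $\{B^T B v_1, \dots, B^T B v_r\}$ is a basis of $\operatorname{im} B^T B$. It clearly spans $\operatorname{im} B^T B$, since $\operatorname{im} B^TB = B^T(\operatorname{im} B)$, so we just need linear independence.
Suppose $\sum_i a_i B^T B v_i = 0$. Then for any $k$, $0 = \langle \sum_i a_i B^T B v_i, v_k \rangle = \sum_i a_i \langle B^T B v_i, v_k \rangle = \sum_i a_i \langle B v_i, B v_k \rangle = a_k$, where the last equality uses the orthonormality of the $B v_i$. Hence $a_k = 0$ for all $k$, so the set is linearly independent and $\operatorname{rank} B^TB = r = \operatorname{rank} B$.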
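The key identity $\langle B^T B v_i, v_k \rangle = \langle B v_i, B v_k \rangle$ can be checked on a toy case. In this sketch the matrix `B` and the vectors `vs` are hypothetical, picked so that the $B v_i$ are already orthonormal and no actual Gram-Schmidt step is needed.

```python
def transpose(rows):
    return [list(col) for col in zip(*rows)]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical 2x3 matrix B of rank 2, with v_1, v_2 chosen so that
# B v_1 = (1, 0) and B v_2 = (0, 1) are already orthonormal.
B = [[1, 0, 1],
     [0, 1, 1]]
vs = [[1, 0, 0], [0, 1, 0]]

Bt = transpose(B)
Bv = [matvec(B, v) for v in vs]       # orthonormal basis of im B
BtBv = [matvec(Bt, w) for w in Bv]    # candidate basis of im B^T B

# <B^T B v_i, v_k> should equal <B v_i, B v_k>, i.e. 1 if i == k and 0 otherwise.
gram_left = [[dot(BtBv[i], vs[k]) for k in range(2)] for i in range(2)]
gram_right = [[dot(Bv[i], Bv[k]) for k in range(2)] for i in range(2)]
print(gram_left, gram_right)
```

Both matrices come out as the identity, which is exactly what forces every coefficient $a_k$ to vanish in the independence argument.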
Definition (Range, Image and Rank). The range or image of a matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$ is the set of all vectors in $\mathbb{R}^m$ that can be reached by $\mathbf{A}$, \begin{equation} \mathrm{ran}(\mathbf{A}) := \mathrm{im}(\mathbf{A}) := \left\{ \mathbf{y} \in \mathbb{R}^m\,\middle|\,\mathbf{y} = \mathbf{A} \mathbf{x} \text{ for some } \mathbf{x} \in \mathbb{R}^n \right\} \subseteq \mathbb{R}^m. \end{equation} The rank of $\mathbf{A}$ is the dimension of the range, \begin{equation} \mathrm{rank}(\mathbf{A}) := \mathrm{dim}(\mathrm{ran}(\mathbf{A})). \end{equation}
Definition (Kernel, Nullspace and Nullity). The kernel or nullspace of a matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$ is the set of all points in $\mathbb{R}^n$ that $\mathbf{A}$ maps to zero, \begin{equation} \mathrm{ker}(\mathbf{A}) := \mathrm{null}(\mathbf{A}) := \left\{\mathbf{x} \in \mathbb{R}^n\,\middle|\,\mathbf{A} \mathbf{x} = \mathbf{0}\right\} \subseteq \mathbb{R}^n. \end{equation} The nullity of $\mathbf{A}$ is the dimension of its kernel, \begin{equation} \mathrm{nullity}(\mathbf{A}) := \mathrm{dim}(\mathrm{ker}(\mathbf{A})). \end{equation}
Theorem (Rank-nullity theorem). The rank-nullity theorem states that, for a matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$, the sum of the rank and nullity of $\mathbf{A}$ is equal to the dimension of the domain of $\mathbf{A}$, \begin{equation} \mathrm{rank}(\mathbf{A}) + \mathrm{nullity}(\mathbf{A}) = \mathrm{dim}(\mathbb{R}^n) = n. \end{equation}
Proof. See https://en.wikipedia.org/wiki/Rank%E2%80%93nullity_theorem.
Definition (The standard norm on $\mathbb{R}^n$). The standard norm of a vector $\mathbf{x} \in \mathbb{R}^n$ is the scalar \begin{equation} \Vert \mathbf{x} \Vert_2 := \sqrt{\mathbf{x}^\mathrm{T} \mathbf{x}}. \end{equation}
Remark. The standard norm on $\mathbb{R}^n$ is indeed a norm, so in particular, $\Vert \mathbf{x} \Vert_2 = 0 \Leftrightarrow \mathbf{x} = \mathbf{0}$.
Proposition (The rank of $\mathbf{A}$ and $\mathbf{A}^\mathrm{T} \mathbf{A}$). For any matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$, the ranks of $\mathbf{A}$ and $\mathbf{A}^\mathrm{T} \mathbf{A}$ are the same, \begin{equation} \mathrm{rank}(\mathbf{A}) = \mathrm{rank}(\mathbf{A}^\mathrm{T} \mathbf{A}). \end{equation}
Proof. Notice that \begin{equation} \mathbf{x} \in \mathrm{ker}(\mathbf{A}) \Rightarrow \mathbf{A} \mathbf{x} = \mathbf{0} \Rightarrow \mathbf{A}^\mathrm{T} \mathbf{A} \mathbf{x} = \mathbf{0} \Rightarrow \mathbf{x} \in \mathrm{ker}(\mathbf{A}^\mathrm{T} \mathbf{A}) \end{equation} and that \begin{align} &\mathbf{x} \in \mathrm{ker}(\mathbf{A}^\mathrm{T} \mathbf{A}) \Rightarrow \mathbf{A}^\mathrm{T} \mathbf{A} \mathbf{x} = \mathbf{0} \Rightarrow \mathbf{x}^\mathrm{T} \mathbf{A}^\mathrm{T} \mathbf{A} \mathbf{x} = 0 \\ &\Rightarrow (\mathbf{A} \mathbf{x})^\mathrm{T} (\mathbf{A} \mathbf{x}) = 0 \Rightarrow \Vert \mathbf{A} \mathbf{x} \Vert_2^2 = 0 \Rightarrow \mathbf{A} \mathbf{x} = \mathbf{0} \Rightarrow \mathbf{x} \in \mathrm{ker}(\mathbf{A}). \end{align} Hence, \begin{equation} \mathrm{ker}(\mathbf{A}) = \mathrm{ker}(\mathbf{A}^\mathrm{T} \mathbf{A}). \end{equation} Since $\mathbf{A}$ and $\mathbf{A}^\mathrm{T} \mathbf{A}$ both have $n$ columns, the rank-nullity theorem gives \begin{equation} \mathrm{rank}(\mathbf{A}) = n - \mathrm{nullity}(\mathbf{A}) = n - \mathrm{dim}(\mathrm{ker}(\mathbf{A})) = n - \mathrm{dim}(\mathrm{ker}(\mathbf{A}^\mathrm{T} \mathbf{A})) = n - \mathrm{nullity}(\mathbf{A}^\mathrm{T} \mathbf{A}) = \mathrm{rank}(\mathbf{A}^\mathrm{T} \mathbf{A}). \end{equation} This concludes our proof. $\qquad \square$