Tools: Borel-Cantelli lemma, Tschebysheff inequality, dominated convergence theorem
The first step is to show the weak law of large numbers.
Theorem 1: Let $(Y_n)_{n \in \mathbb{N}}$ be a sequence of independent identically distributed random variables and suppose that $\mathbb{E}(|Y_1|)<\infty$. Then $$S_n := \frac{1}{n} \sum_{i=1}^n Y_i$$ converges in probability to $\mu := \mathbb{E}(Y_1)$ as $n \to \infty$.
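Note that Theorem 1 assumes only $Y_1 \in L^1$; the variance of $Y_1$ may well be infinite, so Tschebysheff's inequality cannot be applied to the $Y_i$ directly. For instance, if $Y_1$ has the density $f(x) = 2x^{-3} 1_{\{x \geq 1\}}$, then
$$\mathbb{E}(|Y_1|) = \int_1^\infty 2x^{-2} \, dx = 2 < \infty \qquad \text{but} \qquad \mathbb{E}(Y_1^2) = \int_1^\infty 2x^{-1} \, dx = \infty.$$
This is what the truncation argument in the proof below works around.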
For the proof we use the following auxiliary statement.
Lemma 1: Let $(Y_n)_{n \in \mathbb{N}}$ and $(Z_n)_{n \in \mathbb{N}}$ be two sequences of random variables. If $\sum_{n \geq 1} \mathbb{P}(Y_n \neq Z_n)<\infty$, then $\sum_{n \geq 1} (Y_n-Z_n)$ converges almost surely.
Proof of Lemma 1: It follows from the Borel-Cantelli lemma that $\mathbb{P}(Z_n \neq Y_n \text{ infinitely often})=0$, and therefore there exists a null set $N$ such that for any $\omega \notin N$, we have $Z_n(\omega) = Y_n(\omega)$ for all sufficiently large $n$. In particular, for $\omega \notin N$ the series $\sum_{n \geq 1} (Y_n(\omega)-Z_n(\omega))$ has only finitely many non-zero terms and therefore converges; this proves the almost sure convergence.
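In the proof of Theorem 1, Lemma 1 will be used through the following consequence: if $\sum_{n \geq 1} (Y_n-Z_n)$ converges almost surely, then its partial sums are almost surely bounded, and so
$$\frac{1}{n} \sum_{i=1}^n (Y_i-Z_i) \to 0 \quad \text{almost surely as } n \to \infty.$$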
Proof of Theorem 1: Recall that $\mathbb{E}(|Y_1|)<\infty$ implies that $\sum_{n \geq 1} \mathbb{P}(|Y_1|>n)<\infty$. If we define truncated random variables $Z_n := Y_n 1_{\{|Y_n| \leq n\}}$, then $$\sum_{n \geq 1} \mathbb{P}(Z_n \neq Y_n) = \sum_{n \geq 1} \mathbb{P}(|Y_n|>n) = \sum_{n \geq 1} \mathbb{P}(|Y_1|>n)<\infty.$$ Applying Lemma 1 in the form of the consequence just noted, we get $S_n - T_n \to 0$ almost surely, where $T_n := n^{-1} \sum_{i=1}^n Z_i$; consequently, $S_n = n^{-1} \sum_{i=1}^n Y_i$ converges in probability to $\mu$ if, and only if, $T_n$ converges in probability to $\mu$. To show that $T_n \stackrel{\mathbb{P}}{\to} \mu$, we first estimate the variance of $T_n$. Since the random variables $Z_n$, $n \in \mathbb{N}$, are independent and bounded, we have
$$\text{var}(T_n) = \frac{1}{n^2} \sum_{i=1}^n \text{var}(Z_i) \leq \frac{1}{n^2} \sum_{i=1}^n \mathbb{E}(Z_i^2) \tag{1} $$
where we have used that
$$\text{var}(Z_i) = \mathbb{E}(Z_i^2)-[\mathbb{E}(Z_i)]^2 \leq \mathbb{E}(Z_i^2).$$
Now choose a sequence $(a_n)_{n \in \mathbb{N}} \subseteq \mathbb{N}$ such that $a_n \to \infty$ and $a_n/n \to 0$ as $n \to \infty$ (for instance, $a_n = \lceil \sqrt{n} \rceil$). Using that $|Z_i| \leq i$, we find
$$\begin{align*} \sum_{i=1}^n \mathbb{E}(Z_i^2) &= \sum_{i=1}^{a_n} \mathbb{E}(Z_i^2) + \sum_{i=a_{n}+1}^n \mathbb{E}(Z_i^2) \\ &\leq a_n \sum_{i=1}^{a_n} \mathbb{E}(|Z_i|) + \sum_{i=a_n+1}^n \mathbb{E}(Z_i^2 1_{|Z_i| \leq a_n}) + \sum_{i=a_n+1}^n \mathbb{E}(Z_i^2 1_{|Z_i| > a_n}) \\ &\leq a_n \sum_{i=1}^{a_n} \mathbb{E}(|Z_i|) + a_n \sum_{i=a_n+1}^n\mathbb{E}(|Z_i|) + n \sum_{i=a_n+1}^n \mathbb{E}(|Z_i| 1_{|Z_i|>a_n})\\ &\leq a_n \sum_{i=1}^n \mathbb{E}(|Y_i|) + n \sum_{i=a_{n}+1}^n \mathbb{E}(|Y_i| 1_{|Y_i|>a_n}) \\ &\leq a_n n \mathbb{E}(|Y_1|) +n^2 \mathbb{E}(|Y_1| 1_{|Y_1|>a_n}). \end{align*}$$
Dividing by $n^2$ and using that $a_n/n \to 0$ and that $\mathbb{E}(|Y_1| 1_{|Y_1|>a_n}) \to 0$ as $n \to \infty$ (by the dominated convergence theorem, since $Y_1 \in L^1$ and $a_n \to \infty$), we obtain
$$\lim_{n \to \infty} \text{var}(T_n) \stackrel{(1)}{\leq} \lim_{n \to \infty} \frac{1}{n^2} \sum_{i=1}^n \mathbb{E}(Z_i^2) \leq \lim_{n \to \infty} \left( \frac{a_n}{n} \mathbb{E}(|Y_1|) + \mathbb{E}(|Y_1| 1_{|Y_1|>a_n}) \right)=0.$$
Applying Tschebysheff's inequality, it follows that $T_n$ converges in probability to $\mu$; the details are spelled out in the remark below. This finishes the proof.
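Remark: For completeness, here is one way to spell out the last step. By dominated convergence, $\mathbb{E}(Z_i) = \mathbb{E}(Y_1 1_{\{|Y_1| \leq i\}}) \to \mathbb{E}(Y_1) = \mu$ as $i \to \infty$, and hence also the Cesàro means converge: $\mathbb{E}(T_n) = n^{-1} \sum_{i=1}^n \mathbb{E}(Z_i) \to \mu$. Tschebysheff's inequality then gives, for every $\epsilon>0$,
$$\mathbb{P}(|T_n - \mathbb{E}(T_n)| > \epsilon) \leq \frac{\text{var}(T_n)}{\epsilon^2} \to 0 \quad \text{as } n \to \infty,$$
so $T_n - \mathbb{E}(T_n) \to 0$ in probability and, since $\mathbb{E}(T_n) \to \mu$, also $T_n \to \mu$ in probability.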
Theorem 2: Let $(X_n)_{n \in \mathbb{N}}$ be a sequence of independent identically distributed random variables with $\mathbb{E}(X_n) = \mu$ and $\text{var}(X_n) = \sigma^2<\infty$. Set $\bar{X}_n := n^{-1} \sum_{i=1}^n X_i$ and $S_n := n^{-1} \sum_{i=1}^n (X_i-\bar{X}_n)^2$ (note that $S_n$ now denotes the sample variance, not the sample mean from Theorem 1). Then $S_n \to \sigma^2$ in probability.
Proof: Since $$(X_i-\mu+(\mu-\bar{X}_n))^2 = (X_i-\mu)^2 + 2 (X_i-\mu) \cdot (\mu-\bar{X}_n)+(\mu-\bar{X}_n)^2, $$
we have $$\begin{align} S_n= \frac{1}{n} \sum_{i=1}^n (X_i-\bar{X}_n)^2 &= \frac{1}{n} \sum_{i=1}^n (X_i-\mu)^2 + 2 (\mu-\bar{X}_n) \underbrace{\frac{1}{n} \sum_{i=1}^n (X_i-\mu)}_{=\, \bar{X}_n-\mu} + (\mu-\bar{X}_n)^2 \\ &= \frac{1}{n} \sum_{i=1}^n (X_i-\mu)^2 - 2(\bar{X}_n-\mu)^2 + (\bar{X}_n-\mu)^2 \\ &= \frac{1}{n} \sum_{i=1}^n (X_i-\mu)^2 - (\bar{X}_n-\mu)^2. \end{align}$$
By the weak law of large numbers (Theorem 1), applied to the i.i.d. sequences $(X_n)_{n \in \mathbb{N}}$ and $((X_n-\mu)^2)_{n \in \mathbb{N}}$ (the latter is integrable because $\sigma^2<\infty$), we have
$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i \to \mu \quad \text{in probability}, \\ \frac{1}{n} \sum_{i=1}^n (X_i-\mu)^2 \to \mathbb{E} \left( (X_1-\mu)^2 \right)=\sigma^2 \quad \text{in probability}. $$
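Moreover, since $\bar{X}_n \to \mu$ in probability, also $(\bar{X}_n-\mu)^2 \to 0$ in probability: for every $\epsilon>0$,
$$\mathbb{P}\big((\bar{X}_n-\mu)^2 > \epsilon\big) = \mathbb{P}\big(|\bar{X}_n-\mu| > \sqrt{\epsilon}\big) \to 0 \quad \text{as } n \to \infty.$$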
Since sums and differences of sequences converging in probability again converge in probability (to the corresponding limit), it follows that $S_n \to \sigma^2 - 0 = \sigma^2$ in probability. This finishes the proof.
Remark: The idea for this proof is taken from Chung's book A Course in Probability Theory.