There are a lot of questions here, so I may have missed some of them.
$\textbf{Why study convergence on $\mathcal{D}[0,1]$ rather than $L^2[0,1]$?}$
Let me start by mentioning that people do study convergence of stochastic processes on $L^2[0,1]$.
Often, we are interested in convergence of certain observables of a stochastic process, rather than just convergence of the stochastic process itself. For example, suppose that $X_n \Longrightarrow X$ in $L^2[0,1]$ and we are interested in the supremum of $X$. Is it true that $\sup_{0 \leq t \leq 1} X_n (t) \Longrightarrow \sup_{0 \leq t \leq 1} X(t)$? Clearly not; the left- and right-hand sides are not even well-defined in the $L^2[0,1]$ sense (because elements of $L^2[0,1]$ are only defined up to sets of measure zero). If we are interested in studying the convergence of the supremum of a stochastic process, we then need to work in another space.
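Even if one picks nice representatives so that the suprema make sense, the convergence can still fail. Here is a small deterministic example (the particular functions $x_n$ are chosen just for illustration):
\begin{align*}
x_n = \mathbf{1}_{[0, 1/n)}, \qquad \|x_n - 0\|_{L^2[0,1]}^2 = \int_0^1 \mathbf{1}_{[0,1/n)}(t) \, dt = \frac{1}{n} \to 0, \qquad \sup_{0 \leq t \leq 1} x_n(t) = 1 \not\to 0.
\end{align*}
So $x_n \to 0$ in $L^2[0,1]$, while the suprema stay equal to $1$.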
The key reason to work with the Skorokhod J1 metric (or, rather, a slight modification of it) on $\mathcal{D}[0,1]$ is that it turns the space of right-continuous functions with left limits on $[0,1]$ into a complete, separable metric space whose Borel sigma algebra is generated by the coordinate projections (Ethier and Kurtz Proposition 3.7.1), and in which most (but definitely not all) functionals we are interested in are continuous. For example (Ethier and Kurtz Exercise 3.11.26), the following maps from $\mathcal{D}[0,1]$ to $\mathcal{D}[0,1]$ are all continuous in this topology:
1.) $x \in \mathcal{D}[0,1] \mapsto (t \mapsto \sup_{0 \leq s \leq t} x(s)) \in \mathcal{D}[0,1]$
2.) $x \in \mathcal{D}[0,1] \mapsto (t \mapsto \inf_{0 \leq s \leq t} x(s)) \in \mathcal{D}[0,1]$
3.) $x \in \mathcal{D}[0,1] \mapsto (t \mapsto \int_0^t x(s) ds) \in \mathcal{D}[0,1]$
4.) $x \in \mathcal{D}[0,1] \mapsto (t \mapsto \sup_{0 \leq s \leq t} (x(s) - x(s-))) \in \mathcal{D}[0,1]$
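To get a feel for these four maps, here is a rough numerical sketch. It assumes the path is sampled on a uniform grid and treated as a right-continuous step function; the function name `path_functionals` and the sample path are illustrative choices of mine, not anything from Ethier and Kurtz.

```python
import numpy as np

def path_functionals(x, dt):
    """Grid versions of the four maps above (illustrative sketch).

    x[k] stands for x(k * dt), and the path is treated as a
    right-continuous step function that is constant on [k*dt, (k+1)*dt).
    """
    running_sup = np.maximum.accumulate(x)   # t -> sup_{s <= t} x(s)
    running_inf = np.minimum.accumulate(x)   # t -> inf_{s <= t} x(s)
    # Integral of the step function over [0, k*dt] (a left Riemann sum).
    integral = np.concatenate(([0.0], np.cumsum(x[:-1]) * dt))
    # Jump at grid point k is x[k] - x[k-1]; by convention there is no jump at time 0.
    jumps = np.concatenate(([0.0], np.diff(x)))
    running_max_jump = np.maximum.accumulate(jumps)  # t -> sup_{s <= t} (x(s) - x(s-))
    return running_sup, running_inf, integral, running_max_jump

x = np.array([0.0, 1.0, -2.0, 3.0, 3.0])
running_sup, running_inf, integral, running_max_jump = path_functionals(x, dt=0.25)
print(running_sup)
print(running_inf)
print(integral)
print(running_max_jump)
```

Each output is again a path on the same grid, which mirrors the fact that the maps take $\mathcal{D}[0,1]$ to $\mathcal{D}[0,1]$.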
$\textbf{What is the difference between convergence in $\mathcal{D}[0,1]$ and $\mathcal{C}[0,1]$?}$
$\mathcal{D}[0,1]$ is a bigger space than $\mathcal{C}[0,1]$ and sometimes we want to work with processes that have jumps (like the Poisson process).
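Once jumps are allowed, Skorokhod convergence is genuinely weaker than uniform convergence. Here is a standard illustration (the functions $x_n$ and time changes $\lambda_n$ are chosen just for this example). For $n \geq 3$ let
\begin{align*}
x_n = \mathbf{1}_{[\frac{1}{2} + \frac{1}{n}, 1]}, \qquad x = \mathbf{1}_{[\frac{1}{2}, 1]}, \qquad \sup_{0 \leq t \leq 1} |x_n(t) - x(t)| = 1 \text{ for every } n,
\end{align*}
so $x_n$ does not converge to $x$ uniformly. The Skorokhod J1 topology, however, allows a small deformation of time: if $\lambda_n$ is the piecewise linear increasing bijection of $[0,1]$ with $\lambda_n(0) = 0$, $\lambda_n(\frac{1}{2}) = \frac{1}{2} + \frac{1}{n}$ and $\lambda_n(1) = 1$, then
\begin{align*}
x_n(\lambda_n(t)) = x(t) \text{ for all } t \in [0,1], \qquad \sup_{0 \leq t \leq 1} |\lambda_n(t) - t| = \frac{1}{n} \to 0,
\end{align*}
and so $x_n \to x$ in $\mathcal{D}[0,1]$.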
It is not terribly hard to show, however, that if you equip $\mathcal{C}[0,1]$ with the Skorokhod metric, then you get the usual topology of uniform convergence on $\mathcal{C}[0,1]$. Skorokhod mentions this just before defining the topology in his 1956 paper "Limit theorems for stochastic processes" for example. We can actually say a little bit more.
If you have a sequence of random variables $X_n \Longrightarrow X$ in $\mathcal{D}[0,\infty)$, then $X$ is a.s. continuous if and only if $\int_0^\infty e^{-s}(1 \wedge \sup_{0 \leq r \leq s} |X_n(r) - X_n(r-)|) \, ds \Longrightarrow 0$ (Ethier and Kurtz Theorem 3.10.2). I switched to $\mathcal{D}[0,\infty)$ here to avoid problems at the endpoint of the interval, which are a real pain when working on the Skorokhod space on a finite interval. One can often show that for all $s$, $|X_n(s) - X_n(s-)| \leq C_n$ with deterministic $C_n \to 0$; the integral above is then at most $1 \wedge C_n$, which gives an accessible sufficient condition for convergence to a continuous process. For example, if $N(t)$ is a rate one Poisson process, then in order to show that $N^{(n)}(t) = \frac{1}{\sqrt{n}}(N(n t) - nt)$ converges to Brownian motion, one might want to use the fact that any limit point of $N^{(n)}$ is continuous, which is immediate from the fact that the jumps of $N^{(n)}$ have size at most $\frac{1}{\sqrt{n}}$.
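The jump bound in the Poisson example is easy to check numerically. The following is a minimal sketch, assuming we only look at $N^{(n)}$ on $[0,1]$ and using the standard fact that, given the number of jumps of a Poisson process on an interval, the jump times are i.i.d. uniform; the helper names `scaled_poisson_jump_times` and `values_at_jumps` are made up for this illustration.

```python
import numpy as np

def scaled_poisson_jump_times(n, rng):
    """Sorted jump times in [0, 1] of N^(n)(t) = (N(n t) - n t) / sqrt(n)."""
    # N has a Poisson(n) number of jumps on [0, n]; rescaling time by 1/n
    # puts them in [0, 1], where (given the count) they are i.i.d. uniform.
    count = rng.poisson(n)
    return np.sort(rng.uniform(0.0, 1.0, size=count))

def values_at_jumps(n, jump_times):
    """Left limits N^(n)(t-) and values N^(n)(t) at each jump time t."""
    k = np.arange(1, len(jump_times) + 1)   # N(n t) equals k at the k-th jump
    right = (k - n * jump_times) / np.sqrt(n)
    left = (k - 1 - n * jump_times) / np.sqrt(n)
    return left, right

rng = np.random.default_rng(0)
for n in (10, 1_000, 100_000):
    t = scaled_poisson_jump_times(n, rng)
    left, right = values_at_jumps(n, t)
    max_jump = np.max(right - left) if len(t) > 0 else 0.0
    print(n, max_jump, 1 / np.sqrt(n))
```

Up to floating-point error, the largest jump printed for each $n$ should match $\frac{1}{\sqrt{n}}$, so $C_n = \frac{1}{\sqrt{n}}$ works in the sufficient condition above.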
$\textbf{Example: Integral of the maximum of a random walk.}$
For a concrete example that would be quite hard (maybe not possible?) to do using finite-dimensional distributions, let $\{X_i\}_{i \geq 0}$ be i.i.d. with $E X_i = 0, E X_i^2 = 1$ and set $S_n = \sum_{i=0}^n X_i$. Donsker's theorem on the Skorokhod space says that for $S^{(n)}(t) = \frac{1}{\sqrt{n}} S_{\lfloor nt \rfloor}$, $(t \mapsto S^{(n)}(t) ) \Longrightarrow \left( t \mapsto B(t)\right)$ where $B(t)$ is Brownian motion. Now consider the continuous map $g : \mathcal{D}[0,1] \to \mathcal{D}[0,1]$ given by
\begin{align*}
x \mapsto \left(t \mapsto \int_0^t \max_{0 \leq r \leq s} x(r) ds\right)
\end{align*}
Applying Donsker's theorem and the continuous mapping theorem, we see that
\begin{align*}
\left( t \mapsto \int_0^t \max_{0 \leq r \leq s} \frac{1}{\sqrt{n}} \sum_{i=0}^{\lfloor n r \rfloor} X_i \, ds\right) = g(S^{(n)}) \Longrightarrow g(B) = \left( t \mapsto \int_0^t \max_{0 \leq r \leq s} B(r) \, ds \right)
\end{align*}
In particular, evaluating at $t = 1$, we have
\begin{align*}
\int_0^1 \max_{0 \leq r \leq s} \frac{1}{\sqrt{n}} \sum_{i=0}^{\lfloor n r \rfloor} X_i \, ds \Longrightarrow \int_0^1 \max_{0 \leq r \leq s} B(r) \, ds
\end{align*}
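This last convergence can be explored by simulation. The sketch below assumes $X_i = \pm 1$ with probability $\frac{1}{2}$ each (one convenient choice with mean $0$ and variance $1$); the function name `integral_of_running_max` and the sample sizes are illustrative. Since $S^{(n)}$ is constant on each interval $[\frac{k}{n}, \frac{k+1}{n})$, the integral is exactly the average of the running maximum over $k = 0, \dots, n-1$.

```python
import numpy as np

def integral_of_running_max(n, num_paths, rng):
    """Samples of int_0^1 max_{r <= s} S^(n)(r) ds for a +-1 random walk."""
    steps = rng.choice([-1.0, 1.0], size=(num_paths, n))   # X_0, ..., X_{n-1}
    walk = np.cumsum(steps, axis=1) / np.sqrt(n)           # S_k / sqrt(n), k = 0..n-1
    running_max = np.maximum.accumulate(walk, axis=1)      # max_{j <= k} S_j / sqrt(n)
    # S^(n) is a step function with steps of width 1/n, so the integral
    # over [0, 1] is the plain average of the running maximum.
    return running_max.mean(axis=1)

rng = np.random.default_rng(0)
for n in (100, 1_000):
    samples = integral_of_running_max(n, num_paths=2_000, rng=rng)
    print(n, samples.mean(), samples.std())
```

As a rough sanity check, $E \max_{0 \leq r \leq s} B(r) = \sqrt{2s/\pi}$, so the limit has mean $\int_0^1 \sqrt{2s/\pi} \, ds = \frac{2}{3}\sqrt{2/\pi} \approx 0.532$. Convergence in distribution does not by itself give convergence of means, so treat agreement of the sample means with this value as a plausibility check rather than a consequence of the argument above.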