A proof from the Book (Erdős used this phrase to mean "an elegant, even divine, proof"; here I prosaically mean "Elements of Green's Functions and Propagation: Potentials, Diffusion, and Waves" (OUP, 1989), Gabriel Barton, §10.3, p. 237 onward)
Let $\Omega$ be a bounded open subset of $\mathbb{R}^n$ with boundary $\partial \Omega$. Barton defines the propagator $K(\mathbf{x},t, \mathbf{y},s)$ as a solution of the following initial value problem (IVP)
$$\left\lbrace \begin{aligned}
\square_{\mathbf{x},t}\, K(\mathbf{x},t, \mathbf{y},s) &=0\ , \quad \forall \ \mathbf{x}, \mathbf{y}\in \Omega,\ \forall\ t,s \in \mathbb{R} \\
K(\mathbf{x},t, \mathbf{y},t) &= 0\ , \quad \forall \ \mathbf{x}, \mathbf{y}\in \Omega,\ \forall\ t \in \mathbb{R} \\
\lim_{t\to s^+}\frac{\partial}{\partial t} K(\mathbf{x},t, \mathbf{y},s) &= \delta^{(n)}(\mathbf{x}- \mathbf{y})\ , \quad \forall \ \mathbf{x}, \mathbf{y}\in \Omega,\ \forall\ s \in \mathbb{R}
\end{aligned} \right. \tag{P}$$
(this is what the OP refers to as II)
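For orientation, here is a standard free-space example (so not Barton's bounded setting; take it as an illustration, with unit wave speed): for $\Omega = \mathbb{R}^3$, writing $r = |\mathbf{x}-\mathbf{y}|$ and $\tau = t-s$, the propagator is
$$K(\mathbf{x},t, \mathbf{y},s) = \frac{\delta(\tau - r) - \delta(\tau + r)}{4\pi r}$$
It is odd in $\tau$, so it vanishes at $\tau = 0$ (for $r > 0$), and one can check (e.g. in Fourier space) that $\lim_{\tau\to 0^+} \frac{\partial}{\partial \tau} K = \delta^{(3)}(\mathbf{x}-\mathbf{y})$, as (P) demands. The one-dimensional analogue is $K(x,t,y,s) = \tfrac{1}{2}\,\operatorname{sgn}(\tau)\,\theta(|\tau| - |x-y|)$.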
Remarks:
- in view of the last condition, one should understand everything in the sense of distributions (so the pointwise quantifiers $\forall\ \mathbf{x}, \mathbf{y}, s$... are not literally meaningful). I did not revise my PDE lectures, but it could be that the weak solutions are in fact genuinely differentiable functions. Let us assume this is the case; otherwise one will (in what follows) have to make sense of the integral of a distribution (perhaps by evaluating it on some "plateau" test function, equal to 1 on some region and 0 outside?)
- Barton says (at the beginning of chapter 10) that the $\Omega$ unbounded case is treated later.
- the motivation for the name "propagator" is given below.
Meanwhile, the (retarded) Green's function satisfies
$$\left\lbrace \begin{aligned}
\square_{\mathbf{x},t}\, G(\mathbf{x},t, \mathbf{y},s) &=\delta(t-s)\, \delta^{(n)}(\mathbf{x}- \mathbf{y}) \ , \quad \forall \ \mathbf{x}, \mathbf{y}\in \Omega,\ t,s \in \mathbb{R}\\
G(\mathbf{x},t, \mathbf{y},s) &= 0\ , \quad \forall \ \mathbf{x}, \mathbf{y}\in \Omega,\enspace\forall\ s, t \in \mathbb{R},\ t < s
\end{aligned} \right. \tag{G1}$$
which can alternatively be "replaced" by
$$\left\lbrace \begin{aligned}
\square_{\mathbf{x},t}\, G(\mathbf{x},t, \mathbf{y},s) &=\delta(t-s)\, \delta^{(n)}(\mathbf{x}- \mathbf{y}) \ , \quad \forall \ \mathbf{x}, \mathbf{y}\in \Omega,\ t,s \in \mathbb{R}\\
G(\mathbf{x},t, \mathbf{y},t) &= 0\ , \quad \forall \ \mathbf{x}, \mathbf{y}\in \Omega,\enspace t \in \mathbb{R} \\
\lim_{t\to s^+}\frac{\partial}{\partial t} G(\mathbf{x},t, \mathbf{y},s) &= \delta^{(n)}(\mathbf{x}- \mathbf{y})\ , \quad \forall \ \mathbf{x}, \mathbf{y}\in \Omega,\enspace s \in \mathbb{R}
\end{aligned} \right. \tag{G2}$$
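Again in free space with unit wave speed (a classical formula, quoted here only for illustration), the retarded Green's function of (G1) is
$$G(\mathbf{x},t, \mathbf{y},s) = \frac{\delta\big((t-s) - |\mathbf{x}-\mathbf{y}|\big)}{4\pi\, |\mathbf{x}-\mathbf{y}|}$$
which indeed vanishes for $t < s$, since the delta fires only at $t - s = |\mathbf{x}-\mathbf{y}| \geq 0$.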
Remark: uniqueness is achieved by imposing boundary conditions (BC), cf. Partial Differential Equations (AMS, 2010), Lawrence C. Evans, Thm 5, p. 82, or Barton, §10.2. The propagator and the retarded Green's function are required to satisfy the same BC, either
$$\forall\ s,t \in \mathbb{R},\ \mathbf{y}\in \Omega\qquad G(\mathbf{x},t, \mathbf{y},s) = 0,\enspace \forall\ \mathbf{x}\in \partial\Omega\quad \text{ or }\quad \frac{\partial}{\partial \mathbf{n}} G(\mathbf{x},t, \mathbf{y},s) = 0,\enspace \forall\ \mathbf{x}\in \partial\Omega$$
(I take $\frac{\partial}{\partial \mathbf{n}}$ to be the directional derivative along $\mathbf{n}$, the outward normal to the boundary)
The implication (G1) $\Rightarrow$ (G2) follows the same lines as the OP's answer (although the conditions are not the same), or Barton, p. 241. He does not discuss the converse implication. Now, here is the main statement
Proposition: If $K$ is a propagator, then
$$\forall \ \mathbf{x}, \mathbf{y}\in \Omega,\ t,s \in \mathbb{R},\quad G(\mathbf{x},t, \mathbf{y},s):= \theta(t-s)\, K(\mathbf{x},t, \mathbf{y},s) $$
is a retarded Green's function, i.e. (P) $\Rightarrow$ (G1). ($\theta$ is the Heaviside step function)
Proof: The boundary conditions are the same and the $\theta(t-s)$ factor ensures that $G(\mathbf{x},t, \mathbf{y},s)$ vanishes as soon as $t < s$. It remains to compute
$$ \begin{split}
\square_{\mathbf{x},t}\, \big(\theta(t-s)\, K(\mathbf{x},t, \mathbf{y},s)\big) &= \frac{\partial}{\partial t} \Big( \Big(\frac{\partial}{\partial t} \theta (t-s)\Big) K(\mathbf{x},t, \mathbf{y},s) + \theta (t-s)\, \frac{\partial}{\partial t}K(\mathbf{x},t, \mathbf{y},s)\Big) - \theta (t-s) \, \Delta_{\mathbf{x}} K(\mathbf{x},t, \mathbf{y},s)\\
&= \frac{\partial}{\partial t} \Big( \underbrace{\delta (t-s)\, K(\mathbf{x},t, \mathbf{y},s)}_{=\,0\ \text{(2nd condition of (P))}} + \theta (t-s)\, \frac{\partial}{\partial t}K(\mathbf{x},t, \mathbf{y},s)\Big) - \theta (t-s) \, \Delta_{\mathbf{x}} K(\mathbf{x},t, \mathbf{y},s)\\
&= \delta (t-s) \, \frac{\partial}{\partial t}K(\mathbf{x},t, \mathbf{y},s) + \theta (t-s) \, \underbrace{\square_{\mathbf{x},t}\, K(\mathbf{x},t, \mathbf{y},s)}_{=\,0}\\
&= \delta(t-s)\, \delta^{(n)}(\mathbf{x}- \mathbf{y})
\end{split}$$
where the last equality uses the third condition of (P) to evaluate $\frac{\partial}{\partial t}K$ on the support of $\delta(t-s)$.
(This calculation can also be found in, e.g., "An Introduction to Quantum Field Theory", Peskin & Schroeder, eq. (2.56), p. 30)
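As a sanity check on the free-space formulas quoted above: for $r = |\mathbf{x}-\mathbf{y}| > 0$, multiplying the propagator by $\theta(\tau)$ kills the advanced delta,
$$\theta(\tau)\, \frac{\delta(\tau - r) - \delta(\tau + r)}{4\pi r} = \frac{\delta(\tau - r)}{4\pi r}$$
which is exactly the retarded Green's function, in agreement with the proposition.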
Motivation for and use of the propagator and the Green's function:
If an initial condition $v(\mathbf{y})$ at time $s=0$ is given then
$$\forall\ t > 0,\quad \psi(\mathbf{x},t) := \int_{\Omega} K(\mathbf{x},t, \mathbf{y},0)\, v(\mathbf{y})\, d\mathbf{y}$$
is a solution of the homogeneous wave equation. To justify this rigorously, one should try to dominate the various derivatives of the integrand by a positive integrable function independent of $\mathbf{x}, t$, and conclude with Lebesgue's dominated convergence theorem. (If $K$ is a distribution and is singular at a point where we want to take the derivative, then I don't know at the moment; cf. however "Eléments de distributions et d'équations aux dérivées partielles", C. Zuily, Lemma 6.1, p. 31.) The initial conditions on $K$ are such that
$$\forall\ \mathbf{x}\in \Omega,\qquad \psi(\mathbf{x},t)\underset{t\to 0^+}{\longrightarrow} 0\qquad \text{and}\qquad \frac{\partial}{\partial t} \psi(\mathbf{x},t)\underset{t\to 0^+}{\longrightarrow} v(\mathbf{x}) $$
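Here is a sketch of the second limit, assuming enough regularity to differentiate under the integral sign:
$$\frac{\partial}{\partial t}\psi(\mathbf{x},t) = \int_{\Omega} \frac{\partial K}{\partial t}(\mathbf{x},t, \mathbf{y},0)\, v(\mathbf{y})\, d\mathbf{y}\ \underset{t\to 0^+}{\longrightarrow}\ \int_{\Omega} \delta^{(n)}(\mathbf{x}- \mathbf{y})\, v(\mathbf{y})\, d\mathbf{y} = v(\mathbf{x})$$
In one dimension, with the free-space kernel quoted earlier, this recipe gives $\psi(x,t) = \frac{1}{2}\int_{x-t}^{x+t} v(y)\, dy$, i.e. the "initial velocity" part of d'Alembert's formula.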
Similarly if a propagator $H$ satisfies
$$\left\lbrace \begin{aligned}
\square_{\mathbf{x},t}\, H(\mathbf{x},t, \mathbf{y},s) &=0\ , \quad \forall \ \mathbf{x}, \mathbf{y}\in \Omega,\ \forall\ t,s \in \mathbb{R} \\
H(\mathbf{x},t, \mathbf{y},t) &= \delta^{(n)}(\mathbf{x}- \mathbf{y})\ , \quad \forall \ \mathbf{x}, \mathbf{y}\in \Omega,\ \forall\ t \in \mathbb{R} \\
\lim_{t\to s^+}\frac{\partial}{\partial t} H(\mathbf{x},t, \mathbf{y},s) &= 0\ , \quad \forall \ \mathbf{x}, \mathbf{y}\in \Omega,\ \forall\ s \in \mathbb{R}
\end{aligned} \right. \tag{P2}$$
then $\forall\ t > 0,\enspace \psi(\mathbf{x},t) := \int_{\Omega} H(\mathbf{x},t, \mathbf{y},0)\, u(\mathbf{y})\, d\mathbf{y}$
satisfies
$$\forall\ \mathbf{x}\in \Omega,\qquad \psi(\mathbf{x},t)\underset{t\to 0^+}{\longrightarrow} u(\mathbf{x})\qquad \text{and}\qquad \frac{\partial}{\partial t} \psi(\mathbf{x},t)\underset{t\to 0^+}{\longrightarrow} 0 $$
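In fact, for the wave equation (and assuming $K$ is regular enough to differentiate the conditions of (P) in $t$), one may simply take
$$H(\mathbf{x},t, \mathbf{y},s) := \frac{\partial}{\partial t} K(\mathbf{x},t, \mathbf{y},s)$$
Indeed $\square H = \frac{\partial}{\partial t}(\square K) = 0$, the value at $t=s$ is $\lim_{t\to s^+} \frac{\partial}{\partial t} K = \delta^{(n)}(\mathbf{x}-\mathbf{y})$ by the third condition of (P), and $\lim_{t\to s^+} \frac{\partial}{\partial t} H = \lim_{t\to s^+} \frac{\partial^2}{\partial t^2} K = \lim_{t\to s^+} \Delta_{\mathbf{x}} K = 0$ since $K(\mathbf{x},s,\mathbf{y},s) = 0$. Combining the two propagators, $\psi(\mathbf{x},t) = \int_{\Omega} \big( H(\mathbf{x},t,\mathbf{y},0)\, u(\mathbf{y}) + K(\mathbf{x},t,\mathbf{y},0)\, v(\mathbf{y}) \big)\, d\mathbf{y}$ solves the homogeneous IVP with initial position $u$ and initial velocity $v$ (in free 3-dimensional space this assembles into Kirchhoff's formula).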
This is known as the (weak) Huygens' principle: one can obtain the amplitude of a wave at a later time by "summing" the amplitudes of sources on a wave front (in the physical sense) at a previous time (cf. also the comment in Lawrence C. Evans, p. 80).
Green's functions are used to solve the inhomogeneous equation
$$ \square\, \psi(\mathbf{x},t) = f(\mathbf{x},t) $$
A solution with support in the "future" of the support of $f$ is given by
$$ \psi(\mathbf{x},t) := \int_{\mathbb{R}} \int_{\mathbb{R}^n} G(\mathbf{x},t, \mathbf{y},s)\, f(\mathbf{y},s)\, d\mathbf{y}\, ds$$
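In free space (again with unit wave speed), plugging in the retarded Green's function quoted above and integrating out $s$ yields the classical retarded potential,
$$\psi(\mathbf{x},t) = \int_{\mathbb{R}^3} \frac{f\big(\mathbf{y},\, t - |\mathbf{x}-\mathbf{y}|\big)}{4\pi\, |\mathbf{x}-\mathbf{y}|}\, d\mathbf{y}$$
so the field at $(\mathbf{x},t)$ only sees the source at earlier, light-cone-delayed times.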
Comment:
- The fact that the inhomogeneous problem is related to the homogeneous one is puzzling, since the propagator $K$ sends initial conditions into the kernel of the differential operator $\square$, while the operation $\varphi \mapsto G \ast \varphi$ is rather understood as an inverse of $\square$.
- Reasoning very rudimentarily: integration is the "inverse" of differentiation, so it seems natural that an integral transform is the "inverse" of some differential operator. Although not every operator admits an integral kernel (unless perhaps one allows distributional kernels...), there always exists a Green's function (fundamental solution) for a linear PDE with constant coefficients, by the Malgrange-Ehrenpreis theorem.
- After a little search on the internet: using homogeneous solutions to obtain inhomogeneous ones is well known and goes by the name of Duhamel's principle, but I cannot find a procedure for the converse. Of course, one can take two different Green's functions (i.e. with different boundary conditions) and simply subtract them!
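To make the link between the two comments above explicit (a standard rewriting, not taken from Barton): substituting $G = \theta\, K$ into the solution formula for the inhomogeneous equation gives
$$\psi(\mathbf{x},t) = \int_{\mathbb{R}} \int_{\Omega} \theta(t-s)\, K(\mathbf{x},t, \mathbf{y},s)\, f(\mathbf{y},s)\, d\mathbf{y}\, ds = \int_{-\infty}^{t} \underbrace{\Big(\int_{\Omega} K(\mathbf{x},t, \mathbf{y},s)\, f(\mathbf{y},s)\, d\mathbf{y}\Big)}_{\text{homogeneous solution launched at time } s \text{ with velocity } f(\cdot,s)}\, ds$$
so $G \ast f$ is precisely Duhamel's superposition: each time slice of the source launches a homogeneous wave with zero initial value and initial velocity $f(\cdot,s)$, and these are summed over $s < t$. This dissolves the puzzle in the first comment.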