There is something going on in the middle, before you get to finite elements, where you are merely studying the theory of PDE itself. The equation $L(u)=s$ is called the strong formulation of the PDE. A strong solution satisfies the differential equation in the sense that the derivatives needed to define $L$ exist everywhere and $L(u)=s$ holds pointwise everywhere. In PDE theory we often look for weak solutions instead. These are solutions in the sense of distributions, meaning that they satisfy
$$\int_D L(u) \, v \, dx = \int_D s \, v \, dx$$
for all $v$ in some "large" space $V$ of "test functions". (Roughly speaking, as far as we can tell by integrating against functions in $V$, $Lu - s = 0$.) The standard $V$ in PDE theory is the space of smooth compactly supported functions, but for finite elements some generalization is desirable, since the smoothness requirement is inconvenient there.
Commonly, $L$ itself is only understood in the sense of distributions, which in practice means that it does not appear in the equation as is. Instead there is a "formal integration by parts" step which reduces the number of derivatives that a function needs in order to be a candidate solution. For instance, we say that $u$ is a weak solution to $-\Delta u = f$ with $u = 0$ on the boundary if $u \in H^1_0(D)$ and
$$\int_D \nabla u \cdot \nabla v \, dx = \int_D f v \, dx$$
for all $v \in H^1_0(D)$. (If this notation is new to you, this is the Sobolev space $W^{1,2}_0(D)$, consisting of functions on $D$ which are square integrable, whose first weak derivatives are square integrable, and which vanish on the boundary. Evans's PDE book is the standard reference on the subject.) This is formally derived by integrating the original integral equation by parts, thereby moving one derivative off of $u$ and onto $v$.
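To spell out that formal step: if $u$ were smooth, Green's identity would give
$$\int_D (-\Delta u) \, v \, dx = \int_D \nabla u \cdot \nabla v \, dx - \int_{\partial D} \frac{\partial u}{\partial n} \, v \, dS,$$
and the boundary term vanishes because $v = 0$ on $\partial D$. The weak formulation keeps only the right-hand side, which asks for just one derivative of $u$ rather than two, and drops the requirement that $\Delta u$ exist pointwise.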
You may be asking why we pose the weak formulation in the first place. This is a long story, with one of the most natural motivations coming from physics rather than mathematics. If you are interested in this, say so in the comments and I'll say some things here. For now I'm going to move on to numerical issues.
The problem here for numerics is that you have an infinite number of integral equations to check, and the space you are selecting the solution from is also infinite dimensional. The finite element approach to fixing this is to change both spaces: choose a finite dimensional subspace of your solution space and a finite dimensional subspace of the test function space of the same dimension. Then you can find the approximate solution by choosing bases for the two subspaces, representing the solution in the first basis, and solving the linear system that results when you insert each member of the second basis into the weak equation.
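To make this concrete with the Poisson example above: write the approximate solution as $u_h = \sum_{j=1}^n c_j \phi_j$ in a basis $\{\phi_j\}$ of the solution subspace, and plug each basis function $\psi_i$ of the test subspace into the weak equation. This gives
$$\sum_{j=1}^n c_j \int_D \nabla \phi_j \cdot \nabla \psi_i \, dx = \int_D f \, \psi_i \, dx, \qquad i = 1, \dots, n,$$
which is just an $n \times n$ linear system $Ac = b$ for the coefficients $c_j$, with $A_{ij} = \int_D \nabla \phi_j \cdot \nabla \psi_i \, dx$ and $b_i = \int_D f \, \psi_i \, dx$. (In the FEM literature $A$ is called the stiffness matrix and $b$ the load vector.)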
The new problem is how to choose the subspaces, because we want the finite element solution to converge (at least in some sense) to the true solution as the dimension of the subspaces goes to infinity. This is where different finite element methods differ. The most common choice is piecewise polynomials of low degree on a mesh of the domain, such as continuous piecewise linear functions or differentiable piecewise quadratic functions. Some advantages of these kinds of methods:
- Because these spaces are genuine subspaces of the solution space, the weak formulation can be used as is, with no additional difficulty in defining the problem. Some other methods, such as discontinuous Galerkin methods, lack this property.
- You have the density property that you need for convergence: as the mesh is refined, these spaces can approximate any function in the solution space arbitrarily well.
- Through the right choice of bases, the equations can be taken to be "localized", meaning that each one only involves a small number of pieces of the domain which are all close to one another. As a result the linear system is sparse, which is good for trying to solve it efficiently (see the sketch after this list).
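Here is a minimal sketch of the whole pipeline in one dimension, for $-u'' = f$ on $(0,1)$ with $u(0) = u(1) = 0$ and the continuous piecewise linear "hat" basis. The function name and the one-point quadrature for the load vector are my own simplifications, not any particular library's API.

```python
import numpy as np

# Piecewise-linear FEM for -u'' = f on (0, 1) with u(0) = u(1) = 0.
# Mesh: n interior nodes x_i = i*h, h = 1/(n+1).  Basis: "hat" functions
# phi_i, equal to 1 at x_i, 0 at every other node, and linear in between.

def solve_poisson_1d(f, n):
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1.0 - h, n)  # interior nodes

    # Stiffness matrix A_ij = integral of phi_i' phi_j'.  A hat function
    # overlaps only its immediate neighbors, so A is tridiagonal -- the
    # "localized" structure from the bullet list above.
    A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h

    # Load vector b_i = integral of f phi_i, approximated here by the
    # one-point quadrature rule b_i ~ h * f(x_i).
    b = h * f(x)

    return x, np.linalg.solve(A, b)

# Check convergence on -u'' = pi^2 sin(pi x), exact solution u = sin(pi x).
f = lambda x: np.pi ** 2 * np.sin(np.pi * x)
for n in (10, 20, 40):
    x, u = solve_poisson_1d(f, n)
    print(f"n = {n:3d}, max nodal error = {np.abs(u - np.sin(np.pi * x)).max():.2e}")
```

The stiffness matrix here is tridiagonal, which is exactly the localization property from the last bullet; the dense solve is only for brevity, and in practice you would hand a matrix like this to a banded or sparse solver.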