
I am an undergraduate mathematics student currently studying probability theory, amongst other things. First off, I'd like to note that I am not used to doing mathematics in English, so I apologize if I happen to use any wrong technical terms.

But now on to my question: I am trying to deepen my understanding of integration, because in my almost two years of studying maths I keep coming across the same issue: I don't really understand integrals. This is a bit embarrassing, since it is one of the most basic concepts, and I also took a course in measure theory last semester. However, it is just something that I can't seem to understand intuitively. I'll try to explain my frustration as clearly as I can:

What you learn in school is that the integral gives the area under a function/curve. This is quite clear, intuitively and formally. But starting from there, I never really managed to evolve my understanding of integrals; I know that it is more than just "an area under a curve," but I struggle with grasping what it really means. For example, as I mentioned, I am studying probability theory, and one of the first things you learn is the definition of a random variable and its expected value, which is given by the integral $\int_\Omega X \, \mathrm{d}P$. But why? I just can't form a clear picture of why it makes sense. I know it's probably a bit of a dumb question, but I just don't fully get it. I realize this isn't a perfectly phrased question, but maybe some of you understand what I'm trying to ask.

Thank you all in advance!! :)

  • The "area under the curve" is really not accurate - you have to explain why there are negative areas. The best way I find to understand it is to think of the integral of velocity yielding total distance traveled. – Thomas Andrews May 29 '25 at 15:34
  • Roughly speaking, an integral is a sum of very many very small things. (Indeed, the symbol $\int$ stands for summa, which is Latin for "sum".) These kinds of sums come up in many places – not just in finding areas. For example, to compute the arc length of a curve, you break up the curve into many tiny polygonal segments, and sum up the lengths of the segments. – Joe May 29 '25 at 15:45
  • 2
    The integral is a thing which is meant to generalize the idea of “summing many things”. That’s it. It doesn’t matter if it’s Lebesgue integrals or Riemann integrals or something else. The interpretation via (signed) areas is just one possibility. See here for more examples; the interpretation in terms of probability is just that: another interpretation, but the underlying concept is that of adding up many things. – peek-a-boo May 29 '25 at 15:50
  • A completely separate issue is which theory of integration to use. For a discussion of Lebesgue vs Riemann (actually a description of why Lebesgue integrals are “superior”), see this answer – peek-a-boo May 29 '25 at 15:51
  • It’s tempting to answer, but this isn’t a good question, I’m afraid. You might be able to focus it better, but as it is, it’s just a bit open-ended. – SBK May 29 '25 at 16:13

4 Answers


Area under a curve might not be the best way to understand it intuitively, especially since you can generalize integration to find volumes and such. Even in a first course, the area under a curve is used to find the length of a curve. How does that work? Technically it's still an area under a curve even then, but it's a fair bit more. Then in quantum mechanics, for example, the meaning of that area is the probability of finding a particle at a given position. Elsewhere you use integration to find the expectation value of the momentum.

Intuitively, I'd think of the definite integral most generally as a sum of infinitesimal values. Integration is breaking down a whole problem into tiny manageable pieces, applying some procedure to those pieces, and then adding up the many results. So integrate velocity with respect to time and you get distance traveled. Time times velocity does not give you units of area, yet there is a strong connection between the area under a velocity curve and distance traveled.
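
As a quick numerical sketch of this "sum of small pieces" picture (the velocity function $v(t) = 2t$ here is a hypothetical example, not taken from the answer):

```python
# Distance traveled as a sum of many small (velocity x time) pieces.
# Hypothetical velocity v(t) = 2t over t in [0, 3]; the exact distance
# is the antiderivative t^2 evaluated at 3, i.e. 9.

def v(t):
    return 2.0 * t

n = 100_000                      # number of small time slices
dt = 3.0 / n                     # width of each slice
distance = sum(v(i * dt) * dt for i in range(n))  # left Riemann sum

print(round(distance, 3))        # close to the exact value 9.0
```

Refining the slices (larger `n`) drives the sum toward the exact value, which is the whole point of the limit in the definition.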

In physics, you can integrate over a volume to calculate a magnetic field. So to get a real feel for what integration means, you probably need to apply it in multiple contexts: line integrals, surface integrals, calculations of moments of inertia.

That's all without getting into measure theory, which introduces a different meaning.

TurlocTheRed

Ratios of definite integrals come up when we see averages of continuous functions. To average a function $f(x)$ from $x=a$ to $x=b$, we commonly render

$\overline{f}(a,b)=\dfrac1{b-a}\int_a^b f(x) dx,$

but this may also be rendered as

$\overline{f}(a,b)=\dfrac{\int_a^b f(x) dx}{\int_a^b dx}.$

The advantage here is that we can generalize this to a weighted mean thusly:

$\overline{f}(a,b;w)=\dfrac{\int_a^b f(x)w(x) dx}{\int_a^b w(x) dx}.$
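
A small numerical sketch of the weighted mean above, with the hypothetical choices $f(x)=x$ and $w(x)=x$ on $[0,1]$, for which the exact value is $\frac{1/3}{1/2}=\frac23$:

```python
# Weighted mean of f over [a, b] as a ratio of two Riemann sums,
# mirroring the formula: integral of f*w divided by integral of w.

def weighted_mean(f, w, a, b, n=200_000):
    dx = (b - a) / n
    xs = [a + (i + 0.5) * dx for i in range(n)]   # midpoint rule
    num = sum(f(x) * w(x) * dx for x in xs)
    den = sum(w(x) * dx for x in xs)
    return num / den

print(weighted_mean(lambda x: x, lambda x: x, 0.0, 1.0))  # ~ 0.6667
```

Setting `w` to the constant function $1$ recovers the ordinary average $\frac1{b-a}\int_a^b f$.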

Oscar Lanzi

My go-to interpretation of integration is as a signed, weighted, $n$-dimensional volume. Of course, measures are the generalization of volume functions on arbitrary spaces, but it turns out that there's a deep relationship between measures and integration.

If you have a measure $\mu$, then $\mu(A)$ defines the integral of $1_X$, the function that's identically $1$ on $X$. That is, $$\int_A 1_X:=\mu(A)$$ defines a proper integral. Then, this integral can be extended to other functions: first to simple functions, then by limits to positive functions (among which the simple functions are dense), and then to arbitrarily signed functions. On the other hand, starting with a notion of integration also defines a measure, by exploiting the relationship above.
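
A toy sketch of the first extension step, assuming intervals for the sets $A_k$ and Lebesgue-style length for $\mu$; `integrate_simple` is a hypothetical helper name, not standard library code:

```python
# Building an integral from a measure, starting with simple functions.
# A simple function is a finite sum  sum_k c_k * 1_{A_k};  its integral
# is defined as  sum_k c_k * mu(A_k).  Here mu is the length measure on
# open intervals, mu((a, b)) = b - a.

def mu(interval):
    a, b = interval
    return b - a

def integrate_simple(pieces):
    """pieces: list of (value c_k, interval A_k) describing a simple function."""
    return sum(c * mu(A) for c, A in pieces)

# f = 2 * 1_{(0,1)} + 5 * 1_{(1,3)}  ->  integral = 2*1 + 5*2 = 12
print(integrate_simple([(2.0, (0.0, 1.0)), (5.0, (1.0, 3.0))]))  # 12.0
```

General positive functions are then handled as limits of such simple functions, which is where the real measure-theoretic work happens.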

In the case of integrals on the real line, we often choose the Lebesgue measure, $dx$. This is a measure defined on the sigma algebra generated by Euclidean open sets, and so it is defined by $dx(\left]a,b\right[)=b-a$. So on the real line, we take as our notion of volume the "length of a subset". Then integration is simply the notion induced by setting $$\int_a^b 1\cdot dx=b-a$$ and then extending it to all other (integrable) functions. So, when you integrate such a function, the expression $$\int_a^b f(x)dx$$ means you're measuring the interval $(a,b)$, but its points have different weights. For example, integrating a Gaussian function concentrates almost all weight around the origin, because the function is very close to zero away from it. You can interpret it this way because $$\int_a^b f(x)dx=\int_a^b 1\cdot f(x)dx:=\mu_f(\left]a,b \right[)$$

for the new measure $f(x)\,dx$. This is a weighted measure; it doesn't simply give you the length, but the weighted length, where some points "matter", or contribute, more than others. So an integral of a function is also a way of quantifying how much the function is concentrated about an interval.

It is, therefore, a weighted, signed length.
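
To make the "weighted length" concrete, here is a rough numerical sketch with the Gaussian weight $f(x)=e^{-x^2}/\sqrt{\pi}$, a hypothetical choice whose total mass over the whole line is $1$; `mu_f` is an illustrative name for the weighted measure above:

```python
import math

# The weighted measure mu_f((a, b)) = integral of f over (a, b), with
# the Gaussian weight f(x) = exp(-x^2) / sqrt(pi), approximated by a
# midpoint Riemann sum.

def mu_f(a, b, n=100_000):
    dx = (b - a) / n
    return sum(math.exp(-(a + (i + 0.5) * dx) ** 2) / math.sqrt(math.pi) * dx
               for i in range(n))

# Almost all of the weight sits near the origin:
print(round(mu_f(-1, 1), 3))    # most of the total mass of 1
print(round(mu_f(3, 5), 3))     # nearly zero, even though the interval is longer
```

Both intervals have ordinary length $2$, but the weighted measure sees them very differently, which is exactly the "concentration" point made above.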

Lourenco Entrudo

So if you want a more entry-level answer rather than one coming from an experienced mathematician, a different intuition for integration comes from viewing the continuous calculus as an extension of a discrete calculus.

Consider infinite sequences $A_\bullet, B_\bullet$, then we can define things like term-by-term sum and product, and multiplying by a scalar, $$ \begin{align} C = k A &~~\leftrightarrow~~ C_i = k A_i,\\ D = A + B &~~\leftrightarrow~~ D_i = A_i + B_i,\\ E = A \cdot B &~~\leftrightarrow~~ E_i = A_i B_i. \end{align} $$ We can also define shifting them left or right (with an implicit zero added), $$ \begin{align} ({\downarrow}A)_i &= A_{i+1}, \\ ({\uparrow}A)_i &= \begin{cases} 0, & \text{ if } i = 0,\\ A_{i-1} & \text{otherwise.}\end{cases} \end{align} $$ Control question: is ${\downarrow}{\uparrow} A = A$? what about ${\uparrow}{\downarrow}A = A$?
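
On finite prefixes of a sequence, the control question can be checked directly (a small sketch; `up` and `down` are hypothetical names for ${\uparrow}$ and ${\downarrow}$):

```python
# The shift operators on finite prefixes of a sequence:
# up prepends a zero, down drops the first term.

def up(A):
    return [0] + A            # (0, A_0, A_1, ...)

def down(A):
    return A[1:]              # (A_1, A_2, ...)

A = [5, 3, 8, 2]
print(down(up(A)) == A)       # True: the prepended zero is dropped again
print(up(down(A)) == A)       # False: A_0 is lost and replaced by 0
```

So ${\downarrow}{\uparrow}$ is the identity, but ${\uparrow}{\downarrow}$ forgets the first term, which is why the two orders must be distinguished.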

Now this allows us to define the term-by-term difference that losslessly encapsulates the original sequence, $$ \Delta A = A - {\uparrow} A\\ (\Delta A)_i = \begin{cases} A_0, & \text{ if } i = 0,\\ A_i - A_{i-1} & \text{otherwise.}\end{cases} $$ If you think about how to undo this, you would do it incrementally: start with $A_0 = \Delta A_0,$ then you would form $A_1 = A_0 + \Delta A_1$, then you would form $A_2 = A_1 + \Delta A_2,$ and so on. So this invites the inverse operator, $$ (\Sigma A)_i = A_0 + \dots + A_i. $$ And again these are inverse, $\Sigma \Delta X = \Delta \Sigma X = X.$ (This is kind of obvious for $\Delta\Sigma X$ based on the above definition, the $\Sigma \Delta$ case again requires recursion or some random gesturing at "telescoping series" or so.)
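
A quick check of the inverse pair on a finite prefix (sketch code; `delta` and `sigma` are hypothetical names for $\Delta$ and $\Sigma$):

```python
# Forward difference Delta and cumulative sum Sigma on finite prefixes,
# checking that each undoes the other.

def delta(A):
    return [A[0]] + [A[i] - A[i - 1] for i in range(1, len(A))]

def sigma(A):
    out, total = [], 0
    for a in A:
        total += a
        out.append(total)
    return out

A = [3, 1, 4, 1, 5, 9, 2, 6]
print(sigma(delta(A)) == A)   # True
print(delta(sigma(A)) == A)   # True
```

The incremental "undo" described above is exactly what the running total in `sigma` implements.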

So then these actually commute with $\uparrow$ too, so $\Sigma{\uparrow}A = {\uparrow}\Sigma A,~\Delta{\uparrow}A = {\uparrow}\Delta A,$ but you can kind of see that if you try to $\Sigma{\downarrow}A$ then you have to remove $A_0$ from the whole series, for this it helps to define the multiplicative identity $I = (1, 1, 1, \dots)$ and $\Sigma{\downarrow}A = {\downarrow}\Sigma A - A_0 I $. Or sometimes it's helpful to have an indicator term like $\iota(n) = {\uparrow}^n \Delta I$, so it has a 1 right at the $n^\text{th}$ position but otherwise 0.

And at this point one would do a bunch of examples, like that if $N = (0, 1, 2, \dots)$ then $\Delta N = {\uparrow}I$, or that $$ \Delta(A \cdot B) = A \cdot \Delta B + \Delta A \cdot B - \Delta A \cdot \Delta B,\\ \Delta(N \cdot N) = 2 N - {\uparrow}I, $$ which gives from our inverse that the sum of the first $n$ odd numbers is $n^2$, or if you prefer that $\Sigma N = \frac12 N \cdot (N + 1).$
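
These identities can be verified term by term on a finite prefix (a sketch with hypothetical helper names):

```python
# Checking the discrete identities on a finite prefix:
# Delta(N*N) = 2*N - up(I)   and   Sigma(N) = N*(N+1)/2, term by term.

n = 20
N = list(range(n))                                 # (0, 1, 2, ...)
I = [1] * n                                        # (1, 1, 1, ...)
up = lambda A: [0] + A[:-1]                        # shift right, zero in front

delta = lambda A: [A[0]] + [A[i] - A[i - 1] for i in range(1, n)]
sigma = lambda A: [sum(A[:i + 1]) for i in range(n)]

lhs = delta([x * x for x in N])
rhs = [2 * N[i] - up(I)[i] for i in range(n)]
print(lhs == rhs)                                  # True

print(sigma(N) == [k * (k + 1) // 2 for k in N])   # True
```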

Well, that's a discrete version of calculus. When we make it continuous, the idea is that we form a term-by-term $X = \epsilon N$ and $f(X) = (f(0), f(\epsilon), f(2\epsilon), \dots)$ for some small $\epsilon$, sampling the function from 0 to infinity. For continuous functions, this has the effect of making the series vary extremely slowly, $f_{i+1} \approx f_i,$ and this means ${\downarrow} A \approx {\uparrow} A \approx A$ while $\Delta A \approx 0$ and $\Sigma A \approx A_0 N$ at least to start. Those aren't rigorous, but just to get your brain kind of in the mood for the steps that we would then take:

  • We don't use just $\Delta$ but $\mathrm d = \epsilon^{-1} \Delta$ for our differences

  • Dividing our Leibniz rule by $\epsilon$ we find $\mathrm d (A \cdot B) = A \cdot \mathrm d B + \mathrm d A \cdot B - \epsilon~\mathrm d A \cdot \mathrm d B$ and for continuous functions that $\epsilon$ premultiplier nukes the third term... this also implies that $\mathrm d (X^n) = n X^{n-1} + \epsilon E$ for some error term $E$.

  • The difference $\mathrm d f$ blows up to infinity when $f$ is not continuous at some argument $x$; consider defining $\delta(x) = \epsilon^{-1} ~\iota(\lceil x/\epsilon\rceil)$ as the Dirac delta sequence, to have an explicit representation of these terms in the continuous version of the discrete calculus. This also handily sidesteps the fact that it's not a function, without needing to introduce to kids the notion of a functional.

  • The inverse of $\mathrm d$ is now $\int = \epsilon ~\Sigma$. So this gives you Riemann sums directly.
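
A miniature version of that last bullet, with the hypothetical choice $f(x) = x^2$ on $[0,1]$, whose exact integral is $\frac13$:

```python
# int = epsilon * Sigma in miniature: sample f on an epsilon-grid,
# sum, and multiply by epsilon -- a Riemann sum in disguise.

eps = 1e-4
samples = [(k * eps) ** 2 for k in range(int(1 / eps))]   # f(X), X = eps*N
integral = eps * sum(samples)                             # eps * Sigma
print(integral)        # close to 1/3
```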

And then, for an example of how the same thinking is mirrored here: if you want to calculate the volume of a square pyramid, and you think of an actual pyramid in Egypt, you would say in the discrete calculus, "well, the pyramid is made of layers and each layer is made of a certain number of bricks of volume $\epsilon$ and the bricks in each layer scale like $k N^2$, so we clearly want to do some sort of $\Sigma N^2$ operation," whereas in the continuous calculus you can kind of fudge out the error term to simultaneously get something easier and more precise: "the slope of the pyramid is some $\tan\theta$, the pyramid has height $h$ and base area $A$, so the volume of one layer of pyramid of height $\epsilon$ at distance $y$ down from the tip of the pyramid is $\epsilon ~A~(y/h)^2.$ Applying $\Sigma$, hey, we get an $\int$, so this is $$V = \frac{A}{h^2} \left[ \int X^2 \right ]_{\lceil h/\epsilon\rceil} = \frac{1}{3} ~A~ h + \epsilon E,$$ for some error term $E$.
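
The same pyramid sum, carried out numerically with hypothetical example values for the base area $A$ and height $h$:

```python
# The pyramid volume as a sum of layer volumes eps * A * (y/h)^2,
# compared with the continuous answer A*h/3.  A and h are arbitrary
# example values: A*h/3 = 6*4/3 = 8.

A, h = 6.0, 4.0
eps = 1e-4
layers = int(h / eps)

V = sum(eps * A * ((k * eps) / h) ** 2 for k in range(layers))
print(V)               # close to A*h/3 = 8.0
```

The leftover discrepancy is exactly the $\epsilon E$ error term from the formula above, and it shrinks as `eps` does.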

Viewed this way the basic idea of integration is just "Cut something up into little slices that I can sum together, and I have helpful rules for how to do these sums really quickly." The area under the curve just happens to be easy to slice vertically into little rectangles of height $f(X)$ and width $\epsilon$.

CR Drost