I have two questions about the proof of Jensen's inequality in Durrett's "Probability: Theory and Examples" [pp. 23-24]. The proof is reproduced below, with my questions inserted at the points where they arise.
[Note that below I replace Durrett's notation $h \downarrow 0$ for the limit from above with $h \to 0^+$.]
Theorem 1.5.1. Jensen’s inequality.
Suppose $\phi$ is convex, that is, $\lambda \phi(x) + (1 - \lambda) \phi(y) \geq \phi(\lambda x + (1 - \lambda)y)$ for all $\lambda \in (0,1)$ and $x,y \in \mathbb{R}$. If $\mu$ is a probability measure, and $f$ and $\phi(f)$ are integrable, then $$ \phi \bigg( \int f \, d\mu \bigg) \leq \int \phi(f) \, d\mu.$$
Proof. Let $c=\int f \, d\mu$ and let $l(x)=ax+b$ be a linear function that has $l(c)= \phi(c)$ and $\phi(x) \geq l(x)$. To see that such a function exists, recall that convexity implies $$\lim_{h \to 0^+} \frac{\phi(c)-\phi(c-h)}{h} \leq \lim_{h \to 0^+} \frac{\phi(c+h)-\phi(c)}{h}$$ (The limits exist since the sequences are monotone.)
1. Can somebody clarify what this inequality between the two limits really means, and how it follows from convexity?
If we let $a$ be any number between the two limits and let $l(x) = a(x - c) + \phi(c)$, then $l$ has the desired properties. With the existence of $l$ established, the rest is easy. From the fact that if $g \leq f$ a.e., then $\int g \, d\mu \leq \int f \, d\mu$, we have $$ \int \phi(f) \, d\mu \geq \int (af + b) \, d\mu = a \int f \, d\mu + b = l\bigg(\int f \, d\mu \bigg)= \phi\bigg(\int f \, d\mu \bigg)$$ since $c = \int f \, d\mu$ and $l(c) = \phi(c)$.
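To make the construction of $l$ concrete for myself, I tried two examples of my own (they are not in Durrett, so please read them as a sanity check under my own assumptions rather than as part of the proof). For $\phi(x) = x^2$ and an arbitrary $c$, $$\frac{\phi(c)-\phi(c-h)}{h} = 2c - h \to 2c, \qquad \frac{\phi(c+h)-\phi(c)}{h} = 2c + h \to 2c \qquad (h \to 0^+),$$ so the only possible choice is $a = 2c$, and $l(x) = 2c(x-c) + c^2$ satisfies $\phi(x) - l(x) = (x-c)^2 \geq 0$. For $\phi(x) = |x|$ and $c = 0$, the two limits are $-1$ and $1$, so any $a \in [-1,1]$ seems to work: $l(x) = ax$ and $|x| \geq ax$ for all $x$.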
2. Where does the first inequality $\int \phi(f) \, d\mu \geq \int (af + b) \, d\mu$ in the final display of the proof come from?
Looking forward to any feedback.
Thank you for your time.