Let $c,d\in [a,b]$, with $c<d$. It suffices to prove that
$$
f(d)-f(c)>-(4+d-c)\varepsilon,
$$
for every $\varepsilon>0$.
We enumerate $A$ as $A=\{\alpha_n\}_{n\in\mathbb N}$ and choose $\delta_n>0$, such that
$$
x\in(\alpha_n-\delta_n,\alpha_n+\delta_n)\quad\Longrightarrow\quad|f(x)-f(\alpha_n)|
<\frac{\varepsilon}{2^n}
$$
for all $n\in\mathbb N$. Finding such $\delta_n$'s is possible due to continuity of $f$.
Set $I_n=(\alpha_n-\delta_n,\alpha_n+\delta_n)$. In particular,
$$
y_1,\,y_2\in I_n\,\,\,\Longrightarrow\,\,\, f(y_2)>f(y_1)-\frac{\varepsilon}{2^{n-1}}
\tag{1}
$$
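To spell out the step behind $(1)$, here is a short check that uses only the choice of $\delta_n$ above. For $y_1,y_2\in I_n$, both $f(y_1)$ and $f(y_2)$ lie within $\varepsilon/2^n$ of $f(\alpha_n)$, so
$$
f(y_2)-f(y_1)=\big(f(y_2)-f(\alpha_n)\big)-\big(f(y_1)-f(\alpha_n)\big)
>-\frac{\varepsilon}{2^n}-\frac{\varepsilon}{2^n}=-\frac{\varepsilon}{2^{n-1}}.
$$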
Let $x\in [a,b]\setminus A$. Since $f$ is differentiable at $x$, there exists an $\eta_x>0$ such that
$$
0<|y-x|<\eta_x\quad\Longrightarrow\quad
-\varepsilon |y-x|<f(y)-f(x)-(y-x)f'(x)< \varepsilon|y-x|,
$$
and hence whenever $y_1,y_2\in J_x=(x-\eta_x,x+\eta_x)$, with $y_1\le x\le y_2$ and $y_1<y_2$, we have that
$$
f(y_2)-f(y_1)-(y_2-y_1)f'(x)> -\varepsilon(|y_1-x|+|y_2-x|)=-\varepsilon(y_2-y_1),
$$
where the inequality is strict because at least one of $y_1,y_2$ differs from $x$. Since $f'(x)\ge 0$, we finally obtain that
$$
f(y_2)>f(y_1)-\varepsilon(y_2-y_1). \tag{2}
$$
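As a sketch of how the two one-sided estimates combine into $(2)$, assume $y_1\le x\le y_2$ with $y_1,y_2\in J_x$. Applying the lower bound at $y_2$ and the upper bound at $y_1$ gives
$$
f(y_2)-f(x)-(y_2-x)f'(x)\ge-\varepsilon(y_2-x),
\qquad
f(x)-f(y_1)-(x-y_1)f'(x)\ge-\varepsilon(x-y_1),
$$
with equality only possible when $y_2=x$, respectively $y_1=x$. Adding the two inequalities yields
$$
f(y_2)-f(y_1)-(y_2-y_1)f'(x)\ge-\varepsilon(y_2-y_1),
$$
and the inequality is strict as soon as $y_1<y_2$, since then at least one of $y_1,y_2$ differs from $x$.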
We shall use the following result (for a proof see here):
Cousin's Lemma. Let $\mathcal C$ be a full cover of $[a, b]$, that is, a collection of closed subintervals of $[a, b]$ with the property that for every $x\in[a, b]$ there exists a $\delta>0$ such that $\mathcal C$ contains all closed subintervals of $[a, b]$ which contain $x$ and have length smaller than $\delta$. Then there exists a partition $\{I_1,\,I_2,\ldots,I_m\}\subset\mathcal C$ of $[a, b]$ into non-overlapping intervals, where $I_i=[x_{i-1}, x_i]$ for all $1\le i\le m$ and $a=x_0 < x_1 <\cdots <x_m=b$.
We define $\mathcal C$ to be the collection of all closed subintervals $K$ of $[c,d]$ such that either $\alpha_n\in K\subset I_n$ for some $\alpha_n\in A$, or $x\in K\subset J_x$ for some $x\in [a,b]\setminus A$. This $\mathcal C$ is a full cover of $[c,d]$, so Cousin's Lemma, applied on $[c,d]$, provides the existence of points $c=x_0<x_1<\cdots<x_m=d$, such that the closed intervals
$$
K_1=[x_0,x_1],\, K_2=[x_1,x_2],\ldots,K_m=[x_{m-1},x_m]
$$
belong to $\mathcal C$.
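For completeness, here is a brief check (not spelled out above) that $\mathcal C$ satisfies the full-cover hypothesis of Cousin's Lemma on $[c,d]$. Fix $p\in[c,d]$ and set
$$
\delta(p)=\begin{cases}
\delta_n, & \text{if } p=\alpha_n\in A,\\
\eta_p, & \text{if } p\in[c,d]\setminus A.
\end{cases}
$$
If $K$ is a closed subinterval of $[c,d]$ with $p\in K$ and length smaller than $\delta(p)$, then every $q\in K$ satisfies $|q-p|<\delta(p)$. Hence $K\subset I_n$ in the first case and $K\subset J_p$ in the second, so $K\in\mathcal C$ in both cases.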
From the construction of $\mathcal C$, each $K_j$ is a subinterval of some $I_n$ or of some $J_x$, and possibly of more than one such interval. To every $K_j$ we assign exactly one such interval: to every $j\in\{1,\ldots,m\}$ we assign either a unique $n\in\mathbb N$ such that $\alpha_n\in K_j\subset I_n$, which we denote by $n_j$, or a unique $x\in [a,b]\setminus A$ such that $x\in K_j\subset J_x$. The mapping $j\mapsto n_j$ is not necessarily one-to-one, since if $\alpha_n$ is the common endpoint of $K_j$ and $K_{j+1}$, it is possible that $n_j=n_{j+1}$. However, each point $\alpha_n$ belongs to at most two of the non-overlapping intervals $K_j$, so each $I_n$ is assigned to at most two $K_j$'s.
We split $S=\{1,\ldots,m\}$ as a union of two disjoint sets. $S_1$ shall be the set of those $j\in S$, to which an $n\in\mathbb N$ has been assigned (i.e., $\alpha_n\in K_j\subset I_n=I_{n_j}$) while $S_2=S\setminus S_1$. If $j\in S_2$, then an $x\in [a,b]\setminus A$ has been assigned to $j$ and $x\in K_j\subset J_x$.
If $j\in S_1$, then $x_{j-1},x_j\in K_j\subset I_{n_j}$,
and $(1)$ provides that $f(x_j)-f(x_{j-1})>-\dfrac{\varepsilon}{2^{n_j-1}}$,
while if $j\in S_2$, then $(2)$ provides that
$ f(x_j)-f(x_{j-1})>-\varepsilon (x_j-x_{j-1})$.
We now have that
$$
\begin{aligned}
f(d)-f(c)&=\sum_{j=1}^m \big(f(x_j)-f(x_{j-1})\big)=
\sum_{j\in S_1} \big(f(x_j)-f(x_{j-1})\big)+\sum_{j\in S_2} \big(f(x_j)-f(x_{j-1})\big) \\
&\ge -\sum_{j\in S_1} \frac{\varepsilon}{2^{n_j-1}}-\sum_{j\in S_2}\varepsilon(x_j-x_{j-1}) \\
&> -4\varepsilon-\varepsilon(d-c)=-(4+d-c)\varepsilon.
\end{aligned}
$$
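The last inequality holds because in the first sum, $\sum_{j\in S_1} \dfrac{1}{2^{n_j-1}}< 2\sum_{n=1}^\infty \dfrac{1}{2^{n-1}}=4$, since each term $\dfrac{1}{2^{n-1}}$ may appear at most twice (namely when $\alpha_n$ is the common endpoint of two neighboring $K_j$'s), while in the second sum, $\sum_{j\in S_2}(x_j-x_{j-1})\le d-c$, since the $K_j$'s are non-overlapping subintervals of $[c,d]$.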
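To make explicit why this bound suffices, note that $c$ and $d$ do not depend on $\varepsilon$, so letting $\varepsilon\to 0^+$ in the estimate gives
$$
f(d)-f(c)\ \ge\ \lim_{\varepsilon\to 0^+}\big(-(4+d-c)\varepsilon\big)=0,
$$
that is, $f(c)\le f(d)$ for all $c<d$ in $[a,b]$.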