There are no known attacks on the PRF security of HMAC-SHA256 that are better than brute force.
(So you can truncate that MAC to $L$ bits, where $\frac1{2^L}+\epsilon$ is an acceptable risk of forgery per attempt and $\epsilon$ covers the adversary's advantage against the PRF.)
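A minimal sketch of that truncation, using Python's standard `hmac` module. The helper names `truncated_tag` and `verify` are made up for illustration, and the tag length is assumed to be a whole number of bytes.

```python
import hashlib
import hmac


def truncated_tag(key: bytes, message: bytes, tag_bits: int) -> bytes:
    """Return the first tag_bits bits of HMAC-SHA256(key, message).

    Assumption: tag_bits is a multiple of 8, to keep the sketch byte-aligned.
    """
    if tag_bits % 8 != 0 or not 0 < tag_bits <= 256:
        raise ValueError("tag_bits must be a multiple of 8 in (0, 256]")
    full = hmac.new(key, message, hashlib.sha256).digest()
    return full[: tag_bits // 8]


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the truncated tag and compare in constant time."""
    expected = truncated_tag(key, message, 8 * len(tag))
    return hmac.compare_digest(expected, tag)
```

With `tag_bits = 32`, a blind forgery attempt succeeds with probability about $2^{-32}$, so the choice of $L$ is purely a trade-off between bandwidth and acceptable forgery risk.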
To reduce the impact of a forgery without making the ciphertext any longer, one should use a
format-preserving encryption (FPE) scheme that is secure against one query to the encryption
oracle and one query to the decryption oracle, which are not necessarily made in that order. $\;\;\;$
Choose either $\; [b_{client} = 0 \: \text{ and } \: b_{server} = 1] \:$ or $ \: [b_{client} = 1 \: \text{ and } \: b_{server} = 0] \;$.
Let $\operatorname{pL} : \{0,1,2,3,\dots\} \to \{0,1,2,3,\dots\}$ be an injective function that can be inverted efficiently enough.
Let $\operatorname{rL} : \{0,1,2,3,\dots\} \to \{0,1,2,3,\dots\}$ be a function that can be computed efficiently enough.
Let $\operatorname{pad}_m : \{0,1\}^m \times \{0,1\}^{\operatorname{rL}(m)} \to \{0,1\}^{\operatorname{pL}(m)}$
be a sequence of injective functions that can be computed efficiently enough and satisfies
$[message$ can be efficiently computed from $\operatorname{pad}_{\operatorname{length}(message)}(message,randomness)]$.
For such injective functions to exist, you will also need that for all $m$, $m+\operatorname{rL}(m) \leq \operatorname{pL}(m)$ (so in particular $m \leq \operatorname{pL}(m)$).
Define $\operatorname{unpad}$ as the partial inverse of the padding:
$\operatorname{unpad}(\operatorname{pad}_{\operatorname{length}(message)}(message,randomness)) \: = \: message$
For all other values of $x$, $\operatorname{unpad}(x) \: = \: \perp$. (This is well defined because $\operatorname{pL}$ and each $\operatorname{pad}_m$ are injective.)
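One concrete instance of these padding functions, as a sketch only. It works in bytes rather than bits, takes $\operatorname{rL}$ to be a constant, and appends a fixed run of zero bytes as redundancy. The constants `R_BYTES` and `T_BYTES` are hypothetical choices, and `None` stands in for $\perp$.

```python
from typing import Optional

R_BYTES = 4  # rL(m): bytes of randomness (hypothetical choice)
T_BYTES = 8  # redundancy pL(m) - (m + rL(m)), in bytes (hypothetical choice)


def pL(m: int) -> int:
    """Padded length: injective in m, and m + rL(m) <= pL(m)."""
    return m + R_BYTES + T_BYTES


def pad(message: bytes, randomness: bytes) -> bytes:
    """Injective map: message || randomness || zero bytes."""
    if len(randomness) != R_BYTES:
        raise ValueError("randomness must be exactly R_BYTES long")
    return message + randomness + bytes(T_BYTES)


def unpad(x: bytes) -> Optional[bytes]:
    """Invert pad; return None for anything outside the image of pad."""
    m = len(x) - R_BYTES - T_BYTES  # invert pL
    if m < 0:
        return None
    if x[m + R_BYTES:] != bytes(T_BYTES):
        return None  # redundancy check failed
    return x[:m]
```

With this choice the redundancy is `8 * T_BYTES` bits, so a uniformly random string of the right length unpads to something other than `None` with probability $2^{-64}$.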
$\operatorname{Enc}(key,party,packetnumber,plaintext,randomness)$
$=$
$\operatorname{FPEencrypt}(\operatorname{HMAC}(key,b_{party}||\hspace{.02 in}packetnumber),\operatorname{pad}(plaintext,randomness))$
$\operatorname{Dec}(key,party,packetnumber,ciphertext)$
$=$
$\operatorname{unpad}(\operatorname{FPEdecrypt}(\operatorname{HMAC}(key,(1-b_{party})||\hspace{.02 in}packetnumber),ciphertext))$
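The per-packet key derivation inside $\operatorname{Enc}$ and $\operatorname{Dec}$ can be sketched as follows. The FPE scheme itself is left out, since none is in the Python standard library. The one-byte direction bit and the 8-byte big-endian encoding of $packetnumber$ are assumptions made here so that the concatenation is unambiguous, and the function names are illustrative.

```python
import hashlib
import hmac

# One of the two allowed assignments of direction bits.
B = {"client": 0, "server": 1}


def packet_key(key: bytes, direction_bit: int, packetnumber: int) -> bytes:
    """HMAC-SHA256(key, b || packetnumber) with a fixed-width encoding."""
    data = bytes([direction_bit]) + packetnumber.to_bytes(8, "big")
    return hmac.new(key, data, hashlib.sha256).digest()


def enc_key(key: bytes, party: str, packetnumber: int) -> bytes:
    """FPE key that `party` uses to encrypt packet `packetnumber`."""
    return packet_key(key, B[party], packetnumber)


def dec_key(key: bytes, party: str, packetnumber: int) -> bytes:
    """FPE key that `party` uses to decrypt its peer's packet `packetnumber`."""
    return packet_key(key, 1 - B[party], packetnumber)
```

Because the decrypting party flips its own bit, `enc_key(key, "client", n)` equals `dec_key(key, "server", n)`, while the two directions never share an FPE key for the same packet number.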
As long as each party encrypts at most once and decrypts at most once for each $packetnumber$, a feasible adversary will have probability at most $\frac1{2^{\operatorname{pL}(m)-(m+\operatorname{rL}(m))}}+\epsilon$ of violating integrity for each submitted ciphertext
of length $\operatorname{pL}(m)$ (and a ciphertext whose length is not in $\operatorname{range}(\operatorname{pL})$ will always decrypt to $\perp$).
Furthermore, if the parameters make that probability noticeable, then the decryptions
of ciphertexts with length $\operatorname{pL}(m)$ that violate integrity will be computationally indistinguishable
from independent samples from the following distributions, one for each such ciphertext:
if the indicated $party$ did not output a ciphertext for that $packetnumber$, then uniform on $\{0,1\}^m$;
else, the plaintext used by $party$ for that $packetnumber$ with probability $\frac{2^{\operatorname{rL}(m)}-1}{2^{m+\operatorname{rL}(m)}-1}$,
and each other member of $\{0,1\}^m$ with probability $\frac{2^{\operatorname{rL}(m)}}{2^{m+\operatorname{rL}(m)}-1}$.
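As a sanity check on these formulas, the following sketch evaluates them with exact rational arithmetic for toy parameters (the values of `m`, `rL`, and `pL` are hypothetical, in bits, and $\epsilon$ is ignored). The counting behind the second case: every plaintext has $2^{\operatorname{rL}(m)}$ valid padded strings, but one padded string of the plaintext actually sent already corresponds to the genuine ciphertext, so it cannot arise from a forgery.

```python
from fractions import Fraction


def forgery_bound(m: int, rL: int, pL: int) -> Fraction:
    """The 2^-(pL - (m + rL)) term of the forgery probability."""
    return Fraction(1, 2 ** (pL - (m + rL)))


def forged_plaintext_distribution(m: int, rL: int):
    """Return (P[plaintext that was sent], P[one specific other plaintext])."""
    denom = 2 ** (m + rL) - 1
    same = Fraction(2 ** rL - 1, denom)
    other = Fraction(2 ** rL, denom)
    return same, other


m, rL, pL = 8, 4, 44  # hypothetical toy parameters, in bits
same, other = forged_plaintext_distribution(m, rL)
# One "same" outcome plus 2^m - 1 "other" outcomes must cover everything.
total = same + (2 ** m - 1) * other
```

Here `total` comes out to exactly 1, confirming that the two probabilities describe a proper distribution over $\{0,1\}^m$, and `forgery_bound(m, rL, pL)` is $2^{-32}$.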