Let $X_1, \ldots, X_n$ be $n$ independent geometric random variables with success probability $p = 1/2$, where $X_i = j$ means it took $j$ trials to get the first success. Let $S_d = \sum_{i=1}^n X_i^d$ and $\mu_d = \mathbb{E}\left( S_d \right)$. Given $\delta > 0$, I am interested in finding an inequality of the form $$\text{Pr} \left \lbrace S_d \geq (1 + \delta) \mu_d \right \rbrace \leq c \exp\left(-f(\delta) n^\alpha\right)$$ for some function $f$, some positive constant $c$, and $0 < \alpha \leq 1$, where I would like $\alpha$ to be as large as possible. In the case of $d = 1$, one can use a Chernoff-like approach to get a result with $\alpha = 1$. Interestingly, however, the Chernoff-like approach breaks down for $d > 1$ because the moment generating function $M_{X_i^d}(s) = \mathbb{E}\left[ \exp(s X_i^d) \right]$ is infinite for every $s > 0$.
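To see the divergence concretely: since $\text{Pr} \left \lbrace X_i = j \right \rbrace = 2^{-j}$ for $j \geq 1$, $$M_{X_i^d}(s) = \sum_{j=1}^{\infty} e^{s j^d} \, 2^{-j} = \sum_{j=1}^{\infty} \exp\left( s j^d - j \ln 2 \right),$$ and for $d > 1$ the exponent $s j^d - j \ln 2$ tends to $+\infty$ as $j \to \infty$ for every fixed $s > 0$, so the series diverges. (For $d = 1$ it converges whenever $s < \ln 2$, which is exactly what makes the Chernoff approach work in that case.)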
My approach is to let $\mathcal{E}$ be the event that $X_i \leq b$ for all $i$, where $b > 0$ is a parameter to be chosen later. By the law of total probability,
\begin{align} \text{Pr} \left \lbrace S_d \geq (1 + \delta) \mu_d \right\rbrace &= \text{Pr} \left \lbrace S_d \geq (1 + \delta) \mu_d | \mathcal{E}\right\rbrace \underbrace{\text{Pr} \left \lbrace \mathcal{E}\right\rbrace}_{\leq 1} + \underbrace{\text{Pr} \left \lbrace S_d \geq (1 + \delta) \mu_d | \mathcal{E}^c\right\rbrace}_{\leq 1} \text{Pr} \left \lbrace \mathcal{E}^c \right\rbrace \\ &\leq \text{Pr} \left \lbrace S_d \geq (1 + \delta) \mu_d | \mathcal{E}\right\rbrace + \text{Pr} \left \lbrace \mathcal{E}^c \right\rbrace \end{align}
One can use Hoeffding to bound $\text{Pr} \left \lbrace S_d \geq (1 + \delta) \mu_d | \mathcal{E}\right\rbrace$: conditioned on $\mathcal{E}$, the $X_i$ remain independent (since $\mathcal{E}$ is a product event), each $X_i^d$ takes values in $[1, b^d]$, and the conditioning can only decrease the mean, so $\mathbb{E}\left( S_d | \mathcal{E} \right) \leq \mu_d$. Meanwhile, a union bound gives $\text{Pr} \left \lbrace \mathcal{E}^c \right\rbrace \leq n \, \text{Pr} \left \lbrace X_1 > b \right\rbrace = n 2^{-b}$ for integer $b$. With some work shown in a previous post here, we can choose $b$ so as to obtain a result of the desired form with $\alpha = 1/(2d+1)$; a sketch of the balancing step follows.
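Here is my understanding of how the balancing works (a sketch only; constants are not tracked, and I use $\mu_d = n \, \mathbb{E}\left( X_1^d \right) = \Theta(n)$). Hoeffding applied to the conditioned variables $X_i^d \in [1, b^d]$ with deviation $\delta \mu_d$ gives $$\text{Pr} \left \lbrace S_d \geq (1 + \delta) \mu_d | \mathcal{E}\right\rbrace \leq \exp\left( -\frac{2 \delta^2 \mu_d^2}{n (b^d - 1)^2} \right) = \exp\left( -\Omega\left( \frac{\delta^2 n}{b^{2d}} \right) \right),$$ while the union bound gives $\text{Pr} \left \lbrace \mathcal{E}^c \right\rbrace \leq n 2^{-b}$. The two exponents balance when $n / b^{2d} \asymp b$, i.e. $b \asymp n^{1/(2d+1)}$, at which point both terms are $\exp\left( -\Omega_\delta\left( n^{1/(2d+1)} \right) \right)$ (the polynomial factor $n$ in front of $2^{-b}$ is absorbed into the constants). This is where $\alpha = 1/(2d+1)$ comes from.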
Unfortunately, this bound is loose: for $d = 1$ it only yields $\alpha = 1/3$, whereas the Chernoff-like approach gives $\alpha = 1$. Does anyone have an idea how to improve the concentration result for $d > 1$? Note that apparently for $d = 2$ the optimal result is $\alpha = 1/2$, though I do not know where to look for the proof.
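One sanity check on that claim: $\alpha = 1/d$ is a hard ceiling for any bound of the stated form, because a single large summand already contributes a matching tail. Since $S_d \geq X_1^d$ and $\mu_d = \Theta(n)$, $$\text{Pr} \left \lbrace S_d \geq (1 + \delta) \mu_d \right\rbrace \geq \text{Pr} \left \lbrace X_1 \geq \left( (1 + \delta) \mu_d \right)^{1/d} \right\rbrace = \exp\left( -\Theta_\delta\left( n^{1/d} \right) \right),$$ so no $\alpha > 1/d$ is achievable; for $d = 2$ this is at least consistent with $\alpha = 1/2$ being optimal.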
This problem has been in the back of my mind, on and off, for some time, so any insights would be greatly appreciated.