
(Note: this is not an assignment, but revision of a topic from Cambridge past exam papers.) I have been attempting the question below, and I am struggling with part (b).

[image: the exam question]

For (a), the pmf is just that of a Bernoulli distribution, so

$$ f(y;p) = \exp\left(y\log\left(\frac{p}{1-p}\right)+\log(1-p) \right).$$
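As a sanity check, substituting $y = 1$ and $y = 0$ into this form recovers the usual Bernoulli probabilities:

$$ f(1;p) = \exp\left(\log\left(\frac{p}{1-p}\right)+\log(1-p)\right) = \frac{p}{1-p}\cdot(1-p) = p, \qquad f(0;p) = \exp(\log(1-p)) = 1-p. $$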

Then for (b), the log-likelihood is given by $$\ell(\beta,y) = \sum_{i=1}^n \left(y_i\log\left(\frac{p_i}{1-p_i}\right)+\log(1-p_i) \right)$$

Now, $\log\left(\frac{p_i}{1-p_i}\right) = \beta^Tx_i \Rightarrow \log(1-p_i) = \log(p_i) - \beta^Tx_i$

And so,

\begin{align} \ell(\beta,y) &= \sum_{i=1}^n y_i\beta^Tx_i + \sum_{i=1}^n \log(p_i) - \sum_{i=1}^n \beta^Tx_i\\ &= \sum_{i=1}^n y_i\beta^Tx_i + \sum_{i=1}^n \log\left(\frac{e^{\beta^Tx_i}}{1+e^{\beta^Tx_i}}\right) - \sum_{i=1}^n \beta^Tx_i \end{align}

I am unsure how to deal with the middle term, particularly when differentiating, as I cannot get the final answer.


1 Answer


Denote $β = (β_1, \cdots, β_p)^{\mathrm{T}}$ and $x_i = (x_{i1}, \cdots, x_{ip})^{\mathrm{T}}$, so that $β^{\mathrm{T}} x_i = \sum\limits_{j = 1}^p β_j x_{ij}$. Since $\displaystyle p_i(β) = \frac{\mathrm{e}^{β^{\mathrm{T}} x_i}}{1 + \mathrm{e}^{β^{\mathrm{T}} x_i}}$,
\begin{align*} L(β; y) &= \sum_{i = 1}^n \left(y_i \ln\left(\frac{p_i(β)}{1 - p_i(β)}\right) + \ln(1 - p_i(β))\right)\\ &= \sum_{i = 1}^n \left(y_i β^{\mathrm{T}} x_i + \ln\left(\frac{1}{1 + \mathrm{e}^{β^{\mathrm{T}} x_i}}\right)\right) = \sum_{i = 1}^n (y_i β^{\mathrm{T}} x_i - \ln(1 + \mathrm{e}^{β^{\mathrm{T}} x_i}))\\ &= \sum_{i = 1}^n y_i \sum_{j = 1}^p β_j x_{ij} - \sum_{i = 1}^n \ln\left(1 + \exp\left(\sum_{j = 1}^p β_j x_{ij}\right)\right). \end{align*}

For each $1 \leqslant k \leqslant p$,
$$ \frac{\partial L}{\partial β_k}(β; y) = \sum_{i = 1}^n y_i x_{ik} - \sum_{i = 1}^n \frac{\exp\left(\sum\limits_{j = 1}^p β_j x_{ij}\right)}{1 + \exp\left(\sum\limits_{j = 1}^p β_j x_{ij}\right)} \cdot x_{ik} = \sum_{i = 1}^n y_i x_{ik} - \sum_{i = 1}^n p_i(β) x_{ik}. $$

Note that $\displaystyle \frac{\partial L}{\partial β_k}(\hat{β}; y) = 0$ for $1 \leqslant k \leqslant p$, thus
\begin{align*} 0 &= \sum_{k = 1}^p \hat{β}_k \frac{\partial L}{\partial β_k}(\hat{β}; y) = \sum_{k = 1}^p \hat{β}_k \left(\sum_{i = 1}^n y_i x_{ik} - \sum_{i = 1}^n p_i(\hat{β}) x_{ik}\right)\\ &= \sum_{i = 1}^n y_i \sum_{k = 1}^p \hat{β}_k x_{ik} - \sum_{i = 1}^n p_i(\hat{β}) \sum_{k = 1}^p \hat{β}_k x_{ik} = \sum_{i = 1}^n y_i \hat{β}^{\mathrm{T}} x_i - \sum_{i = 1}^n p_i(\hat{β}) \hat{β}^{\mathrm{T}} x_i\\ &= \sum_{i = 1}^n y_i \mathop{\mathrm{logit}}(p_i(\hat{β})) - \sum_{i = 1}^n p_i(\hat{β}) \mathop{\mathrm{logit}}(p_i(\hat{β})), \end{align*}
which implies
$$ \sum_{i = 1}^n p_i(\hat{β}) \mathop{\mathrm{logit}}(p_i(\hat{β})) = \sum_{i = 1}^n y_i \mathop{\mathrm{logit}}(p_i(\hat{β})). $$
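In case it helps, here is a minimal NumPy sketch that checks the final identity numerically: it fits $\hat{β}$ by Newton-Raphson on simulated data and then compares the two sums. The sample size, true coefficients, and seed below are illustrative assumptions, not part of the question.

```python
import numpy as np

# Simulated logistic-regression data (all values here are illustrative).
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))                 # rows are the covariates x_i
beta_true = np.array([0.5, -1.0, 2.0])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

# Newton-Raphson: beta <- beta + (X^T W X)^{-1} X^T (y - p(beta)),
# where W = diag(p_i (1 - p_i)) comes from minus the Hessian.
beta = np.zeros(p)
for _ in range(25):
    prob = 1 / (1 + np.exp(-(X @ beta)))    # p_i(beta)
    W = prob * (1 - prob)
    beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - prob))

eta_hat = X @ beta                          # logit(p_i(beta_hat)) = beta_hat^T x_i
prob_hat = 1 / (1 + np.exp(-eta_hat))

print(X.T @ (y - prob_hat))                 # score equations: ~0 componentwise
print(prob_hat @ eta_hat, y @ eta_hat)      # the two sums in the identity agree
```

The last line prints two (nearly) equal numbers, matching $\sum_{i=1}^n p_i(\hat{β}) \mathop{\mathrm{logit}}(p_i(\hat{β})) = \sum_{i=1}^n y_i \mathop{\mathrm{logit}}(p_i(\hat{β}))$.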
