Update

I realized a problem with my previous post; here's an update with a general philosophy of "marginalizing out" unwanted covariates.
Introduction
Suppose we have a mean model
\begin{align*}
\mathbb{E}[Y|X^1, X^2] &= \beta_0 + \beta_1 X^1 + \beta_2 X^2 \tag{$\heartsuit$}\\
\mathbb{E}[Y|X^1, X^2] &= e^{\beta_0 + \beta_1 X^1 + \beta_2 X^2} \tag{$\diamondsuit$}
\end{align*}
Assuming $X^1 \perp \!\!\! \perp X^2$, we have from $(\heartsuit)$ the marginal model
\begin{align*}
\mathbb{E}[Y|X^1] = \mathbb{E}[\mathbb{E}[Y|X^1, X^2] \mid X^1] = \beta_0 + \beta_1 X^1 + \beta_2\mathbb{E}[X^2] = \beta_0^* + \beta_1^* X^1
\end{align*}
with $(\beta_0^*, \beta_1^*) = (\beta_0 + \beta_2\mathbb{E}[X^2], \beta_1)$ (the conditioning on $X^1$ drops by independence), or from $(\diamondsuit)$ the marginal model
\begin{align*}
\mathbb{E}[Y|X^1] = \mathbb{E}[\mathbb{E}[Y|X^1, X^2] \mid X^1] = e^{\beta_0 + \beta_1 X^1}\,\mathbb{E}[e^{\beta_2 X^2}] = e^{\beta_0^* + \beta_1^* X^1}
\end{align*}
where $(\beta_0^*, \beta_1^*) = (\beta_0 + \log \mathbb{E}[e^{\beta_2 X^2}], \beta_1)$. In both cases, where $Y$ is assumed to follow a linear or log-linear model in the covariates $X^1, X^2$, the marginal coefficient $\beta_1^*$ is numerically equal to the coefficient $\beta_1$ from the full conditional model.
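A quick simulation illustrates the linear case (my own sketch, with made-up parameter values): when $X^1 \perp \!\!\! \perp X^2$, regressing $Y$ on $X^1$ alone recovers the same slope $\beta_1$ as the full model, while the intercept absorbs $\beta_2\mathbb{E}[X^2]$.

```python
# Sketch: marginal OLS slope equals the conditional slope under independence.
# All parameter values below are illustrative, not from the post.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta0, beta1, beta2 = 1.0, 2.0, -3.0

x1 = rng.normal(size=n)
x2 = rng.normal(loc=0.5, size=n)      # independent of x1, E[X2] = 0.5
y = beta0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)

# OLS of Y on X1 alone (the marginal model)
X = np.column_stack([np.ones(n), x1])
b_star, *_ = np.linalg.lstsq(X, y, rcond=None)

print(b_star)  # intercept ≈ beta0 + beta2*E[X2] = -0.5, slope ≈ beta1 = 2
```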
Consistently estimating a parameter
Generalized estimating equations set up equations of the form
\begin{align*}
\sum_{k=1}^{K} \mathbf{D}_k^\intercal \mathbf{V}_k^{-1}(\mathbf{Y}_k - \boldsymbol{\mu}_k) = \mathbf{0}
\end{align*}
where $\mathbf{Y}_k$ is a vector of observations believed to be correlated within itself, but with $\mathbf{Y}_k \perp \!\!\! \perp \mathbf{Y}_{k'}$ (the vectors are independent of each other). Without diving too deep into the theory of GEEs, the relevance here is that, as long as the mean model is correctly specified, fitting a GEE will return consistent estimates of the desired coefficients. And a GLM is a special case of a GEE! This implies, for example, that if we have a model
\begin{align*}
Y_i = \beta_0 + \beta_1 X^1_i + \beta_2 X^2_i + \epsilon_i, \qquad \epsilon_i \sim F
\end{align*}
for any distribution $F$ with mean 0 and finite second moment (say, $\epsilon_i \sim \text{Exponential}(\frac{1}{2021}) - 2021$), then fitting as if the errors were normal would still yield consistent estimates of $(\beta_0, \beta_1, \beta_2)$! Of course, the standard errors for these estimators would be incorrect and would need to be adjusted with sandwich estimators, but consistency is not affected.
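This, too, is easy to check by simulation (a sketch of my own; I use a shifted exponential with a smaller scale than the post's 2021 so the simulation noise stays manageable at this sample size):

```python
# Sketch: least squares as if errors were normal, even though they are
# shifted-exponential (mean 0, heavily skewed). Estimates stay consistent.
# Parameter values are illustrative, not from the post.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
beta = np.array([1.0, 2.0, -3.0])

x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
eps = rng.exponential(scale=2.0, size=n) - 2.0   # mean 0, skewed, non-normal

y = beta[0] + beta[1] * x1 + beta[2] * x2 + eps
X = np.column_stack([np.ones(n), x1, x2])
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print(b_hat)  # ≈ (1, 2, -3) despite the misspecified error distribution
```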
Application to current problem
The mean of $\text{Beta}(p, 1)$ is $\frac{p}{p+1}$, which is not a tractable form for the marginalization trick because of the "non-separability" of $X^2$ from $X^1$. My idea was to search for a transformation that makes things more tractable, and one such transformation is $\widetilde{Y}_i = -\log(Y_i)$: if $Y \sim \text{Beta}(p, 1)$, then $-\log(Y) \sim \text{Exponential}(p)$ (rate $p$), which provides the mean model
\begin{align*}
\mathbb{E}[\widetilde{Y}_i|X^1, X^2] = \frac{1}{p} = \frac{1}{\beta_0 + \beta_1 X^1 + \beta_2 X^2}
\end{align*}
This still isn't quite there, but now if you're willing to accept the functional form
\begin{align*}
p = e^{\beta_0 + \beta_1 X^1 + \beta_2 X^2}
\end{align*}
(which arguably makes more sense, since it guarantees $p > 0$ as needed for the beta distribution), we have that
\begin{align*}
\mathbb{E}[\widetilde{Y}_i|X^1, X^2] = e^{-\beta_0 - \beta_1 X^1 - \beta_2 X^2}
\end{align*}
and therefore the marginalization trick reveals
\begin{align*}
\mathbb{E}[\widetilde{Y}_i|X^1] = e^{\beta_0^* + \beta_1^* X^1 }
\end{align*}
where $(\beta_0^*, \beta_1^*) = (-\beta_0 + \log \mathbb{E}[e^{-\beta_2 X^2}], -\beta_1)$ (again, the conditioning on $X^1$ drops by independence). Essentially, we're treating $X^2$ like the errors in a normal regression model, while noting that they may not have mean 0 and therefore bias the intercept. That's not a problem, since we are interested in $\beta_1$!
Final procedure
- Transform $\widetilde{Y} = -\log(Y)$
- Regress $\widetilde{Y}$ on $X^1$ with a log link, using any working distribution you want (exponential, normal). The resulting coefficient is consistent for $-\beta_1$.
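The whole procedure can be sketched end to end (my own simulation, with made-up values): draw $Y \sim \text{Beta}(p, 1)$ with $p = e^{\beta_0 + \beta_1 X^1 + \beta_2 X^2}$, transform $\widetilde{Y} = -\log(Y)$, and fit a log-link GLM of $\widetilde{Y}$ on $X^1$ via Fisher scoring (IRLS, which has unit weights for the exponential family with log link). The fitted slope should be close to $-\beta_1$.

```python
# End-to-end sketch of the final procedure; all values are illustrative.
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
b0, b1, b2 = 0.5, 0.8, -0.3

x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                          # independent of x1
p = np.exp(b0 + b1 * x1 + b2 * x2)               # p > 0 by construction
y = np.clip(rng.beta(p, 1.0), 1e-320, 1 - 1e-16) # Y ~ Beta(p, 1); guard fp edges
y_t = -np.log(y)                                 # ~ Exponential(rate p)

# Log-link GLM for E[y_t | x1] = exp(g0 + g1 * x1), fit by IRLS
# (exponential family + log link => unit working weights).
X = np.column_stack([np.ones(n), x1])
g, *_ = np.linalg.lstsq(X, np.log(y_t), rcond=None)  # rough starting value
for _ in range(50):
    eta = X @ g
    mu = np.exp(eta)
    z = eta + (y_t - mu) / mu                    # IRLS working response
    g_new, *_ = np.linalg.lstsq(X, z, rcond=None)
    if np.max(np.abs(g_new - g)) < 1e-10:
        g = g_new
        break
    g = g_new

print(g[1])  # slope ≈ -b1 = -0.8
```

The slope is consistent for $-\beta_1$ even though the working exponential model ignores $X^2$; only the intercept absorbs the marginalized term.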
You assumed that we parametrize $p=e^{\beta_0+\beta_1X_1+\beta_2X_2}$. That's reasonable from the info you had. But actually what I am dealing with is that $p\in [0,1]$ (this holds from its interpretation). First thing I did was super simple: what if we put a linear link and don't care about the interpretation. But maybe I am looking for some link like logit. Do you have some idea how to proceed in that case? I can put it as a new question and give you like 200 points if you solve that :D – Albert Paradek Dec 20 '21 at 16:04