I would like to model the conditional density of two real-valued random variables and estimate it using the empirical conditional mean embedding. I am not sure which of the following two formulations is the correct way to do this in an RKHS.
A
If we model the joint density ${\widehat f}(\boldsymbol{z},\boldsymbol{x})= \sum_{i=1}^n\alpha_{i}\phi_{i}(\boldsymbol{z},\boldsymbol{x}) $, then \begin{equation} \widehat f(\boldsymbol{z}|\boldsymbol{x}) = \frac{\widehat f(\boldsymbol{z},\boldsymbol{x})}{\widehat f(\boldsymbol{x})} = \sum_{i=1}^n \frac{\alpha_{i}}{\int \sum_{j=1}^n\alpha_{j}\phi_{j}(\boldsymbol{z},\boldsymbol{x})d\boldsymbol{z}}\phi_{i}(\boldsymbol{z},\boldsymbol{x})\equiv \sum_{i=1}^n\omega_{i}(\boldsymbol{x})\phi_{i}(\boldsymbol{z},\boldsymbol{x}) = W^\top_{\boldsymbol{x}} \Phi(\boldsymbol{z},\boldsymbol{x}) \end{equation}
so the basis functions $\Phi(\boldsymbol{z},\boldsymbol{x}) = \big(\phi_{1}(\boldsymbol{z},\boldsymbol{x}),\phi_{2}(\boldsymbol{z},\boldsymbol{x}),\dots, \phi_{n}(\boldsymbol{z},\boldsymbol{x})\big)$ depend on the samples $\{\boldsymbol{z}_i, {\boldsymbol{x}}_i\}^n_{i=1}$. Here $\phi_{i}(\boldsymbol{z},\boldsymbol{x})$ is a kernel function centered at $(\boldsymbol{z}_i, \boldsymbol{x}_i)$.
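As a worked special case (this is an added assumption, not part of the setup above): suppose each basis function is a product kernel, $\phi_{i}(\boldsymbol{z},\boldsymbol{x}) = k_z(\boldsymbol{z},\boldsymbol{z}_i)\,k_x(\boldsymbol{x},\boldsymbol{x}_i)$, where $k_z$ and $k_x$ are hypothetical names for the two factors and $\int k_z(\boldsymbol{z},\boldsymbol{z}_i)\,d\boldsymbol{z} = 1$. Then the integral in the denominator of $\omega_i(\boldsymbol{x})$ can be evaluated in closed form:
\begin{equation} \widehat f(\boldsymbol{x}) = \int \sum_{j=1}^n\alpha_{j}\phi_{j}(\boldsymbol{z},\boldsymbol{x})\,d\boldsymbol{z} = \sum_{j=1}^n\alpha_{j}k_x(\boldsymbol{x},\boldsymbol{x}_j), \qquad \widehat f(\boldsymbol{z}|\boldsymbol{x}) = \sum_{i=1}^n \frac{\alpha_{i}\,k_x(\boldsymbol{x},\boldsymbol{x}_i)}{\sum_{j=1}^n\alpha_{j}k_x(\boldsymbol{x},\boldsymbol{x}_j)}\,k_z(\boldsymbol{z},\boldsymbol{z}_i) \end{equation}
Under this assumption the $\boldsymbol{x}$-dependence of each $\phi_i$ can be absorbed into the weight, so what remains is a set of $\boldsymbol{x}$-dependent coefficients multiplying basis functions of $\boldsymbol{z}$ alone. This need not hold for kernels that do not factorize.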
B
\begin{equation} \widehat f(\boldsymbol{z}|\boldsymbol{x}) = \sum_{i=1}^n\omega_{i}(\boldsymbol{x})\phi_{i}(\boldsymbol{z}) = W^\top_{\boldsymbol{x}} \Phi(\boldsymbol{z}) \end{equation} so the basis functions are $\Phi(\boldsymbol{z}) = \big(\phi_{1}(\boldsymbol{z}),\phi_{2}(\boldsymbol{z}),\dots, \phi_{n}(\boldsymbol{z})\big)$, and the varying coefficients $W_{\boldsymbol{x}}$ depend on the samples $\{\boldsymbol{z}_i, {\boldsymbol{x}}_i\}^n_{i=1}$. Here $\phi_{i}(\boldsymbol{z})$ is a kernel function centered at $\boldsymbol{z}_i$.
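To make formulation B concrete, here is a minimal NumPy sketch of how I imagine it could be computed. It assumes Gaussian kernels and takes the weights to be the usual empirical conditional mean embedding coefficients, $W_{\boldsymbol{x}} = (K + n\lambda I)^{-1}k_{\boldsymbol{x}}$. The function names, the bandwidths, the regularizer `reg` and the toy data are all my own illustrative choices, not something fixed by the setup above.

```python
import numpy as np

def gaussian_gram(a, b, bandwidth):
    """Gaussian kernel matrix between the rows of a (m, d) and b (n, d)."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def cme_weights(x_train, x_query, bandwidth=0.5, reg=1e-3):
    """Weights omega(x) = (K + n*reg*I)^{-1} k_x, one column per query point."""
    n = x_train.shape[0]
    K = gaussian_gram(x_train, x_train, bandwidth)        # (n, n)
    k_query = gaussian_gram(x_train, x_query, bandwidth)  # (n, m)
    return np.linalg.solve(K + n * reg * np.eye(n), k_query)

def conditional_density(z_grid, z_train, weights, bandwidth=0.3):
    """Evaluate sum_i omega_i(x) * phi_i(z) on z_grid (g, d_z); returns (g, m)."""
    d_z = z_train.shape[1]
    # Normalizing constant so that each phi_i integrates to one over z.
    norm = (2.0 * np.pi * bandwidth ** 2) ** (d_z / 2.0)
    Phi = gaussian_gram(z_grid, z_train, bandwidth) / norm  # (g, n)
    return Phi @ weights

# Toy data (assumed for illustration): z = sin(x) + small Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=(200, 1))
z = np.sin(x) + 0.1 * rng.standard_normal((200, 1))

W = cme_weights(x, np.array([[0.5]]))      # weights for the query x = 0.5
z_grid = np.linspace(-2.0, 2.0, 101)[:, None]
f_hat = conditional_density(z_grid, z, W)  # estimate of f(z | x = 0.5)
```

Two caveats about this sketch: the weights obtained this way can be negative and need not sum to one, so `f_hat` is not guaranteed to be a valid density without further normalization or clipping. Also, because samples are stored as rows, the same code runs unchanged when $\boldsymbol{z}$ is a vector; only the normalizing constant depends on its dimension.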
Questions:
Which of the above is correct?
Is my notation correct?
What changes in the above notation if we assume that $\boldsymbol{z}$ is a vector rather than a scalar?