It is known that if $\mathcal{D}=\sigma(\{B_1,B_2,\cdots,B_N\})$, then $$\mathbb{E}(X|\mathcal{D})(\omega)=\sum_{i=1}^N\mathbb{E}(X|B_i)\mathbb{1}_{B_i}(\omega).$$ I want to see how this works, so I try it on a particular example. The simplest one that comes to mind is coin tossing. Suppose we toss a fair coin 6 times and count the number of heads, $Y=\sum_{i=1}^6X_i$, where $\Pr(X_i=1)=\Pr(X_i=0)=\frac{1}{2}$ for all $i$, independently.

To get a concrete realization of the conditional expectation, let's condition on the first 3 tosses:
$$B_1=\{\omega:X_1=0,X_2=0,X_3=0\},$$
$$B_2=\{\omega:X_1=1,X_2=0,X_3=0\},$$
and so on through $B_8$, so $N=8$. Then $\mathbb{E}(Y|\mathcal{D})(\omega)=\sum_{i=1}^N\mathbb{E}(Y|B_i)\mathbb{1}_{B_i}(\omega)$.

Now, to get a sense of each version of the conditional expectation, we need to compute $\mathbb{E}(Y|B_i)$. Start with $\mathbb{E}(Y|B_1)$. Recall the definition of conditional expectation given a set:
$$\mathbb{E}(Y|B)=\frac{\mathbb{E}(\mathbb{1}_BY)}{\Pr(B)}=\frac{\int\mathbb{1}_BY\,dP}{\Pr(B)}.$$
Since $\Pr(B_1)=\Pr(X_1=0)\Pr(X_2=0)\Pr(X_3=0)=\frac{1}{8}$, this gives
$$\mathbb{E}(Y|B_1)=8\int\mathbb{1}_{B_1}Y\,dP=8\int\sum_{i=4}^6X_i\,dP=8\sum_{i=4}^6\int X_i\,dP=8\cdot\frac{3}{2}=12.$$
I get the sense that something has gone wrong, since a conditional expectation of the number of heads in 6 tosses cannot exceed 6. Can you tell me where?
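To make the mismatch concrete, here is a minimal brute-force sketch I wrote in Python (my own check, not from any reference) that enumerates all $2^6$ equally likely outcomes and evaluates $\mathbb{E}(\mathbb{1}_{B_1}Y)/\Pr(B_1)$ directly from the definition above:

```python
from itertools import product

# All 2^6 equally likely outcomes of six fair coin tosses (0 = tails, 1 = heads).
outcomes = list(product([0, 1], repeat=6))
p = 1 / len(outcomes)  # each outcome has probability 1/64

def Y(omega):
    return sum(omega)  # total number of heads

def in_B1(omega):
    # B1: the first three tosses are all tails
    return omega[0] == 0 and omega[1] == 0 and omega[2] == 0

# E(1_{B1} * Y) and P(B1) by direct summation over the sample space
E_indicator_Y = sum(p * Y(w) for w in outcomes if in_B1(w))
P_B1 = sum(p for w in outcomes if in_B1(w))

print(E_indicator_Y / P_B1)  # E(Y | B1)
```

This prints $1.5$, not $12$, which is what makes me suspect the algebra above rather than the formula itself.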
Side question: Probability theory usually starts from a probability space $(\Omega,\mathcal{F},\Pr)$, but it seems it's also fine to just specify a joint distribution without ever describing the underlying probability space. What theorem or theory justifies this?