According to Wasserman's All of Statistics pg. 92:
A $1 - \alpha$ confidence interval for a parameter $\theta$ is an interval $C_n = (a,b)$ where $a = a(X_1, \ldots , X_n)$ and $b = b(X_1 , \ldots, X_n)$ are functions of the data such that
$$ \mathbb{P}_\theta ( \theta \in C_n ) \geq 1 - \alpha \text{, for all } \theta \in \Theta $$
...If $\theta$ is a vector then we use a confidence set (such as a sphere or an ellipse) instead of an interval.
Question:
While I understand conceptually what a confidence interval is (i.e., a 95% CI means that, over many repeated experiments, about 95% of the resulting intervals will trap the parameter), I don't understand how this formal definition captures that concept.
In particular, I don't understand what is meant by the notation $\mathbb{P}_\theta( \theta \in C_n)$. What is the sample space that $\mathbb{P}$ is drawing from? What is the set $\theta \in C_n$? It seems that here $\theta$ is being treated both as a fixed value (from the notation $\mathbb{P}_\theta$) and as a random variable (from the notation $\theta \in C_n$).
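For concreteness, here is a minimal sketch of the conceptual reading I have in mind, written as a simulation. Everything in it is my own assumption rather than something from the book: the data are taken to be i.i.d. normal with known standard deviation `sigma`, the interval is the usual $\bar{X} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$, and the names `coverage`, `trials`, and `seed` are made up for illustration. Note that in the code `theta` is held fixed and only the endpoints `a` and `b` change from one experiment to the next.

```python
import math
import random
from statistics import NormalDist, fmean


def coverage(theta, sigma=1.0, n=30, alpha=0.05, trials=10_000, seed=0):
    """Fraction of simulated experiments whose interval contains theta.

    Assumes X_1, ..., X_n are i.i.d. Normal(theta, sigma^2) with sigma known.
    """
    rng = random.Random(seed)
    # Upper alpha/2 quantile of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / math.sqrt(n)

    hits = 0
    for _ in range(trials):
        # One "experiment": draw a fresh sample with theta held fixed.
        xs = [rng.gauss(theta, sigma) for _ in range(n)]
        xbar = fmean(xs)
        # The endpoints a and b are functions of the data only.
        a, b = xbar - half_width, xbar + half_width
        hits += a < theta < b
    return hits / trials


if __name__ == "__main__":
    # Fraction of intervals that trapped theta; expected to be near 1 - alpha.
    print(coverage(theta=2.0))
```

Under these assumptions the printed fraction should land near 0.95 for any choice of `theta`, which is how I read the "for all $\theta \in \Theta$" part of the definition.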