It's well known that three random variables may be pairwise statistically independent without being mutually independent; for an illustration, see example pairwise vs. mutual relations.
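
For concreteness, here is a minimal Python sketch of a standard construction of this kind (the XOR construction is my assumption; the linked example may differ): $X$ and $Y$ are independent fair coin flips and $Z = X \oplus Y$.

```python
from itertools import product

# X, Y independent fair coins; Z = X XOR Y. Only outcomes with z == x ^ y
# carry probability mass (1/4 each).
joint = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}

def p1(i, v):
    # marginal P(V_i = v)
    return sum(p for o, p in joint.items() if o[i] == v)

def p2(i, vi, j, vj):
    # pairwise marginal P(V_i = vi, V_j = vj)
    return sum(p for o, p in joint.items() if o[i] == vi and o[j] == vj)

# Every pairwise factorization holds...
for i, j in [(0, 1), (0, 2), (1, 2)]:
    for vi, vj in product((0, 1), repeat=2):
        assert abs(p2(i, vi, j, vj) - p1(i, vi) * p1(j, vj)) < 1e-12

# ...but the triple factorization fails: P(X=0, Y=0, Z=0) = 1/4, not 1/8.
print(joint[(0, 0, 0)], p1(0, 0) * p1(1, 0) * p1(2, 0))  # 0.25 0.125
```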

Can mutual statistical independence be modeled with Bayesian Networks aka Graphical Models? These are nonparametric structured stochastic models encoded by Directed Acyclic Graphs.

"Each vertex represents a random variable (or group of random variables), and the links express probabilistic relationships between these variables. The graph then captures the way in which the joint distribution over all of the random variables can be decomposed into a product of factors each depending only on a subset of the variables." -- CM Bishop Pattern Recognition and Machine Learning, Ch. 8 p. 630.

Here's a basic example from Wikipedia:

[Figure: a simple example Bayesian network, from Wikipedia]

It would appear that hypergraphs are needed to represent such higher-order (in)dependence. Is there some trick based on d-separation, Markov blankets, or perhaps grouping variables that would enable such a representation?
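
One partial observation, under the assumption that the example in question is the XOR construction sketched above: that distribution does factorize according to the collider DAG $X \to Z \leftarrow Y$, so a DAG can carry this joint. But d-separation on that graph only yields the marginal independence $X \perp Y$; the extra pairwise independencies $X \perp Z$ and $Y \perp Z$ hold for these particular parameters without being readable off the graph (the distribution is unfaithful to the DAG), which is one way of phrasing the representational gap.

```python
from itertools import product

p_x = p_y = {0: 0.5, 1: 0.5}

def p_z_given_xy(z, x, y):
    return 1.0 if z == x ^ y else 0.0    # deterministic XOR CPT

# The collider factorization p(x) p(y) p(z | x, y) reproduces the XOR joint.
for x, y, z in product((0, 1), repeat=3):
    lhs = p_x[x] * p_y[y] * p_z_given_xy(z, x, y)
    rhs = 0.25 if z == x ^ y else 0.0    # the joint from the first sketch
    assert lhs == rhs
```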

alancalvitti
  • Reference? Mutual independence is strictly stronger than pairwise independence, in the sense that the former implies the latter, but the latter does not always imply the former. – SBF Jan 21 '13 at 17:26
  • As @Ilya (+1) alludes to, it might help you to review the formal definition of independence. From that it will be clear that your conjecture is false. – cardinal Jan 21 '13 at 17:31
  • @Ilya, I could have sworn I read in a recently published text (~2010) that the conditions are independent. But it's the first time I had seen that hence I was looking for references. I will edit my Q until I find that book again. It does not affect the underlying representation issue. – alancalvitti Jan 21 '13 at 17:46
  • I am not aware of the most recent results on independence, but I would be surprised if they made the old ones wrong :) though I'm not familiar with the term 'statistical independence' either – SBF Jan 21 '13 at 18:11
  • @cardinal, I understand the formal definition as factorization conditions, as stated in the link provided (example pairwise vs. mutual relations). Are you saying that $P(ABC)=P(A)P(B)P(C)$ implies $P(AB)=P(A)P(B)$, $P(AC)=P(A)P(C)$, and $P(BC)=P(B)P(C)$? – alancalvitti Jan 21 '13 at 19:09
  • @alancalvitti: Yes, I am. For each of those, let the "odd man out" be the entire sample space $\Omega$. For example, take $C = \Omega$ to get the first equation you give in parentheses. (See the numerical sketch after these comments.) – cardinal Jan 21 '13 at 19:14
  • @cardinal, I have two issues with that: (1) does it remain true if A, B, C are random variables as opposed to events? (2) Does the implication hold if the r.v.'s are general? You assumed a special structure by setting one event to the entire sample space. – alancalvitti Jan 21 '13 at 19:24
  • @Ilya, By statistical independence I mean factorization of the joint distribution (i.e., the same sense as in probability theory); the textbook stated the 4 factorization conditions as being independent of one another. I will try to find that reference. – alancalvitti Jan 21 '13 at 19:28
  • @alancalvitti: It is hard to see why you would have "issues" with that. To respond to your question: If $A$, $B$ and $C$ are random variables, then $\mathbb P(A)$ makes no sense in the first place! Probabilities are about measuring the size of sets, i.e., events. You can ask about the probabilities $\mathbb P (X \in A)$ where $X$ is a random variable and $A$ is a (measurable) set contained in its image. (cont.) – cardinal Jan 21 '13 at 19:58
  • @alancalvitti: Second, I assumed no special structure. The stated equality you give must hold for all finite intersections of the associated (measurable) sets. That is the definition of mutual independence. It is a simple exercise to show that $\Omega$ can be included ("appended") in the collection without loss of generality. – cardinal Jan 21 '13 at 19:58
  • @cardinal, One source of confusion is the distinction between classical measure-space probability versus Bayesian probability. The latter is not directly based on measure theory (e.g., Jaynes's Probability Theory, though Diaconis has criticised the lack of fundamentals in this book; the Laplace-Cox-Polya axioms are generalizations of modus ponens, so the foundations are quite different). P in Bayesian analysis refers to the joint distribution functions. But I agree you raise very good points. – alancalvitti Jan 21 '13 at 20:20
  • @alancalvitti: Jaynes' text, while interesting, does not provide a rigorous framework for probability theory. His own grasp of rather standard mathematics was quite shaky. All Bayesians (whether they think of themselves as probabilists, statisticians or something else) I know (Diaconis included, of course) embrace the measure-theoretic framework. – cardinal Jan 21 '13 at 20:30
  • @cardinal, your last point is debatable. First of all, Laplace developed much of what is called Bayesian analysis, without the benefit of measure theory (probability as degree of belief). Also a recent issue of Stat. Sci. [http://www.imstat.org/sts/] shows that often results based on classical vs. Bayesian methods can greatly diverge. Clearly, in practice, both branches are in common usage. In fact there is splintering such as Empirical Bayes etc. Also, there is little overlap between Bayesian Networks and measure theory. – alancalvitti Jan 21 '13 at 20:37
  • @cardinal, you can see that, given joint and marginal distributions (for example, for 3 binomial variables: http://math.stackexchange.com/questions/281800/example-relations-pairwise-versus-mutual), it makes sense to look at the factorization conditions without invoking measure theory, correct? Is there an example in the opposite direction? – alancalvitti Jan 21 '13 at 20:48
  • Dear @alancalvitti: Briefly: That classical and Bayesian statistical methods can, at times, yield different solutions and inferences says nothing about the measure-theoretic foundations of probability. At all. It seems you are conflating these two. I am quite familiar with Laplace's work (in the original French). He also manipulated obviously divergent series with little regard for the consequences. That this resulted in correct solutions also says nothing about the underlying mathematical foundations. (cont.) – cardinal Jan 21 '13 at 21:09
  • You are correct that if the sample space is finite and one is only ever concerned with a finite number of events, then only an extremely rudimentary version of measure theory is needed to build up the theory. But, that gives short shrift to the topic as a whole. Cheers. – cardinal Jan 21 '13 at 21:09
  • @cardinal, not only Jaynes, but also Kolmogorov pointed out that real-world data is always finite (and endowed with only the discrete and indiscrete topologies). I would love to read Laplace in French, but please: given the question and the example of pairwise independent r.v.'s not implying mutual independence, does the opposite direction hold or not? – alancalvitti Jan 21 '13 at 21:15
  • @cardinal, on further reflection, I don't think that the distributional notion of dependence assumes discrete variables (though the figure I made in the linked Q is indeed for 3 binomial r.v.'s). In general, the probability is defined by integrals of the pdfs, and these are independent if they factorize, i.e., can be written as integrals of products (for any events). Correct? – alancalvitti Jan 22 '13 at 16:41
  • My question seems related to yours: http://math.stackexchange.com/questions/2149014/is-there-a-combinatorial-topological-treatment-of-statistical-independence Although I would remark that I think the relevant structure would be independence systems/abstract simplicial complexes, which are a special case of hypergraphs, but for a general hypergraph we don't have that $n$-wise independence implies $(n-1)$-wise independence, which is true for statistical independence (I think), so we would need to specialize to abstract simplicial complexes. – Chill2Macht Feb 17 '17 at 17:06
  • I do think that there might be a relationship, because I know that algebraic statistics can be used to model Bayesian networks (although I don't yet know exactly how), and the closest reference I could find about independence complexes (hypergraphs) being used to model relationships of statistical independence is this document here: http://www.qucosa.de/fileadmin/data/qucosa/documents/14637/tesis-principal.pdf which appears to be a PhD thesis about algebraic statistics. – Chill2Macht Feb 17 '17 at 17:08
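
Following up on the exchange above (cardinal's $C = \Omega$ remark), here is a small numerical check, with arbitrary illustrative marginals, that full factorization for three random variables yields each pairwise factorization by summing out the remaining variable, i.e., taking its event to be all of $\Omega$:

```python
from itertools import product

# Three mutually independent binary variables, by construction.
p_x = {0: 0.3, 1: 0.7}
p_y = {0: 0.5, 1: 0.5}
p_z = {0: 0.1, 1: 0.9}

def joint(x, y, z):
    return p_x[x] * p_y[y] * p_z[z]      # fully factored joint

# Summing over all values of Z (its event is "anything", i.e., Omega)
# recovers P(X, Y) = P(X) P(Y); the other two pairs work the same way.
for x, y in product((0, 1), repeat=2):
    p_xy = sum(joint(x, y, z) for z in (0, 1))
    assert abs(p_xy - p_x[x] * p_y[y]) < 1e-12
```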

0 Answers