15

It is a well-known fact that "correlation doesn't equal causation", but machine learning seems to be based almost entirely on correlation. I'm working on a system to estimate students' performance on questions based on their past performance. Unlike tasks such as Google search, this doesn't seem like the kind of system that can be easily gamed, so causation isn't really relevant in that regard.

Clearly, if we want to do experiments to optimise the system, we will have to care about the correlation/causation distinction. But, from the point of view of just building a system to pick questions that are likely to be of the appropriate difficulty level, does this distinction have any importance?

Casebash

4 Answers

12

Not all of AI works on correlation; Bayesian Belief Networks are built around the probability that A causes B.

I'm working on a system to estimate the performance of students on questions based on their past performances.

I don't think you need causation for this. A past performance does not cause a current performance. Answering an earlier question does not cause the answer to a later question.

But from the point of view of just building a system to pick questions that are likely to be of the appropriate difficulty level - does this distinction have any importance?

No, not for your example. I think correlation (or even simple extrapolation) would solve your problem very well. Assign a difficulty score to each question, feed questions to the student at increasing levels of difficulty (which is how most exams work), and when the student starts getting them wrong, wind the difficulty back. That's a feedback algorithm similar to the error minimisation performed on a neuron in a multi-layered perceptron. The non-trivial part of input spaces like this is deciding what makes a question difficult!
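
A minimal sketch of that feedback loop in Python (the difficulty scores, the step size, and the ask() stand-in for a real student are all assumptions made up for illustration, not something from the answer):

    import random

    def ask(question):
        # Stand-in for a real student: the harder the question,
        # the lower the chance of a correct answer.
        return random.random() > question["difficulty"]

    def adaptive_quiz(questions, steps=10, step_size=0.1):
        target = 0.0  # current target difficulty
        for _ in range(steps):
            # Pick the question whose difficulty is closest to the current target.
            q = min(questions, key=lambda q: abs(q["difficulty"] - target))
            if ask(q):
                target = min(1.0, target + step_size)  # correct: ramp difficulty up
            else:
                target = max(0.0, target - step_size)  # wrong: wind difficulty back
        return target  # rough estimate of the student's level

    pool = [{"id": i, "difficulty": i / 20} for i in range(21)]
    print("estimated level:", adaptive_quiz(pool))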

A better example of causation in AI would be:

My car is slowing down. My accelerator is on the floor. There is not much noise. There are lights on the dashboard. What is the probability that I've run out of fuel?

In this case, running out of fuel has caused the car to slow down. This is precisely the sort of problem that Bayesian Belief Networks solve.
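
As an illustration of that example, here is a tiny hand-rolled version of the inference. The probabilities are invented, and the observations are assumed to be conditionally independent given the cause (a naive-Bayes simplification of a full belief network):

    # Toy "out of fuel" inference; all probabilities below are invented.
    P_fuel_empty = 0.05  # prior on the cause

    # Conditional probabilities of each observation given the cause.
    P_obs_given_empty = {"slowing": 0.95, "quiet": 0.90, "lights": 0.99}
    P_obs_given_ok    = {"slowing": 0.05, "quiet": 0.10, "lights": 0.02}

    def likelihood(table, observed):
        p = 1.0
        for obs in observed:
            p *= table[obs]
        return p

    observed = ["slowing", "quiet", "lights"]

    # Bayes' rule: P(empty | obs) is proportional to P(obs | empty) * P(empty)
    joint_empty = likelihood(P_obs_given_empty, observed) * P_fuel_empty
    joint_ok    = likelihood(P_obs_given_ok, observed) * (1 - P_fuel_empty)

    posterior = joint_empty / (joint_empty + joint_ok)
    print(f"P(out of fuel | observations) = {posterior:.3f}")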

Dr Rob Lang
6

machine learning seems to be almost entirely based on correlation

I don't think so, not in general at least. For example, the main assumption behind ML algorithms in terms of PAC analysis and VC-dimension analysis is that training/testing data come from the same distribution as future data.

So in your system, you would have to assume that each student induces some kind of conditional probability distribution that generates answers to particular types of questions on particular topics. Another, more problematic, assumption you have to make is that this distribution doesn't change (or doesn't change fast).
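
One standard way to make that assumption concrete (my choice of model, not the answer's) is an item-response model such as the Rasch model: each student has a latent ability, each question a difficulty, and correctness is drawn from a logistic conditional distribution. A small sketch with made-up parameters:

    import math
    import random

    def p_correct(ability, difficulty):
        # P(correct | ability, difficulty) under a logistic (Rasch-style) model.
        return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

    def simulate_answer(ability, difficulty):
        # The student's answer is a draw from that conditional distribution.
        return random.random() < p_correct(ability, difficulty)

    student_ability = 0.7                      # assumed latent skill
    question_difficulties = [-1.0, 0.0, 0.5, 1.5, 2.0]

    for d in question_difficulties:
        answered = "right" if simulate_answer(student_ability, d) else "wrong"
        print(f"difficulty={d:+.1f}  P(correct)={p_correct(student_ability, d):.2f}  sampled={answered}")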

BartoszKP
2

In addition to the other answers, there's an interesting point: if you're manually selecting features, you might want to think about 'coincidental correlation' to reduce overfitting, i.e., avoid features that happen to be correlated in your training data but wouldn't (or shouldn't) be correlated in the general case, where there is no causal relation whatsoever.

As a crude example, suppose you take a data table of historical exam results and try to predict pass/fail; you simply include all available data fields as features, and the table happens to contain the students' birthdays as well. Now, there may well be a correlation in the training data that students born on the 12th of February almost always pass and students born on the 13th of February almost always fail... but since there is no causal relationship, that feature should be excluded.

In real life it's a bit more subtle, but it helps to distinguish correlations that fit your data because they are valid signals that should be learned from, and correlations that are simply patterns caused by random noise in your training set.
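
A quick synthetic sketch of that effect (the data below is randomly generated, not real exam results): with a small training set, some birthdays will look perfectly predictive of pass/fail purely by chance, and the pattern evaporates on fresh data.

    import random

    random.seed(0)

    def make_students(n):
        # Pass/fail is a coin flip; birthday is irrelevant by construction.
        return [{"birthday": random.randint(1, 365), "passed": random.random() < 0.5}
                for _ in range(n)]

    train = make_students(60)
    test = make_students(1000)

    def pass_rate_by_birthday(data):
        counts = {}
        for s in data:
            total, passed = counts.get(s["birthday"], (0, 0))
            counts[s["birthday"]] = (total + 1, passed + s["passed"])
        return {b: p / t for b, (t, p) in counts.items()}

    train_rates = pass_rate_by_birthday(train)
    test_rates = pass_rate_by_birthday(test)

    # Birthdays that look perfectly predictive in training...
    suspicious = [b for b, r in train_rates.items() if r in (0.0, 1.0)]
    print("birthdays that look perfectly predictive in training:", len(suspicious))
    # ...but whose pass rate on fresh data is close to the base rate.
    for b in suspicious[:5]:
        print(f"birthday {b}: train rate {train_rates[b]:.2f}, "
              f"test rate {test_rates.get(b, float('nan')):.2f}")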

Peteris
2

I agree with the prior answers.

If, however, you're interested in looking at correlation/causation in general, two items you might want to look at are:

  • Pearl (yes, that Pearl) has produced one of the very few decent books on it.
  • Reinforcement Learning and the multi-armed bandit problem are both based around an actor trying to infer optimal courses of action in an unknown environment, i.e. it must learn which 'actions' will give the best 'reward', and so implicitly tease out causal relationships (a minimal sketch follows below).
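
For a flavour of the bandit setting mentioned in the second item, here is a minimal epsilon-greedy sketch; the arm reward probabilities and the value of epsilon are invented for illustration:

    import random

    true_reward_probs = [0.2, 0.5, 0.75]     # unknown to the agent
    counts = [0] * len(true_reward_probs)    # pulls per arm
    values = [0.0] * len(true_reward_probs)  # running mean reward per arm
    epsilon = 0.1

    for step in range(10_000):
        # Explore with probability epsilon, otherwise exploit the best-looking arm.
        if random.random() < epsilon:
            arm = random.randrange(len(true_reward_probs))
        else:
            arm = max(range(len(values)), key=lambda a: values[a])
        reward = 1.0 if random.random() < true_reward_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean

    print("estimated arm values:", [round(v, 3) for v in values])
    print("best arm found:", values.index(max(values)))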