
My motivations for asking this question are philosophical in nature. I'm by no means a computer scientist though, and I feel as though this question should be answered by someone who is since it's one thing to read about a subject second hand and another to understand it first hand. I'm an A-level student of philosophy, physics and mathematics (UK qualification) if that helps you formulate an answer at all.

The question relates to the problem of induction: we have little reason to believe the universe is uniform, yet we apply that expectation to science and inference. However, I've seen the claim made that this guy Solomonoff created a theory of induction that uses a Bayesian framework as well as something called 'Kolmogorov complexity' as an objective measure of the 'complexity' of a model or hypothesis. For example, if we observe over and over again that all A's are B's, we might be tempted to formulate the hypothesis:

"All A's are B's",

This would be afforded the maximum credence in a Bayesian framework, since it predicts that A's are B's with probability 1 and we're attaching ourselves to the likelihood lover's principle. However, you could also formulate the hypothesis:

"All A's are B's until a time t where they are C."

And there are infinitely many of these hypotheses, since 'C' can represent literally anything. Each has a Bayesian multiplier (likelihood) equal to that of the previous hypothesis, so technically they have equal credence if we don't prefer one of them in our priors. And the only way (it would seem) to decide between the two is to see whether hypothesis 2 delivers its prediction at time t, so we would otherwise always have to wait until time t to see if hypothesis 1 is correct.
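
To make the worry concrete (I can only sketch this crudely, and the prior values below are made up), here is a tiny Python illustration: before time t both hypotheses predict every observation with probability 1, so no amount of pre-t data moves the posteriors away from whatever the priors were.

    # H1: "All A's are B's"
    # H2: "All A's are B's until a time t, at which point they are C"
    # Before time t, both hypotheses assign probability 1 to each observed "A is B",
    # so the likelihoods are identical and only the prior separates them.

    def posteriors(prior_h1, prior_h2, n_observations):
        likelihood = 1.0 ** n_observations      # the same for H1 and H2 before time t
        unnormalised = [prior_h1 * likelihood, prior_h2 * likelihood]
        total = sum(unnormalised)
        return [p / total for p in unnormalised]

    print(posteriors(0.5, 0.5, 1000))   # equal priors -> posteriors stay [0.5, 0.5]
    print(posteriors(0.9, 0.1, 1000))   # a simplicity-favouring prior survives untouched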

My question can be split up into three parts:

1) Is "Kolmogrov complexity" generally considered a genuine formulation of an objective definition for complexity/simplicity?

2) Would the former hypothesis, with such an understanding of complexity, be considered more 'simple'?

3) How would this understanding of complexity be used practically when distinguishing between hypotheses? Or is it just obvious that the former type of hypothesis is more simple?

Although this is really a philosophical question, I suppose I would also like to know whether anyone here has any ideas about why we should prefer simpler hypotheses. And perhaps: how would you increase your Bayesian prior probability given some measure of complexity for a hypothesis?

Thank you in advance for any answers.

2 Answers


Mathematics and computer science don't have anything to say about whether simpler hypotheses are more likely. That's a question about reality, not about math / computer science.

What computer science can provide is the notion of Kolmogorov complexity. Kolmogorov complexity is one reasonable notion of simplicity: bit-strings with lower Kolmogorov complexity are "simpler" in some sense.
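
As a rough illustration (the strings below are my own toy examples, and Kolmogorov complexity itself is uncomputable): a highly patterned bit-string has a short program that generates it, while a random-looking one seems to admit no description much shorter than the string itself.

    import secrets

    # A patterned 1000-character bit-string has a short generating description...
    regular = "01" * 500
    description_of_regular = '"01" * 500'   # roughly 10 characters

    # ...whereas for a random-looking string the only obvious "program" is one that
    # quotes the string verbatim, so its description is about as long as the string.
    random_looking = "".join(secrets.choice("01") for _ in range(1000))

    print(len(regular), len(description_of_regular))   # 1000 vs 10
    print(len(random_looking))                          # 1000, with no short description known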

Now we could start with a model of reality, which says that nature randomly picks a process for generating data; the process is obtained by picking a Turing machine at random, with smaller (simpler) Turing machines more likely to be chosen than complex ones; and then nature runs the Turing machine and the data you observe is the output of that Turing machine. That's a sketch of a possible model of reality; basically, it's an assumption about how nature works.

That assumption then implies (if we fill in some details) a particular prior on hypotheses. A hypothesis amounts to a Turing machine. There will be multiple Turing machines that are all consistent with the observations (that could have produced those observations), and Bayes' theorem plus the prior will let you infer the posterior distribution over which of those hypotheses are most likely.

This is a possible way to obtain a prior, and it seems reasonable to me. Computer science can't tell you whether that model (that assumption) is a good model of reality. But it can help you work out the consequences of making that assumption.
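
To show the shape of that calculation, here is a hedged toy sketch: the "programs" are just Python predicates with invented description lengths, weighted by 2^(-length). Real Solomonoff induction ranges over all programs of a universal Turing machine and is uncomputable; this only illustrates how a length-based prior plus Bayes' theorem plays out.

    # Toy Solomonoff-style update: weight each candidate hypothesis by 2**(-length),
    # discard the ones inconsistent with the data, and renormalise.
    data = [("A", "B")] * 20    # twenty observations: every A seen so far was a B

    hypotheses = [
        # (name, invented description length in bits, predicate over observations)
        ("all A's are B's",           10, lambda i, obs: obs == ("A", "B")),
        ("A's are B's until step 50", 25, lambda i, obs: obs == ("A", "B") if i < 50 else obs == ("A", "C")),
        ("A's are never B's",         12, lambda i, obs: obs != ("A", "B")),
    ]

    def consistent(pred):
        return all(pred(i, obs) for i, obs in enumerate(data))

    weights = {name: 2.0 ** -length for name, length, pred in hypotheses if consistent(pred)}
    total = sum(weights.values())
    print({name: w / total for name, w in weights.items()})
    # The inconsistent hypothesis is eliminated by the data; the two survivors split the
    # probability according to their description lengths, heavily favouring the shorter one.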


Finally, you seem to be asking why simpler explanations are more often correct. Basically, why is Occam's razor useful? I don't know if there is any completely convincing answer to that.

One possible answer is the empirical answer: it seems to work well in practice. In other words, we often find in nature that very simple processes/models suffice to explain a wide range of natural observations. See, e.g., the unreasonable effectiveness of mathematics.

There's lots more that one can say about Occam's razor. See, e.g., https://en.wikipedia.org/wiki/Occam%27s_razor#Justifications and https://philosophy.stackexchange.com/questions/tagged/occams-razor?sort=votes. I think that gets beyond the scope of this site.

D.W.

Caveat: This question is certainly intersectional with philosophy, and it seems to have found its way over: What are the philosophical arguments for and against modeling hypotheses as Turing machines?

As prefatory to your questions, D.W.'s answer is a good start on what is a very complicated question, perhaps a research thesis of some sort. Certainly, were computer scientists to become involved in exploring the import of Occam's razor in scientific inference, no one would be better qualified to use the language of information theory, algorithmic complexity, and formal automata to build a model of the production and validation of hypotheses. But such formal computational models of complex, real-world processes involving mathematical or scientific language communities would have to be narrowed to be useful, and the results would have to be vetted by philosophers of science fluent in such models, as explained here in Philosophy StackExchange.

Since the advent of the Berlin and Vienna circles, a lot of thought has gone into the formalization of scientific reasoning using formal languages, with Carnap and Quine being prominent figures in the endeavor. My personal opinion is that professional computer scientists, scientific late-comers to questions of philosophy of science, should certainly reflect on whether they might provide any additional insights into traditional questions of logic, mathematics, and experimentation, all core pursuits in our discipline. We spend a lot of time coding and constructing proofs, and not nearly enough time engaged in philosophical reflection.

You ask:

  1. Is "Kolmogrov complexity" generally considered a genuine formulation of an objective definition for complexity/simplicity?

"Genuine" is what rhetorician's call a weasel word. A "genuine", "real", or "true" definition signals some sort of normative appraisal of an intension or extension, but without providing some sort of precising definition or operationalization of the term, it's unclear what it means. Kolmogrov complexity is a respected formalism. It's a way of abstracting away from a message and providing a semantics in terms of a programming language. We can trade a message for something that generates it, and in this way, it's another lens to view formal strings, languages and automata being complementary in a theoretical sense. As it is essentially a computational method that relies heavily on mathematical and logical presupposition; it also integrates nicely into computer science theory involving string compression, number theory, decision problems, programing constructs, and so on. In that sense, Kolmogorov's work is "genuine" and "objective".

You ask:

  2. Would the former hypothesis, with such an understanding of complexity, be considered more 'simple'?

The simplicity of the ideas of the hypothesis, or of the linguistic construct of the hypothesis? Here you need to keep straight the difference between information complexity and conceptual complexity, because by abstracting away from the natural-language meaning of a natural-language claim, you are moving from one form of complexity to another, and they are apples and oranges. With information, we can talk, as Shannon taught us, about the complexity of a string or the string representation of machine instructions. But scientific hypotheses have value because of their power, in natural language, to explain physical phenomena. Algorithmic complexity is a description of the construction of a language, not of its content, if you accept the existence of propositions (SEP) as typically understood in the philosophy of language, a choice controversial to some.

In philosophy, we recognize that the value of having and vetting empirical claims about the physical world is predicated upon the assertoric force of theory according to a correspondence theory of truth. That's a bit of a mouthful, but what it means is that when we describe gravity, our description is true, or in more contemporary philosophical discourse, adequate. Adequacy, after all, is the spirit of "all models are wrong, but some are useful", and gets us away from black-and-white thinking. If we talk about the structure of the syntax, then we are dealing with information and algorithmic complexity. If we talk about the scientific phenomena, the semantics of the theory, this is something else. In philosophy, it might be understood as natural language ontology (SEP). Occam's razor (IEP) is a statement of epistemological or ontological complexity, not information or algorithmic complexity! We can't confuse those apples and oranges. From the IEP article:

For Ockham, the principle of simplicity limits the multiplication of hypotheses not necessarily entities. Favoring the formulation “It is useless to do with more what can be done with less,” Ockham implies that theories are meant to do things, namely, explain and predict, and these things can be accomplished more effectively with fewer assumptions.

We simply don't have a way of describing the content of scientific hypotheses according to string complexity, as far as I know. A hypothesis has linguistic and grammatical features, but, without getting into ontological relativity or quantifier variance, it has purely semantic features too, represented for instance by infelicitous constructions (LinguisticsSE). These are considered by some to be type-theoretic issues.

Consider a hypothesis which involves invoking a reference with a complex string representation generated by the rules of IUPAC nomenclature. It might have an interesting algorithmic complexity. But the reference to the substance might have no real conceptual complexity in a theory. We might be able to substitute a simple token for the complex one with no loss of semantics. That's because formal language (and its related computational qualities) is not the same thing as conceptualization (SEP), generally taken to be non-linguistic in nature.
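
A small illustration of that substitution point (the sentences below are my own placeholders, using aspirin's systematic name): swapping a long systematic token for a short co-referring alias changes the string's length, but not the claim the sentence expresses.

    # Same claim, different string complexity: the systematic name and the common
    # name co-refer, so the substitution changes the linguistic object, not the hypothesis.
    long_form = "2-acetoxybenzoic acid inhibits prostaglandin synthesis"
    short_form = "aspirin inhibits prostaglandin synthesis"
    print(len(long_form), len(short_form))   # different token complexity, same referent and claim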

So, algorithmic complexity describes the linguistic qualities of scientific discourse, but Occam's razor, which is a rough heuristic without a technical definition, deals in the logic, entities, and ideas of science. It's like talking about the physical layer in the OSI model and expecting to draw insights about what is transpiring at the application layer. The semantics of the application layer are categorically distinct from the semantics of the information and computer technology of the system. Occam didn't advocate using simple grammars; he advocated using simple ideas and claims. Does that mean that this entire approach is bogus? No.

You ask:

  3. How would this understanding of complexity be used practically when distinguishing between hypotheses? Or is it just obvious that the former type of hypothesis is more simple?

In terms of a practicing empirical or formal scientist, it might be possible, looking through the lens of computational and algorithmic complexity, to analyze a model-theoretic formulation of a hypothesis. Here we get to the core of the intersectionality of computer science and philosophy of science, if we take an approach to philosophy of language that follows Tarski's path towards modeling language. Philosophy of science has given consideration to some topics that are near and dear to the computer scientist who identifies as a logician. Can hypotheses be written in FOL? HOL? ITT (SEP)? Do they use classical logic? Can entities be represented as set-theoretic comprehensions? Do they involve PA? Can they be modeled with ZFC? Can we represent claims with priors expressed in probabilistic formalisms? In the extreme, we might consider an NLP formalism like Ranta's type-theoretic grammar (GB) to model natural language. Or we might consider mathematical hypotheses which are subject to theorem-proving systems, a type of automated reasoning (SEP).

Here, we see that we might be engaged in computer science as a form of experimental philosophy (SEP), drawing on any formalisms amenable to computational methods: probability, set theory, model theory, type theory, computability and complexity theory, and so on. But we have to do it in a meaningful way. Simply picking information or algorithmic complexity and equating it with another form of complexity, be it ontological, epistemological, or methodological, won't be profitable unless we can defend the translation from one theory of complexity into another. And those sorts of tasks might be instances of metaphysical grounding (SEP) or agent-based modeling in philosophy of science (SEP), and beyond the scope of computer science in the narrow conception. So, if the OP or the reader has questions about the application of Bayesian reasoning, the better forum would be Philosophy SE, where the interpretation and application of such formal logics is of perennial interest, leaving the narrower, more technical questions of automata theory and algorithmic complexity to this forum.

J D