
I have been thinking about the No Free Lunch (NFL) theorems lately, and I have a question that probably everyone who has ever thought about them has also had. I am asking it here because I have not found a good discussion of it anywhere else.

The NFL theorems are very interesting theoretical results whose assumptions do not hold in most practical circumstances, because a key assumption of the NFL theorems is rather strong. This assumption is, roughly speaking, that the performance of an algorithm is averaged over all problem instances drawn from a uniform probability distribution. In realistic applications, the problems an algorithm typically encounters are NOT drawn from a uniform distribution; they come from what is likely a very interesting and complicated distribution specific to the general problem setting.

So, while the NFL theorems are quite interesting results, do they have any practical implications? Or are they merely theoretical results?

EDIT: By practical implications, I mean novel algorithms, improvements over existing algorithms, improved hyper-parameter selection, and things of that nature. I would even be interested to learn of NFL-inspired theorems that do apply to realistic search/optimization/learning problems.


A practical implication is that there is no silver bullet: we shouldn't expect any single optimization method to be perfect for all problems. Rather, we should try to design optimization methods that are tailored to the problem we're trying to solve.

For instance, if you want to use local search, you'll probably need to define a neighborhood relation (a set of moves that makes "small" changes to the current solution) that is informed by the problem domain. See, e.g., https://cs.stackexchange.com/a/88016/755 for a recent example of this on this site.
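To make the "problem-specific neighborhood" idea concrete, here is a minimal sketch of hill-climbing local search (my own illustration, not code from the linked answer). The `cost` and `neighbors` functions are hypothetical placeholders; in a real application the neighborhood would encode domain knowledge, e.g., 2-opt moves for TSP.

```python
import random

def local_search(initial, cost, neighbors, max_iters=10_000):
    """Generic hill-climbing local search.

    cost      : maps a solution to a number (lower is better).
    neighbors : maps a solution to a list of "nearby" solutions; this is
                the problem-specific part that must be tailored to the domain.
    """
    current = initial
    current_cost = cost(current)
    for _ in range(max_iters):
        candidates = neighbors(current)
        if not candidates:
            break
        best = min(candidates, key=cost)
        best_cost = cost(best)
        if best_cost >= current_cost:
            break  # local optimum reached
        current, current_cost = best, best_cost
    return current

# Toy usage: minimize a quadratic over the integers with a +/- 1 neighborhood.
result = local_search(
    initial=random.randint(-100, 100),
    cost=lambda x: (x - 7) ** 2,
    neighbors=lambda x: [x - 1, x + 1],
)
print(result)  # converges to 7
```

The algorithm itself is generic; all of the "tailoring to the problem" the NFL argument calls for lives in the choice of `neighbors` (and, to a lesser extent, `cost` and the starting point).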

Another practical implication is that machine learning won't work if there is no structure at all on the space of possible models/hypotheses. Instead, we need some kind of prior that makes some models more likely than others.

Often, we assume a prior under which "simpler" models are more likely than complex ones (Occam's razor: all else being equal, the simpler explanation is more likely to be true). This leads to the use of regularization in machine learning, as it effectively applies Occam's razor to candidate models. So, you can think of the NFL theorems as providing some theoretical justification for regularization: they help us see what the role of regularization is, and they offer a partial explanation for the empirical observation that it often seems to be effective.
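As an illustrative sketch of how regularization acts as an Occam-style prior (my own example with synthetic data, not part of the answer above), here is ridge regression, where an L2 penalty on the weights biases the fit toward "simpler" (smaller-weight) models:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: y depends only on the first feature,
# the remaining nine features are pure noise.
X = rng.normal(size=(50, 10))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=50)

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: minimize ||Xw - y||^2 + lam * ||w||^2.

    The lam * ||w||^2 term is the Occam-style prior: it prefers small
    weights unless the data strongly justifies larger ones.
    """
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

w_unregularized = ridge_fit(X, y, lam=0.0)
w_regularized = ridge_fit(X, y, lam=10.0)

# The regularized weights on the noise features shrink toward zero,
# while the weight on the informative first feature stays close to 3.
print(np.round(w_unregularized, 2))
print(np.round(w_regularized, 2))
```

The penalty does not tell you which model is correct; it just makes the complex ones less likely to be chosen unless the data demands them, which is exactly the kind of non-uniform prior the NFL setting rules out.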

You could characterize these as "the NFL theorems tell us which directions won't work", which arguably has value: it helps us avoid wasting time on approaches that can't succeed and points us towards directions that are more likely to work.

D.W.