70

This may be a ridiculous question, but is it possible to have a problem that actually gets easier as the inputs grow in size? I doubt any practical problems are like this, but maybe we can invent a degenerate problem that has this property. For instance, perhaps it begins to "solve itself" as it gets larger, or behaves in some other bizarre way.

dsaxton

9 Answers

41

No, it's not possible: at least, not in an asymptotic sense, where you require the problem to keep getting strictly easier, forever, as $n \to \infty$.

Let $T(n)$ be the best possible running time for solving such a problem, where $n$ is the size of the input. The running time counts the number of instructions executed by the algorithm, so it must be a non-negative integer; in other words, $T(n) \in \mathbb{N}$ for all $n$. Now consider a function $T: \mathbb{N} \to \mathbb{N}$: no such function can be strictly monotonically decreasing. (Whatever $T(0)$ is, it is finite, say $T(0)=c$; but then strict decrease forces $T(c) \le 0$ and $T(c+1) \le -1$, which is impossible.) The same argument rules out a function that is eventually strictly decreasing: there is no running time $T(n)$ and threshold $n_0$ such that $T(n)$ is strictly decreasing for all $n \ge n_0$, because any such function would eventually become negative.
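
To make the contradiction explicit, the strict decrease can be unrolled into a chain of inequalities (this is just a restatement of the argument above):

$$T(n) \le T(n-1) - 1 \le T(n-2) - 2 \le \cdots \le T(0) - n = c - n,$$

so taking $n = c + 1$ gives $T(c+1) \le -1$, contradicting $T(c+1) \in \mathbb{N}$.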

So, such a problem cannot exist, for the simple reason that running times have to be non-negative integers.


Note that this answer covers only deterministic algorithms (i.e., worst-case running time). It doesn't rule out the possibility of randomized algorithms whose expected running time is strictly monotonically decreasing, forever. I don't know whether it's possible for such an algorithm to exist. I thank Beni Cherniavsky-Paskin for this observation.

D.W.
28

Although it's not quite an answer to your question, the Boyer-Moore string search algorithm comes close. As Robert Moore says on his web page about the algorithm,

Our algorithm has the peculiar property that, roughly speaking, the longer the pattern is, the faster the algorithm goes.

In other words: the algorithm searches for an occurrence of a target string (the pattern) in a source string, and for a fixed source string, the longer the pattern is, the faster the search typically runs.
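
To see where that behaviour comes from, here is a minimal sketch of the Boyer-Moore-Horspool simplification, which keeps only the bad-character rule of the full Boyer-Moore algorithm (written in Python for illustration):

    # Boyer-Moore-Horspool: a simplified variant keeping only the bad-character
    # rule. Longer patterns allow larger shifts, which is the source of the
    # "longer pattern => faster search" behaviour described above.
    def horspool_search(text, pattern):
        m, n = len(pattern), len(text)
        if m == 0:
            return 0
        if m > n:
            return -1
        # For each character of the pattern (except the last), how far the
        # window may shift when that character ends the current window.
        # Characters not in the pattern allow a full shift of m positions.
        shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
        i = m - 1                       # index of the window's last character
        while i < n:
            j, k = m - 1, i             # compare right-to-left
            while j >= 0 and text[k] == pattern[j]:
                j -= 1
                k -= 1
            if j < 0:
                return k + 1            # match starts at k + 1
            i += shift.get(text[i], m)  # larger m tends to mean larger jumps
        return -1

On a fixed text, a longer pattern makes the typical shift larger (up to len(pattern) characters skipped per alignment), which is exactly the effect Moore describes.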

Rick Decker
11

Clearly, from a purely mathematical, purely algorithmic viewpoint this is impossible. But in practice there are several real-world examples where scaling a project up makes it easier, many of which are not intuitive to end users.

Directions: directions can sometimes get easier as they get longer. For example, if I want Google Maps to give me directions for going 3000 miles west, I get cross-country driving instructions. But if I wanted to go 6000 miles west, I would end up with significantly simpler instructions: get on a plane from NYC to Hokkaido. Producing a cross-country route that accounts for traffic, roads, weather, etc. is rather difficult algorithmically, while telling me to get on a plane and looking up flights in a database is comparatively much simpler. ASCII graph of difficulty vs. distance:

           |     /
           |    /
Difficulty |   /                  ____-------
           |  /           ____----
           | /    ____----
            ---------------------------------
                       Distance

Rendering: say I want a render of one face and a render of 1000 faces; this is for a billboard ad, so both final images must be 10000px by 5000px. Rendering one face realistically would be hard: at a resolution of several thousand pixels across, you have to use really powerful machines. But in the crowd of 1000 faces, each face needs to be only about ten pixels across and can easily be cloned. I could probably render 1000 faces on my laptop, but rendering a realistic face 10000px across would take a very long time and powerful machines. ASCII graph of difficulty vs. objects rendered, showing how the difficulty of rendering n objects to an image of fixed size drops off quickly and then climbs back slowly:

           | -    
           |- -                     _________
Difficulty |   --      ______-------            
           |     ------      
           |       
            ---------------------------------
                        Objects

Hardware control: many hardware tasks get easier at larger scales. "Move motor X by 1 degree" is hard and/or impossible, and you have to deal with all kinds of things that simply don't matter for "move motor X by 322 degrees".

Short-duration tasks: say you want item X to be on for some very small amount of time every second. The longer X has to run each second, the less complex the software and the hardware need to be.

Owen Versteeg
4

There are such cases. They are the cases where the success criterion is a function of the data, rather than a single answer to be found. For example, statistical processes whose results are phrased with confidence intervals can become easier.

One particular case I'm thinking of is problems that have a transition from discrete to continuous behaviour, like fluid flows. Solving the small problem to within a given degree of error can involve modeling all of the discrete interactions, which may call for a supercomputer. The continuous behaviours often permit simplifications without pushing the results outside a related error bound.
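
As a toy illustration of the confidence-interval point (the distribution and the numbers here are made up): the 95% half-width of a mean estimate shrinks like $1/\sqrt{n}$, so a fixed accuracy target only gets easier to meet as the data grows.

    # Toy illustration (made-up data): estimating a mean to within a fixed
    # tolerance. The 95% confidence half-width shrinks like 1/sqrt(n), so a
    # fixed accuracy target becomes easier to meet as the data grows.
    import random
    import statistics

    random.seed(0)
    data = [random.gauss(10.0, 2.0) for _ in range(100_000)]

    for n in (10, 100, 1_000, 10_000, 100_000):
        sample = data[:n]
        half_width = 1.96 * statistics.stdev(sample) / (n ** 0.5)
        print(f"n={n:>6}  mean={statistics.mean(sample):.3f}  +/-{half_width:.3f}")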

Cort Ammon
3

The question is interesting and useful, because our usual philosophy in computer science is that the bigger the input, the harder the problem. But in fact, most problems that are presented in the typical (difficult) way can also be represented in an "easy" way, even in light of D.W.'s answer (which I think misses the point: "easy" does not mean faster, it means "less slow", so you do not have to find negative running times, only better asymptotic ones).

The trick to constructing one is to supply parts of the solution as hints in the input, and to treat the original problem instance as a constant parameter.

Example: what is the longest driving route between London and Paris that never visits a French or British town twice and does not enter any other country, given constraints such as: you must pass through Birmingham before Ashford, Orleans before Versailles, La Rochelle before Limoges, etc.?

Clearly, this problem is easier with a long list of such constraints than with a short one.

Example of use: imagine a game managed by the machine, where the computer's AI has to decide whether it should keep exploring the game to gather more hints, or whether it already has enough information to deduce the best move.

2

Consider a program that takes as input what you know about a password and then tries to crack it. I think this does what you want. For example (a rough sketch of the shrinking search space follows the list):

  • No input -> brute-force over all symbols and words of any length
  • Length of password -> brute-force over all symbols in a word of that length
  • Contained symbols -> shrinks the list of symbols to check
  • ...
  • Contained symbols, including multiplicities, and length -> only permutations remain
  • All symbols in correct order -> basically solves itself
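
Here is that rough sketch (the alphabet size and the particular constraints are invented for illustration; this is not anyone's actual cracking code), showing how each extra piece of knowledge shrinks the number of candidates to try:

    # Rough sketch (alphabet size and constraints invented for illustration):
    # each additional piece of knowledge about the password shrinks the
    # number of candidates a brute-force cracker would have to try.
    from math import factorial

    ALPHABET = 95  # printable ASCII symbols

    def candidates_unknown_length(max_len):
        # No information: every word up to max_len over the full alphabet.
        return sum(ALPHABET ** k for k in range(1, max_len + 1))

    def candidates_known_length(length):
        # Length known: fixed-length words over the full alphabet.
        return ALPHABET ** length

    def candidates_known_multiset(counts):
        # Exact symbols and their multiplicities known: only the permutations
        # of that multiset remain to be tried.
        total = factorial(sum(counts))
        for c in counts:
            total //= factorial(c)
        return total

    print(candidates_unknown_length(8))        # astronomically many
    print(candidates_known_length(8))          # 95**8
    print(candidates_known_multiset([1] * 8))  # 8! = 40320 permutations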

I should add that this is a bit of a trick, since the problem stated like this has difficulty inversely related to input size. You could drop one layer of abstraction and say that the effective input size is large when nothing is given (check all symbols and all word lengths) and small when you enter the correct password at the start.

So it all comes down on how much abstraction you allow.

RunOrVeith
0

As a matter of fact, I do have a problem that gets smaller as the data increases. One of my applications records attributes of a particular product, say cheese. The attributes are, for instance, CheeseType, Brand, Country, Area, MilkType, etc. Every month or so, I get a list of new cheeses that came onto the market during that time, along with their attributes. These attributes are typed by hand by a group of humans; some make typos, or simply don't know the value of every attribute.

When you run a search in my database, I try to predict from statistics what the cheese tastes like, based on these attributes. What happens is that for each attribute I end up with a range of values; some are valid, some are invalid. Eliminating or correcting the invalid ones is only possible if I have enough data: it is about distinguishing real values from noise without eliminating rare but valid values.

As you can imagine, with low volume the noise is too large to fix things properly. If you have 5 instances of Cheddar, 1 of Brie, 1 of Bri, and 1 of Chedar, how do I tell which is correct and which is a typo? With more volume, the typos stay very rare, while the rare genuine values gain a few crucial extra occurrences, lifting them out of the noise (this is backed by experience). In that case I might see 50000 Cheddar, 3000 Brie, 5 Bri, and 15 Chedar, for instance.
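
A simplified sketch of the kind of frequency-based cleanup described here (the threshold and the edit-distance test are invented for illustration, not the author's actual code): a value is treated as a probable typo only when it is rare and sits within edit distance 1 of a much more common value, so rare-but-valid values with no near neighbour survive.

    # Simplified sketch (threshold and edit-distance test invented for
    # illustration). A value is flagged as a probable typo only if it is rare
    # AND close to a much more common value; rare values with no near
    # neighbour are kept as legitimate rare values.
    def within_edit_distance_1(a, b):
        if abs(len(a) - len(b)) > 1:
            return False
        if len(a) == len(b):
            # Equal lengths: edit distance 1 means a single substitution.
            return sum(x != y for x, y in zip(a, b)) <= 1
        longer, shorter = (a, b) if len(a) > len(b) else (b, a)
        # Unequal lengths: try deleting each character of the longer string.
        return any(longer[:i] + longer[i + 1:] == shorter
                   for i in range(len(longer)))

    def probable_typos(counts, ratio=100):
        typos = {}
        for value, c in counts.items():
            for other, c_other in counts.items():
                if (other != value and c_other >= ratio * c
                        and within_edit_distance_1(value, other)):
                    typos[value] = other
                    break
        return typos

    # With volume, the typos separate cleanly from the rare genuine values:
    print(probable_typos({"Cheddar": 50000, "Brie": 3000, "Bri": 5, "Chedar": 15}))
    # With only {"Cheddar": 5, "Brie": 1, "Bri": 1, "Chedar": 1}, the same rule
    # finds nothing: typos and rare values cannot be told apart.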

So yes, some problems solve themselves eventually, when you have enough data.

chris
-1

The question asks: "is it possible to have a problem that actually gets easier as the inputs grow in size?" What if the inputs are resources to be used by the algorithm to work on a job? It is common knowledge that the more resources, the better. Below is an example in which the more employees there are, the better.

1) You are given two inputs:
i) The number of employees in an industry. This is a natural number and is the main input $n$.
ii) Information about the industry. There are $t$ tasks to be done by the workers, labeled A, B, C, D... There are $p$ places that connect the tasks, enabling the employees to switch between tasks; they are labeled 0, 1, 2, 3... The information given is a simple directed graph made up of routes, for example A-1, 1-2, 2-B, C-3... For simplicity, each route has a cost of 1.

2) Problem:
Starting from the first task, A, and with $n$ employees, you are to find an optimal strategy for the employees to visit all the tasks. Tasks can be visited more than once. Recall that you cannot switch between tasks unless there is a path between them, and the path from task A to B is independent of the path from B to A.

3) Output:
The output is the paths between tasks to be taken by the employees. Each path is associated with the number of employees taking it. For example:

A to B with $n_1$ employees (a forward path)
A to C with $n_2$ employees (a forward path)
B to D with $n_3$ employees (a forward path)
D to B with $n_4$ employees (a reversed path)
B to E with $n_5$ employees (a forward path)

4) Possible solution:
One possible solution is to first compute the shortest paths from A to the closest nodes; these are forward paths. Then recursively compute the forward paths from each visited task. The result is a tree. For example:

          A
      B      C
    D   E

Now we determine how the employees traverse the tree so as to visit all the tasks. Starting from task A with $n$ employees, $n_1$ are sent to the left subtree and $n_2$ to the right subtree. If $n_2 \neq 0$, then nobody from the left subtree will ever need to cross over to the right subtree.

For $n=\infty$ the employees all just move forward. For $n = 1$ the single employee has to use reverse paths in order to visit the remaining tasks; for D to B, for example, the algorithm has to compute that shortest path, which is extra computation. (Why not directly compute a shortest path from D to E? Fine, but that is still extra computation.)
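
For concreteness, here is a minimal sketch of that splitting idea (the example graph and the even-split policy are invented for illustration; reverse paths for the employee-starved case are not implemented):

    # Minimal sketch of the splitting idea above (example graph and even-split
    # policy invented for illustration). Build a BFS tree from the start task
    # (unit edge costs), then recursively divide the employees among the
    # subtrees; with enough employees every subtree gets at least one, so
    # nobody ever needs a reverse path.
    from collections import deque

    def bfs_tree(graph, start):
        children = {start: []}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in graph.get(u, []):
                if v not in children:
                    children[v] = []
                    children[u].append(v)
                    queue.append(v)
        return children

    def assign(children, node, employees, plan):
        subtrees = children[node]
        if not subtrees or employees == 0:
            return
        base, extra = divmod(employees, len(subtrees))
        for i, child in enumerate(subtrees):
            share = base + (1 if i < extra else 0)
            if share > 0:
                plan.append((node, child, share))  # "node to child with share employees"
                assign(children, child, share, plan)

    graph = {"A": ["B", "C"], "B": ["D", "E"], "C": [], "D": [], "E": []}
    plan = []
    assign(bfs_tree(graph, "A"), "A", 4, plan)
    print(plan)  # [('A', 'B', 2), ('B', 'D', 1), ('B', 'E', 1), ('A', 'C', 2)]

With $n = 1$ the second subtree gets nobody, which is exactly the case where reverse paths (and the extra shortest-path computations described above) become necessary.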

But of course the computation time will not keep decreasing forever (and, by the way, for $n$ too large you run into poor resource management and other problems).

yemelitc
-1

Consider the NP-complete problem 3-SAT. If you keep augmenting the problem by providing inputs of the form x_i = true/false, you either end up reducing the individual disjunctions to two-variable clauses, thereby creating a 2-SAT problem, which is firmly in P, or you simply end up with a true/false answer.

For the case where there is redundancy in the x_i = true/false inputs (the same input provided many times, or contradictory inputs), you can easily sort the inputs and either ignore the redundant values or report an error if the values contradict.
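
A small sketch of that simplification step (the clause representation, sets of signed integers as in DIMACS, is my own choice for illustration): applying the x_i hints either satisfies a clause outright or strips literals from it, and contradictory hints are reported.

    # Small sketch of the simplification described above. Clauses are sets of
    # signed integers as in DIMACS (e.g. {1, -2, 3} means x1 OR NOT x2 OR x3);
    # hints are (variable, value) pairs. Contradictory hints raise an error,
    # redundant ones are harmless.
    def simplify(clauses, hints):
        assignment = {}
        for var, value in hints:
            if var in assignment and assignment[var] != value:
                raise ValueError(f"contradictory hints for x{var}")
            assignment[var] = value

        simplified = []
        for clause in clauses:
            new_clause = set()
            satisfied = False
            for lit in clause:
                var, sign = abs(lit), lit > 0
                if var in assignment:
                    if assignment[var] == sign:
                        satisfied = True  # clause already true, drop it
                        break
                else:
                    new_clause.add(lit)   # variable still free, keep the literal
            if not satisfied:
                simplified.append(new_clause)  # an empty set here means UNSAT
        return simplified

    # (x1 OR x2 OR x3) AND (NOT x1 OR x2 OR NOT x4), with hints x3=false, x4=true:
    print(simplify([{1, 2, 3}, {-1, 2, -4}], [(3, False), (4, True)]))
    # -> two 2-literal clauses, {1, 2} and {-1, 2}: the instance is now 2-SAT.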

In any case, I think this represents a 'realistic' problem that gets easier to solve as the number of inputs grows. The 'easier' aspect is in converting an NP-complete problem to a P problem. You can still game the system by providing ridiculous inputs such that just the sorting would take longer than brute forcing the problem.

Now, a really cool scenario would be if we were willing to accept that T(0) (using D.W.'s notation from the answer above) can be infinite. For example, T(0) could be equivalent to solving Turing's halting problem. If we could devise a problem such that adding more inputs converts it into a solvable problem, we would have struck gold. Note that it is not enough to convert it into a solvable problem asymptotically, because that is just as bad as brute-forcing the problem.