24

Let's use the Traveling Salesman problem as the example, unless you think there's a simpler, more understandable one.

My understanding of the P = NP question is that, given the optimal solution of a difficult problem, it's easy to check the answer, but very difficult to find the solution.

But with the Traveling Salesman problem, given the shortest route, it seems just as hard to verify that it is the shortest route, because you have to calculate every route to ensure that the solution is optimal.

That doesn't make sense. So what am I missing? I imagine lots of other people encounter a similar error in their understanding as they learn about this.

Tom Mercer

5 Answers

47

Your version of the TSP is actually NP-hard, exactly for the reasons you state. It is hard to check that it is the correct solution. The version of the TSP that is NP-complete is the decision version of the problem (quoting Wikipedia):

The decision version of the TSP (where given a length L, the task is to decide whether the graph has a tour of at most L) belongs to the class of NP-complete problems.

In other words, instead of asking "What is the shortest possible route through the TSP graph?", we're asking "Is there a route through the TSP graph that fits within my budget?".

TheHans255
D.R
15

There are a lot of decent answers here, but none clears up a couple of fairly important misunderstandings you seem to have.

Both P and NP are classes of what are called "decision problems": problems whose answer is YES or NO. (More formally, each one asks whether a given string belongs to a given language, but that distinction isn't important here.) In this sense, you are slightly incorrect when you say "given the optimal solution of a difficult problem, it's easy to check the answer, but very difficult to find the solution", because decision problems don't have "optimal solutions." Problems where solutions can be "evaluated" and you are looking for the "best" one are optimization problems, of which the Travelling Salesman Problem is an example. You can always turn an optimization problem into a decision problem by asking: "Given an instance of this optimization problem and an integer k, does the problem have a solution whose objective value is better than k?"

Another thing is that you might be slightly confused as to what NP means. P is the class of decision problems that can be solved in polynomial time (that you seem to understand). NP stands for "Non-deterministic Polynomial Time", and it is the class of decision problems for which, given some extra information, you can easily check that an instance whose answer is YES really does have the answer YES. So looking at TSP: if I have an instance of TSP and a solution whose total cost is less than k, then I can easily check that the solution really is a solution and that its cost is less than k. So the decision problem associated with TSP is in NP. But not all problems in NP are "hard". In fact, P is a subset of NP, because if you can easily solve the decision problem, you can easily check whether an instance gives a YES answer by just solving it.

But there are some problems in NP that we think are hard to solve. Oversimplifying a little, we call these NP-complete problems. (Note that these must still be decision problems.) We say a problem A is at least as hard as a problem B if, assuming we have a black-box oracle that solves problem A, we can use it to efficiently solve problem B. Let's again consider the TSP example. Clearly, if you could solve the optimization problem (that is, get the optimal solution), then you could solve the decision problem. So the optimization problem is at least as hard as its corresponding decision problem. If we showed that the decision version of TSP was NP-complete (which it is), then we would know that the optimization version of TSP is at least as hard as the NP-complete problems, but it is not itself NP-complete because it isn't a decision problem. We call such problems NP-hard.
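The "at least as hard as" argument above can be sketched in a few lines of Python. This is only an illustration: the function names, the distance-matrix encoding, and the brute-force stand-in for the oracle are all assumptions made for the example, not part of any standard library.

```python
from itertools import permutations

def optimal_tour_length(dist):
    """Stand-in for the black-box oracle (hypothetical): returns the length
    of a shortest tour. It is brute force here, but the caller never looks
    inside it."""
    n = len(dist)
    return min(
        sum(dist[t[i]][t[(i + 1) % n]] for i in range(n))
        for t in ((0,) + rest for rest in permutations(range(1, n)))
    )

def decide_with_oracle(dist, k, oracle=optimal_tour_length):
    """Decision version of TSP: is there a tour of length less than k?
    One oracle call plus one comparison is all the extra work needed."""
    return oracle(dist) < k

# Hypothetical 4-city instance with symmetric distances.
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
print(decide_with_oracle(dist, 19))
print(decide_with_oracle(dist, 18))
```

Whatever the oracle costs, the decision procedure costs essentially the same, which is the sense in which the optimization problem is at least as hard as the decision problem.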

NaturalLogZ
6

$P$ and $NP$ are classes of decision problems. The result of an algorithm for a decision problem is either "YES" or "NO". Even for a problem in $P$, such an answer cannot lead to a quick verification.

An instance of the decision problem version of TSP is "Given a collection of cities and intercity distances, is there a tour with total length less than $k$?", where $k$ is a constant specified in the instance. The result is "YES" or "NO". In neither case does the answer lead to a quick verification of the correctness of the answer.

The promise that you ask about is this: Given a particular proposed tour, one can in polynomial time:

  • Determine that the proposed tour actually is a tour -- visits all the cities and only traverses intercity routes that exist (sometimes "that have finite distances" when one encodes missing routes as having length $\infty$).
  • If so, determine that the length of the route is shorter than the constant $k$ in the problem instance.
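As a rough sketch of those two checks, assuming cities are numbered from 0 and distances are given as a matrix with missing routes encoded as infinity (the function name and the sample instance are made up for illustration):

```python
import math

def verify_tour(dist, tour, k):
    """Return True if `tour` is a valid tour and its length is less than k.

    Runs in time polynomial in the number of cities.
    """
    n = len(dist)
    # Check 1a: the tour visits every city exactly once.
    if sorted(tour) != list(range(n)):
        return False
    total = 0
    for i in range(n):
        a, b = tour[i], tour[(i + 1) % n]  # wrap around to close the tour
        # Check 1b: the tour only uses routes that exist (finite distance).
        if math.isinf(dist[a][b]):
            return False
        total += dist[a][b]
    # Check 2: the tour is shorter than k.
    return total < k

# Hypothetical instance: there is no direct route between cities 0 and 3.
INF = math.inf
dist = [
    [0, 1, 4, INF],
    [1, 0, 2, 5],
    [4, 2, 0, 3],
    [INF, 5, 3, 0],
]
print(verify_tour(dist, [0, 1, 3, 2], 14))   # length 13, valid tour
print(verify_tour(dist, [0, 1, 2, 3], 100))  # uses the missing 3-0 route
```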

Neither a "YES" nor a "NO" answer provides a proposed tour.

The value of the model of $NP$ that you are using is that it encodes a way to make a solver: for each possible tour (typically an exponentially large set to iterate over) check to see if it is a tour and if its length is $< k$. If so, report "YES". If we exhaust the collection of possible tours without reporting "YES", report "NO".
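A minimal sketch of that solver, assuming a complete graph given as a distance matrix (so every ordering of the cities is a tour and only the length needs checking); the names and the sample instance are hypothetical:

```python
from itertools import permutations

def tour_length(dist, tour):
    """Total length of the closed tour, including the edge back to the start."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

def tsp_decision(dist, k):
    """Report True ("YES") if some tour has length less than k, else False ("NO")."""
    n = len(dist)
    # Fix city 0 as the start and try all (n - 1)! orderings of the rest.
    for rest in permutations(range(1, n)):
        if tour_length(dist, (0,) + rest) < k:
            return True  # found a witness tour: report "YES"
    return False  # exhausted every candidate tour: report "NO"

# Hypothetical 4-city instance with symmetric distances.
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
print(tsp_decision(dist, 19))
print(tsp_decision(dist, 18))
```

Each individual check is cheap; the cost comes entirely from the number of candidates the loop may have to visit.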

Note that this model suggests that the difficulty in finding a fast solution is not that checking the conditions takes a lot of time. The difficulty is that there are too many potential tours to search through. So, if we could find some really, really smart way to restrict our search to only a tiny subset of the collection of potential tours, we would have a fast solution for an $NP$ problem.

Binary search in a sorted list is an example where one has a smart way to search through the list evaluating only logarithmically many (in the length of the list) comparisons rather than linearly many comparisons. From this point of view, the TSP problem is hard because we don't know a substantially faster way to search through the proposed tours of every possible TSP problem instance.
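To make the comparison concrete, here is a small sketch that counts how many list elements each search strategy inspects. The function names and the step counters are added purely for illustration.

```python
def linear_search(items, target):
    """Scan left to right; return (index, number of elements inspected)."""
    steps = 0
    for i, x in enumerate(items):
        steps += 1
        if x == target:
            return i, steps
    return -1, steps

def binary_search(items, target):
    """Search a sorted list by halving; return (index, elements inspected)."""
    lo, hi, steps = 0, len(items) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid, steps
        if items[mid] < target:
            lo = mid + 1  # target can only be in the upper half
        else:
            hi = mid - 1  # target can only be in the lower half
    return -1, steps

items = list(range(1024))  # already sorted
print(linear_search(items, 1023))
print(binary_search(items, 1023))
```

For a list of 1024 elements the linear scan may inspect all of them, while the halving strategy never needs more than 11. No comparably smart way of pruning the space of tours is known for TSP in general.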

Eric Towers
1

NP is all about decision problems - problems where the answer is "yes" or "no".

A problem is in NP if for every instance where the answer is "yes", there is a hint that lets you easily prove that the answer is "yes". It doesn't say anything about instances where the answer is "no". They can be hard to solve.

The classical Travelling Salesman problem is: Given a set of cities and their distances, is it possible to find a tour shorter than k? And quite obviously, if the answer is yes then such a tour exists, and we can use it as a hint to easily show the answer is yes. If the answer is no, then nobody has yet come up with any hint that would let you prove that.

You stated a problem that you also called "Travelling Salesman" problem, but it is actually different. You ask: Given a set of cities and their distances and a tour, is that tour the shortest tour? In this case, if the answer is "no" then there is a shorter tour, and we can use it as a hint to easily show the answer is "no". That's exactly the opposite of NP: Your alternative version of the Travelling Salesman problem is one where for every instance where the answer is "no", there is a hint that lets you easily prove the answer is "no". Because it is the exact opposite of NP, this class is called "co-NP".
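A small sketch of that "no" hint, assuming a complete graph given as a distance matrix; the function names and the sample instance are hypothetical:

```python
def tour_length(dist, tour):
    """Total length of the closed tour, including the edge back to the start."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

def refutes_optimality(dist, claimed, hint):
    """Return True if `hint` proves that `claimed` is NOT the shortest tour.

    The check is easy: `hint` must be a genuine tour, and strictly shorter.
    """
    n = len(dist)
    is_tour = sorted(hint) == list(range(n))
    return is_tour and tour_length(dist, hint) < tour_length(dist, claimed)

# Hypothetical 4-city instance with symmetric distances.
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
# The claimed "shortest" tour has length 21; the hint has length 18.
print(refutes_optimality(dist, [0, 1, 2, 3], [0, 1, 3, 2]))
```

Note the asymmetry: a single shorter tour settles a "no" answer quickly, but no similarly short hint is known that would settle a "yes" answer.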

There are many problems like that. For every problem in NP, you could ask the question: "Is the answer for this instance of the problem 'no'?", and of course the answer is exactly the opposite of the answer to the original problem. You just made the mistake of thinking that every problem with the words "travelling" and "salesman" in it is the same problem.

gnasher729
0

I find it easiest to understand by using the NP-complete problem 3-SAT:

There are $n$ boolean variables, each of which you can set to either $true$ or $false$, and you are given $k$ clauses. Each clause contains 3 variables together with a required value for each, like $(true OR false OR true)$, so that clause would be satisfied if the first variable were set to true OR the second variable to false OR the third variable to true. The $k$ clauses can contain any combination of three of the $n$ variables, and you have to decide what value every variable should be set to so that all clauses are satisfied.


If you find a combination of values for all variables such that every clause is satisfied, your combination can be verified very easily by just going through every clause once and testing it, but it can be very hard to find a combination that satisfies every clause.
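Here is a rough sketch of both halves, the easy check and the naive search. The clause encoding is an assumption made for this example: each clause is a tuple of three integers, where `i` means "variable i is true" and `-i` means "variable i is false".

```python
from itertools import product

def satisfies(clauses, assignment):
    """Easy part: one pass over the clauses, testing each literal.

    `assignment` maps a variable number to True or False.
    """
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

def brute_force_sat(clauses, n):
    """Hard part (naively): try all 2**n assignments until one works."""
    for values in product([False, True], repeat=n):
        assignment = dict(zip(range(1, n + 1), values))
        if satisfies(clauses, assignment):
            return assignment
    return None  # no assignment satisfies every clause

# (1, -2, 3) encodes the clause from the text: x1 true OR x2 false OR x3 true.
clauses = [(1, -2, 3), (-1, 2, 3), (-1, -2, -3)]
print(satisfies(clauses, {1: True, 2: True, 3: False}))
print(brute_force_sat(clauses, 3))
```

Checking costs one pass over the $k$ clauses, while the naive search may have to go through all $2^n$ assignments.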

Eugen