
I have been wondering about this question since I was an undergraduate student. It is a general question but I will elaborate with examples below.

I have seen a lot of algorithms. For example, for the maximum flow problem I know of three algorithms that can solve it: Ford-Fulkerson, Edmonds-Karp and Dinic's, with Dinic's having the best complexity.

For data structures - heaps, for example - there are binary heaps, binomial heaps and Fibonacci heaps, with the Fibonacci heap having the best overall (amortized) complexity.

What keeps confusing me is: are there any reasons why we need to know them all? Why not just learn and get familiar with the one that has the best complexity?

I know it would be best if we knew them all; I just want to know whether there are any "more valid" reasons, like some problems/algorithms that can only be solved using A but not B, etc.

shole

5 Answers


There's a textbook waiting to be written at some point, with the working title Data Structures, Algorithms, and Tradeoffs. Almost every algorithm or data structure which you're likely to learn at the undergraduate level has some feature which makes it better for some applications than others.

Let's take sorting as an example, since everyone is familiar with the standard sort algorithms.

First off, complexity isn't the only concern. In practice, constant factors matter, which is why (say) quick sort tends to be used more than heap sort even though quick sort has a terrible ($O(n^2)$) worst-case complexity.

Secondly, there's always the chance that you find yourself in a situation where you're programming under strange constraints. I once had to do quantile extraction from a modest-sized (1000 or so) collection of samples as fast as possible, but it was on a small microcontroller which had very little spare read-write memory, so that ruled out most $O(n \log n)$ sort algorithms. Shell sort was the best tradeoff, since it was sub-quadratic and didn't require additional memory.
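For concreteness, here is a minimal in-place Shell sort sketch (illustrative only, not the code from that project), assuming plain integer samples. The point is that it needs no recursion stack and no auxiliary buffer, and the gap sequence can be tuned for speed.

```cpp
#include <cstddef>

// Minimal in-place Shell sort sketch (hypothetical, not the answerer's code).
// No recursion and no auxiliary buffer, which is what makes it attractive
// on memory-starved targets. The simple halving gap sequence is easy to
// remember; sequences such as Ciura's give better behaviour in practice.
void shell_sort(int *a, std::size_t n) {
    for (std::size_t gap = n / 2; gap > 0; gap /= 2) {
        for (std::size_t i = gap; i < n; ++i) {
            int tmp = a[i];
            std::size_t j = i;
            // Gapped insertion: shift larger elements one gap to the right.
            while (j >= gap && a[j - gap] > tmp) {
                a[j] = a[j - gap];
                j -= gap;
            }
            a[j] = tmp;
        }
    }
}
```

Once the samples are sorted in place, extracting a quantile is just an index lookup.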

In other cases, ideas from an algorithm or data structure might be applicable to a special-purpose problem. Bubble sort seems to be always slower than insertion sort on real hardware, but the idea of performing a bubble pass is sometimes exactly what you need.

Consider, for example, some kind of 3D visualisation or video game on a modern video card, where you'd like to draw objects in order from closest-to-the-camera to furthest-from-the-camera for performance reasons, but if you don't get the order exact, the hardware will take care of it. If you're moving around the 3D environment, the relative order of objects won't change very much between frames, so performing one bubble pass every frame might be a reasonable tradeoff. (The Source engine by Valve does this for particle effects.)
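A sketch of the idea, with a hypothetical Particle type carrying a camera-space depth (illustrative only, not Source engine code):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

struct Particle {
    float depth;   // distance from the camera, updated each frame
    // ...rendering state omitted
};

// One bubble pass per frame: O(n), in place, and it never increases the
// number of out-of-order pairs. Because relative depths barely change
// between frames, the list stays nearly sorted front-to-back even though
// no single pass fully sorts it.
void bubble_pass_front_to_back(std::vector<Particle>& particles) {
    for (std::size_t i = 0; i + 1 < particles.size(); ++i) {
        if (particles[i].depth > particles[i + 1].depth) {
            std::swap(particles[i], particles[i + 1]);
        }
    }
}
```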

There's persistence, concurrency, cache locality, scalability onto a cluster/cloud, and a host of other possible reasons why one data structure or algorithm may be more appropriate than another even given the same computational complexity for the operations that you care about.

Having said that, this doesn't mean that you should memorise a bunch of algorithms and data structures just in case. Most of the battle is realising that there is a tradeoff to be exploited in the first place, and knowing where to look if you think there might be something appropriate.

Pseudonym

Aside from the fact that there are myriads of cost measures (running time, memory usage, cache misses, branch mispredictions, implementation complexity, feasibility of verification, ...) on myriads of machine models (TM, RAM, PRAM, ...), and average-vs-worst-case as well as amortization considerations to weigh against each other, there are often also functional differences beyond the scope of the basic textbook specification.

Some examples:

  • Mergesort is stable where Quicksort is not (see the sketch after this list).
  • Binary search trees give you in-order iteration, hashtables do not.
  • Bellman-Ford can deal with negative edge weights, Dijkstra can not.
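To make the first point concrete, here is a small illustrative C++ snippet (the names are made up): std::stable_sort, which is typically mergesort-based, preserves the input order of records with equal keys, while std::sort, typically a quicksort/introsort variant, makes no such promise.

```cpp
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

struct Record {
    std::string name;
    int key;
};

int main() {
    // Two records share key 1 and two share key 2.
    std::vector<Record> records = {
        {"alice", 2}, {"bob", 1}, {"carol", 2}, {"dave", 1}};

    auto by_key = [](const Record& a, const Record& b) { return a.key < b.key; };

    auto stable = records;
    std::stable_sort(stable.begin(), stable.end(), by_key);
    // Guaranteed order: bob, dave, alice, carol (ties keep input order).

    auto unstable = records;
    std::sort(unstable.begin(), unstable.end(), by_key);
    // Ties may come out in either order; the standard makes no promise.

    for (const auto& r : stable) std::cout << r.name << ' ';
    std::cout << '\n';
}
```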

There are also didactic considerations to make:

  • How easy is it to understand a more involved solution before simpler ones? (AVL trees (and their analysis) without BSTs; Dinic without Ford-Fulkerson; ...)
  • Do you see the same principles and patterns when you are exposed to only one solution per problem compared to being exposed to many solutions?
  • Does exposure to only one solution per problem provide enough training (towards mastery)?
  • Should you know the breadth of solutions that have been found (so as to prevent you from reinventing the wheel over and over¹)?
  • When exposed to only one solution per problem, will you understand other solutions you find in the wild (say, in a real-world programming library)?

  1. This is something we see a lot from programmer types who do not have a rich CS toolbox at their disposal.
vonbrand
Raphael

In the real world, at some point you are likely to be working on software that has been written by a team of other people. Some of this software will have been written before you were born!

In order to understand the algorithms / data structures that are used, it is very helpful to know a large number of algorithms / data structures, including options that are no longer considered "state of the art".

You will also have to work on algorithms that are not standard and are only used in the application you are working on. When you have to improve these algorithms, you will find that your brain has already been filled with useful methods, because you have studied how other people improved their algorithms.

This is what sets somebody who has studied computer science apart from someone who has just learned how to program. In most jobs I have worked in, there have been times when, having studied computer science, I could solve a problem that a "learned from books" programmer could not, but 95% of the time I found that having studied computer science gave me no advantage over other experienced programmers.

Ian Ringrose

Many people have rightly mentioned that often there's no one best algorithm - it depends on the situation.

There's also the possibility that one day you'll come across an unfamiliar situation. The more algorithms you know, the better the chance that you'll know one that is nearly a solution and can be used as a base.


A lot of great answers; just one thing I think is missing, though Raphael's answer somewhat mentions it.

Ease of implementation is also something to take into consideration.
That's usually not an issue with sorting algorithms, because most platforms/languages already have one implemented (and often better than what you could do yourself), but more unusual algorithms might not be available.
Depending on your problem, you might not need the absolute best algorithm if the implementation time is one day versus two weeks.

Leherenn