
A few years ago, MapReduce was hailed as a revolution in distributed programming. There were critics too, but by and large the reception was enthusiastic hype. It even got patented! [1]

The name is reminiscent of map and reduce in functional programming, but when I read (Wikipedia)

Map step: The master node takes the input, divides it into smaller sub-problems, and distributes them to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes the smaller problem, and passes the answer back to its master node.

Reduce step: The master node then collects the answers to all the sub-problems and combines them in some way to form the output – the answer to the problem it was originally trying to solve.

or [2]

Internals of MAP: [...] MAP splits up the input value into words. [...] MAP is meant to associate each given key/value pair of the input with potentially many intermediate key/value pairs.

Internals of REDUCE: [...] [REDUCE] performs imperative aggregation (say, reduction): take many values, and reduce them to a single value.
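To make the quoted description concrete, here is a small Python sketch of the word-counting example it alludes to (the driver loop, names, and data are mine, not Google's implementation):

```python
from collections import defaultdict

def MAP(key, value):
    # "MAP splits up the input value into words" and emits
    # one intermediate (word, 1) pair per occurrence.
    for word in value.split():
        yield (word, 1)

def REDUCE(key, values):
    # "take many values, and reduce them to a single value"
    return sum(values)

# The framework's shuffle step: group all intermediate pairs by key.
intermediate = defaultdict(list)
for doc_id, text in [("d1", "to be or not to be"), ("d2", "to do")]:
    for k, v in MAP(doc_id, text):
        intermediate[k].append(v)

word_counts = {k: REDUCE(k, vs) for k, vs in intermediate.items()}
print(word_counts)  # {'to': 3, 'be': 2, 'or': 1, 'not': 1, 'do': 1}
```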

I cannot help but think: this is divide & conquer (in the sense of Mergesort), plain and simple! So, is there (conceptual) novelty in MapReduce somewhere, or is it just a new implementation of old ideas useful in certain scenarios?


  1. US Patent 7,650,331: "System and method for efficient large-scale data processing" (2010)
  2. R. Lämmel, "Google's MapReduce programming model — Revisited" (2007)
Raphael

4 Answers

I cannot help but think: this is divide & conquer, plain and simple!

M/R is not divide & conquer. It does not involve the repeated application of an algorithm to a smaller subset of the previous input. It's a pipeline (a function specified as a composition of simpler functions) where pipeline stages are alternating map and reduce operations. Different stages can perform different operations.
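A toy sketch of such a pipeline (my own illustration, not code from the paper): two stages share the map/shuffle/reduce shape but perform entirely different operations, and neither stage recurses on a smaller copy of its own input.

```python
from collections import defaultdict

def run_stage(records, mapper, reducer):
    # One map/reduce stage: map each record, shuffle by key, reduce per key.
    groups = defaultdict(list)
    for key, value in records:
        for k, v in mapper(key, value):
            groups[k].append(v)
    return [(k, reducer(k, vs)) for k, vs in groups.items()]

# Stage 1: word count.
stage1 = run_stage(
    [("d1", "a b a"), ("d2", "b c")],
    mapper=lambda _, text: ((w, 1) for w in text.split()),
    reducer=lambda _, ones: sum(ones),
)
# Stage 2: a different operation on stage 1's output --
# how many distinct words occur exactly k times?
stage2 = run_stage(
    stage1,
    mapper=lambda word, count: [(count, 1)],
    reducer=lambda _, ones: sum(ones),
)
print(stage1)  # [('a', 2), ('b', 2), ('c', 1)]
print(stage2)  # [(2, 2), (1, 1)]
```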


So, is there (conceptual) novelty in MapReduce somewhere, or is it just a new implementation of old ideas useful in certain scenarios?

MapReduce does not break new ground in the theory of computation -- it does not show a new way of decomposing a problem into simpler operations. It does show that particular simpler operations are practical for a particular class of problem.


The MapReduce paper's contributions were:

  1. evaluating a pipeline of two well-understood orthogonal operators that can be distributed efficiently and fault-tolerantly, on a particular problem: creating a text index of a large corpus
  2. benchmarking map-reduce on that problem to show how much data is transferred between nodes and how latency differences in stages affect overall latency
  3. showing how to make the system fault tolerant so machine failures during computation can be compensated for automatically
  4. identifying specific useful implementation choices and optimizations

Some of the critiques fall into these classes:

  1. "Map/reduce does not break new ground in theory of computation." True. The original paper's contribution was that these well-understood operators with a specific set of optimizations had been successfully used to solve real problems more easily and fault-tolerantly than one-off solutions.
  2. "This distributed computation doesn't easily decompose into map & reduce operations". Fair enough, but many do.
  3. "A pipeline of n map/reduce stages require latency proportional to the number of reduce steps of the pipeline before any results are produced." Probably true. The reduce operator does have to receive all its input before it can produce a complete output.
  4. "Map/reduce is overkill for this use-case." Maybe. When engineers find a shiny new hammer, they tend to go looking for anything that looks like a nail. That doesn't mean that the hammer isn't a well-made tool for a certain niche.
  5. "Map/reduce is a poor replacement for a relational DB." True. If a relational DB scales to your data-set then wonderful for you -- you have options.
Mike Samuel

EDIT (March 2014) I should say that I have since worked more on algorithms for MapReduce-type models of computation, and I feel like I was being overly negative. The Divide-Compress-Conquer technique I talk about below is surprisingly versatile, and can be the basis of algorithms which I think are non-trivial and interesting.


Let me offer an answer that will be much inferior to Mike's in terms of comprehensiveness, but from a model of computation/algorithmic theory standpoint.

Why there is excitement: MapReduce interleaves parallel and sequential computation; each processor has access to a nontrivial chunk (e.g. $O(n^\epsilon)$) of the input and can perform a nontrivial operation on it. That is very much unlike PRAM models and seems like an interesting idea that might lead to new algorithmic techniques. In particular, some problems can be solved in a few (constant in the input size) rounds of computation, while no nontrivial problem can be solved on a PRAM in $o(\log n)$ time.

Why the model is getting slightly frustrating for me: the only algorithmic technique that seems to work for getting $O(1)$-round algorithms and is somewhat new is the following:

  • Partition the problem instance (often randomly)
  • Do some computation on each partition in parallel and represent the result of the computation compactly
  • Combine all the compactly represented subproblem solutions on a single processor and finish the computation there

A very simple example of the technique: computing the sum of $n$ numbers. Each processor holds $O(\sqrt{n})$ elements of the array and computes the sum of that portion. Then all the $\sqrt{n}$ partial sums can be combined on a single processor to compute the total sum. A slightly more interesting exercise is to compute all prefix sums this way (of course, in that case the output has to be represented in a distributed way). Or compute a spanning tree of a dense graph.
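Here is a sequential Python sketch of that sum example, simulating the one parallel round locally (the chunking, names, and test data are my own, not taken from any particular MapReduce system):

```python
import math

def dcc_sum(xs):
    # Divide-Compress-Conquer: one "parallel" round, then one combine step.
    n = len(xs)
    chunk = max(1, math.isqrt(n))  # each simulated processor gets ~sqrt(n) elements
    parts = [xs[i:i + chunk] for i in range(0, n, chunk)]
    # Round 1 (one chunk per processor): compress each chunk to a single number.
    partials = [sum(p) for p in parts]
    # Round 2 (single processor): the ~sqrt(n) partial sums fit on one machine.
    return sum(partials)

print(dcc_sum(list(range(1, 101))))  # 5050
```

Prefix sums follow the same pattern: once the partial sums are combined, each processor can be told the total of the chunks preceding its own and finish its prefix locally, which is why the output naturally stays distributed.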

Now, I think this actually is an interesting twist on divide and conquer, the twist being that after the divide stage you need to compress the subproblem solutions so that a single processor can conquer. However, this really seems to be the only technique we've come up with so far. It fails on problems with sparse graphs, such as sparse connectivity. Contrast this with the streaming model, which led to a wealth of new ideas, like the ingenious sampling algorithm of Flajolet and Martin, the deterministic pairing algorithm of Misra and Gries, the power of simple sketching techniques, etc.

As a programming paradigm, map reduce has been very successful. My comments regard map reduce as an interesting model of computation.

Good theoretical models are a little bit odd. If they follow reality too closely they are unwieldy, but more importantly (to borrow a term from machine learning), theorems proved for models which are too specific do not generalize, i.e. do not hold in other models. That's why we want to abstract away as much detail as possible, while still leaving enough to challenge us to come up with novel algorithms. Finally, those new ideas should eventually be able to find their way back to the real world. PRAM is one unrealistic model that led to interesting ideas, but those ideas proved to be rarely applicable to real-world parallel computation. On the other hand, streaming is also unrealistic, but it inspired algorithmic ideas which are actually employed in the real world; see the count-min sketch. Sketching techniques are in fact also used in systems based on map reduce.

Sasho Nikolov

I fully agree with you. From a conceptual perspective, there is nothing really new: Map/Reduce was originally known in Parallel Computing as a data-flow programming model. However, from a practical point of view, Map/Reduce as proposed by Google, together with the subsequent open-source implementations, has also fueled Cloud Computing and is now quite popular for very simple parallel decompositions and processing. Of course, it is not well suited for anything requiring complex domain or functional decompositions.

Massimo Cafaro

I think you've hit the nail on the head with your comment.

It's not true that in any functional language maps can be parallelized - the language must be pure. (I believe Haskell is the only vaguely mainstream purely functional language. Lisp, OCaml and Scala are all non-pure.)

We've known about the benefits of pure code since even before timesharing, when engineers first pipelined their processors. So how come no one uses a pure language?

It's really, really, really hard. Programming in a pure language often feels like programming with both hands tied behind your back.

What MR does is relax the purity constraint somewhat and provide a framework for the other pieces (like the shuffle phase), making it quite easy to write distributable code for a large fraction of problems.

I think you were hoping for an answer like "It proves this important sub-lemma of $NC = P$", and I don't think it does anything of the sort. What it does do is show that a class of problems known to be distributable is "easily" distributable - whether that's a "revolution" in your opinion probably depends on how much time you've spent debugging distributed code in a pre-Map/Reduce world.

Xodarap