*Map* - Split a problem up and distribute to workers *Reduce* - Gather and aggre...

city41 · on Sept 30, 2010

I think the article is just fine. People learn things differently. For some, a "real world" example can help drive things home.

ergo98 · on Sept 30, 2010

I suppose, but there are better ways people can spend their time. This article is rife with misspellings and typos, which essentially shows a disrespect for readers. Secondly, it's 1300+ words split into 6 "chapters" to say, poorly, what can be said in two sentences. I would not recommend it, especially on a site like HN (which purports to lean towards a higher standard).

In fact as a more general rule I would say that when people try to explain something, it is lazy, almost always confusing -- and usually an indication of the writer's own ignorance -- when said explanation resorts to contrived scenarios or analogies.

It's like if the UPS shipper had to deliver an elephant and twelve penguins, one of which suffered from gastroenteritis. By driving the truck using the hybrid energy recovery system, just how much conflict-gas that was fueled by the death of thousands of soldiers would a Catholic priest yield?

raphaelb · on Sept 30, 2010

This makes more sense to me than the article did, and I had no prior understanding of it, though some people might also like real world examples.

I feel like seeing the essential (as you've done) and then finding examples that are specific implementations is how I learn best.

nadam · on Sept 30, 2010

Not only is the algorithm described in the article too complex for such a simple task, it is also incorrect. It has a single bottleneck: the grouper. It is interesting that I've been downmodded for providing a simpler but correct algorithm. (I know almost nothing about MapReduce, but I like designing distributed algorithms for fun.)

jbooth · on Sept 30, 2010

He did a decent job of explaining how the different parts of the process relate to each other when it comes to the real world situation of parallelizing MR. The shuffle (grouper) actually happens per reducer in many implementations using hash-bucketed outputs from the mappers, but that's an optimization.
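
To make the hash-bucketing point concrete, here is a rough single-process sketch in Python using word counting as a stand-in task. The function names (`partition`, `map_words`, `reduce_bucket`, `word_count`) and the choice of CRC32 as the hash are my own assumptions for illustration, not taken from the article or from any particular MapReduce implementation. Each mapper writes its pairs into one bucket per reducer, and each reducer pulls only its own bucket from every mapper, so no single grouper sees all the data.

```python
import zlib
from collections import defaultdict


def partition(key, num_reducers):
    # Stable hash so the same key always goes to the same reducer.
    # (Python's built-in hash() on strings is randomized per process.)
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_words(chunk, num_reducers):
    # One output bucket per reducer; each (word, 1) pair is routed by key hash.
    buckets = [[] for _ in range(num_reducers)]
    for word in chunk.split():
        buckets[partition(word, num_reducers)].append((word, 1))
    return buckets


def reduce_bucket(pairs):
    # Group and sum locally; this reducer only sees keys from its own partition.
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(chunks, num_reducers=3):
    # Mappers run independently (sequentially here, for simplicity).
    mapper_outputs = [map_words(chunk, num_reducers) for chunk in chunks]
    results = {}
    for r in range(num_reducers):
        # Reducer r fetches bucket r from every mapper: the per-reducer shuffle.
        pairs = [pair for output in mapper_outputs for pair in output[r]]
        results.update(reduce_bucket(pairs))
    return results


if __name__ == "__main__":
    print(word_count(["the cat sat", "the dog sat down"]))
```

Because a key hashes to exactly one partition, the per-reducer results never overlap and can simply be merged at the end. A real system would run the mappers and reducers on separate machines and move the buckets over the network; that part is left out here.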