MapReduce provides only the map and reduce primitives; everything else (groupBy, sort, top, join, filter, etc.) has to be force-fitted into those two primitives. Spark, on the other hand, exposes many of these operations directly as methods on an RDD, so it is easier to write programs in Spark than in MapReduce. Spark programs also tend to be faster because Spark keeps the data in memory between iterations and spills it to disk only when required, whereas MapReduce has to write the data to disk between iterations, which slows it down.
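To make that concrete, here is a minimal sketch of how several of those operations chain together on an RDD. The object name, the input file sales.txt, and its "store,amount" line format are hypothetical; each of the chained operations below would otherwise require its own hand-written MapReduce job.

import org.apache.spark.{SparkConf, SparkContext}

object RddPrimitives {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("rdd-primitives").setMaster("local[*]"))

    // Hypothetical input: lines like "store,amount"
    val sales = sc.textFile("sales.txt")
      .map(_.split(","))
      .map(parts => (parts(0), parts(1).toDouble))

    // Each of these would be a separate custom MapReduce job:
    val perStore = sales.reduceByKey(_ + _)                    // aggregate by key
    val bigSales = sales.filter { case (_, amt) => amt > 100 } // filter records
    val top5     = perStore.top(5)(Ordering.by(_._2))          // top-N by value
    val sorted   = perStore.sortBy(_._2, ascending = false)    // global sort

    sorted.collect().foreach(println)
    sc.stop()
  }
}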
The common denominator between MapReduce and Spark is the map and reduce primitives. But, there are some differences in how map and reduce behave in the two frameworks. For those familiar with the MapReduce model who would like to move to Spark, there is a nice blog entry from Sean Owen (Cloudera). As usual, reading the article is a good place to start, but actual practice is what will make things clearer.
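One such difference, sketched below on a small in-memory dataset (the object name and sample lines are made up for illustration): Hadoop's Mapper may emit zero or more pairs per input record, so Spark's flatMap is the closer analog of a MapReduce map, while Spark's map is strictly one-in-one-out. Likewise, Hadoop's Reducer receives a key together with all its values, whereas Spark's reduce folds the whole RDD down to a single value; the per-key equivalent in Spark is reduceByKey (or groupByKey).

import org.apache.spark.{SparkConf, SparkContext}

object MapReduceVsSpark {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("mr-vs-spark").setMaster("local[*]"))
    val lines = sc.parallelize(Seq("to be or not to be", "that is the question"))

    // A MapReduce map can emit many pairs per record; Spark's map
    // cannot, so flatMap plays that role here:
    val pairs = lines.flatMap(_.split("\\s+")).map(word => (word, 1))

    // A MapReduce reduce sees one key with all its values; Spark's
    // reduceByKey is the per-key equivalent, while Spark's reduce
    // collapses the entire RDD to a single value:
    val counts = pairs.reduceByKey(_ + _)      // per-key, combiner-style
    val total  = pairs.map(_._2).reduce(_ + _) // one value for the whole RDD

    counts.collect().foreach(println)
    println(s"total words: $total")
    sc.stop()
  }
}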