Tuesday, September 9, 2014

What's the difference between RDD's map,FlatMap and mapPartitions method?

What's the difference between RDD's map,FlatMap and mapPartitions method?

map converts each element of the source RDD into a single element of the result RDD by applying a function. mapPartitions converts each partition of the source RDD into into multiple elements of the result (possibly none). FlatMap works on a single element (as map) and produces multiple elements of the result (as mapPartitions).
RDD -> Map -> Single Element(one to one)
RDD -> MapPartition -> MultipleElements (single) for Each Partition
RDD -> FlatMap -> Multiple Elements(one to many)

These are the same in functioning
a.map(r => r * 2)
a.mapPartitions(iter => iter.map(r => r *2))