What's the difference between RDD's map,FlatMap and mapPartitions method?
map
converts each element of the source RDD into a single element of the result RDD by applying a function. mapPartitions
converts each partition of the source RDD into into multiple elements of the result (possibly none). FlatMap works on a single element (as map
) and produces multiple elements of the result (as mapPartitions)
.
RDD -> Map -> Single Element(one to one)
RDD -> MapPartition -> MultipleElements (single) for Each Partition
RDD -> FlatMap -> Multiple Elements(one to many)
These are the same in functioning
a.map(r => r * 2) a.mapPartitions(iter => iter.map(r => r *2))