The number of partitions plays an important role in Apache Spark RDDs.
We often need to work out in advance how many partitions we will have after an RDD operation.
We can change the number of partitions with `coalesce` (typically to reduce it) or `repartition`.
But what happens when we join two RDDs? Because we are joining two different RDDs, how many partitions will the result have?
The answer is,
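The number depends on `spark.sql.shuffle.partitions`. You can set it to customize the partition count; the default value is 200. Note that this property governs shuffles in Spark SQL, that is, joins and aggregations on DataFrames and Datasets. A join on plain RDDs uses the default partitioner instead: it takes `spark.default.parallelism` when that is set and otherwise derives the count from the partitions of the input RDDs, unless you pass `numPartitions` to `join` explicitly.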
| Property Name | Default | Meaning |
|---|---|---|
| spark.sql.shuffle.partitions | 200 | Configures the number of partitions to use when shuffling data for joins or aggregations. |

Source: http://spark.apache.org/docs/latest/sql-programming-guide.html#other-configuration-options