Friday, February 26, 2016

Spark Join and number of partitions

As we all know, the number of partitions plays an important role in Apache Spark RDDs.
We often need to know in advance how many partitions an RDD will have after an operation.
We can reduce the number of partitions using `coalesce` (or change it freely with `repartition`).

But what happens when we do a join operation? Since we are joining two different datasets, how many partitions will the result have?

The answer is:

The number depends on `spark.sql.shuffle.partitions`. You can set this property to customize it; the default value is 200. (This property governs shuffles for joins and aggregations in Spark SQL / DataFrames.)


| Property Name | Default | Meaning |
| --- | --- | --- |
| spark.sql.shuffle.partitions | 200 | Configures the number of partitions to use when shuffling data for joins or aggregations. |



http://spark.apache.org/docs/latest/sql-programming-guide.html#other-configuration-options
