| Issue | Workaround |
|---|---|
Driver OOM while using reduceByKey | decrease spark.default.parallelism |
Java killed by OOM killer on a slave node | change the formula in /root/spark-ec2/deploy_templates.py on the master node: spark_mb = system_ram_mb * 4 // 5 |
SPARK-4808 Spark fails to spill with small number of large objects | update Spark to 1.4.1 |
SPARK-5077 Map output statuses can still exceed spark.akka.frameSize | set("spark.akka.frameSize", "128") |
SPARK-6246 spark-ec2 can't handle clusters with > 100 nodes | apply the patch for deploy script |
SPARK-6497 Class is not registered: scala.reflect.ManifestFactory$$anon$9 | don’t force Kryo class registration |
HADOOP-6254 s3n fails with SocketTimeoutException | use S3A file system in Hadoop 2.7.1 |
S3 HEAD request failed for ... - ResponseCode=403, ResponseMessage=Forbidden | same as above |
gzip input files are not splittable. (We need to save storage space by archiving datasets, but the default choice, the gzip format, doesn’t allow random read access. This is not a bug, yet it greatly increases the probability of hitting the other issues and degrades performance.) | use bzip2 compression for input files |
HADOOP-7823 port HADOOP-4012 to branch-1 (splitting support for bzip2) | update to Hadoop 2.7.1 |
HADOOP-10614 CBZip2InputStream is not threadsafe | same as above |
HADOOP-10400 Incorporate new S3A FileSystem implementation | same as above |
HADOOP-11571 Über-jira: S3a stabilisation phase I | same as above |
SPARK-6668 repeated asking to remove non-existent executor | install Hadoop 2.7.1 native libraries |
SPARK-5348 s3a:// protocol and hadoop-aws dependency | build Spark with the patch |
Stack Overflow - How to access s3a:// files from Apache Spark? | --conf spark.hadoop.fs.s3a.access.key=... --conf spark.hadoop.fs.s3a.secret.key=... |
HADOOP-9565 Add a Blobstore interface to add to blobstore FileSystems | use DirectOutputCommitter (see here) |
Timeout waiting for connection from pool | conf.setInt("fs.s3a.connection.maximum", 100) |
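Several of the workarounds in the table are plain Spark configuration settings. Assuming Spark 1.4.x, they could be collected in a spark-defaults.conf fragment (or passed as individual `--conf` flags to spark-submit) roughly like this; the access-key values are placeholders, not real credentials:

```
# SPARK-5077: map output statuses can exceed the default frame size
spark.akka.frameSize                    128

# S3A credentials (placeholders), per the Stack Overflow answer above
spark.hadoop.fs.s3a.access.key          YOUR_AWS_ACCESS_KEY
spark.hadoop.fs.s3a.secret.key          YOUR_AWS_SECRET_KEY

# Avoid "Timeout waiting for connection from pool"
spark.hadoop.fs.s3a.connection.maximum  100
```

Properties prefixed with `spark.hadoop.` are forwarded by Spark into the Hadoop configuration, which is why the `fs.s3a.*` keys can be set this way instead of in core-site.xml.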
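The deploy_templates.py workaround above caps Spark worker memory at 4/5 of system RAM so the OS and JVM overhead keep some headroom and the kernel OOM killer stops terminating executors. A minimal sketch of the adjusted formula (the function name and the example instance size are illustrative, not from the original script):

```python
def spark_memory_mb(system_ram_mb: int) -> int:
    """Spark worker memory in MB, leaving 1/5 of system RAM for the OS.

    The stock spark-ec2 template hands nearly all RAM to Spark; this is the
    patched formula: spark_mb = system_ram_mb * 4 // 5 (integer division).
    """
    return system_ram_mb * 4 // 5

# e.g. a slave with 31232 MB of RAM would give Spark 24985 MB
print(spark_memory_mb(31232))
```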
Courtesy: http://tech.grammarly.com/blog/posts/Petabyte-Scale-Text-Processing-with-Spark.html