Apache Spark recently received top level support on Amazon Elastic MapReduce (EMR) cloud offering, joining applications such as Hadoop, Hive, Pig, HBase, Presto, and Impala. This is exciting for me, because most of my workloads run on EMR, and utilizing Spark required either standing up manual EC2 clusters, or using EMR bootstrap, which was very… Read more »
Posts Tagged: apache spark
Debugging Apache Spark Jobs
Would you like to step through your Spark job in a debugger? These steps show you how to configure IntelliJ IDEA to allow just that. Unlike a traditional Java or Scala application, Spark jobs expect to be run within a larger Spark application, that gives access to SparkContext. Your application interacts with the environment through… Read more »
Apache Spark on EC2
Its easy to get started with Apache Spark. You can get a template for a Scala job using the Typesafe Activator and have it running on a local cluster with a small dataset. You can also use a handy script spark_ec2 to launch an EC2 cluster as detailed in Running Spark on EC2 document. You could… Read more »