Debugging Apache Spark Jobs

Posted by & filed under Big Data.

Would you like to step through your Spark job in a debugger? These steps show you how to configure IntelliJ IDEA to allow just that. Unlike a traditional Java or Scala application, a Spark job expects to run within a larger Spark application that gives it access to a SparkContext. Your application interacts with the environment through…
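The post is truncated here, so what follows is not necessarily its method. One common way to make a Spark job debuggable from IntelliJ IDEA is to create the SparkContext yourself in a `main` method with a local master. The driver and executor code then run inside the IDE's JVM, and ordinary breakpoints work. This is a minimal sketch, assuming Spark 1.3 or later is on the project's classpath. `DebuggableWordCount`, `createContext`, and `wordCounts` are hypothetical names used only for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example job; the object and method names are illustrative only.
object DebuggableWordCount {

  // Builds a SparkContext that runs entirely inside the current JVM.
  // With master "local[*]", the driver and executors share one process,
  // so IntelliJ breakpoints (including ones inside closures passed to
  // flatMap/map) are hit when main() is launched under the debugger.
  def createContext(appName: String): SparkContext = {
    val conf = new SparkConf()
      .setAppName(appName)
      .setMaster("local[*]")
    new SparkContext(conf)
  }

  // Counts words in a small in-memory dataset. A breakpoint on the
  // flatMap line lets you inspect each input line as it is processed.
  def wordCounts(sc: SparkContext, lines: Seq[String]): Map[String, Int] =
    sc.parallelize(lines)
      .flatMap(line => line.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .collect()
      .toMap

  def main(args: Array[String]): Unit = {
    val sc = createContext("debuggable-word-count")
    try {
      val counts = wordCounts(sc, Seq("to be or not", "to be"))
      // Print the counts in a stable (alphabetical) order.
      counts.toSeq.sortBy(_._1).foreach { case (word, n) => println(s"$word: $n") }
    } finally {
      // Always release the context so a later run in the same JVM can start one.
      sc.stop()
    }
  }
}
```

Launch `main` with IntelliJ's Debug action and place a breakpoint inside the `flatMap` lambda. Note that a master set directly on `SparkConf` takes precedence over the `--master` flag of `spark-submit`, so a hard-coded `local[*]` is best kept to debugging runs.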

Apache Spark on EC2

Posted by & filed under Big Data.

It's easy to get started with Apache Spark. You can get a template for a Scala job using the Typesafe Activator and have it running on a local cluster with a small dataset. You can also use the handy spark_ec2 script to launch an EC2 cluster, as detailed in the Running Spark on EC2 document. You could…
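The excerpt describes running a job first on a local cluster with a small dataset and then on an EC2 cluster. One way to keep a single job class working in both places is to fall back to a local master only when none has been supplied, using `SparkConf.setIfMissing`. This is an assumption for illustration and not necessarily what the full post does. `PortableLineCount` and `buildConf` are hypothetical names, and the optional first argument is taken as an input path.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical example job; the object and method names are illustrative only.
object PortableLineCount {

  // Uses local[*] only when no master was supplied, e.g. when the job is
  // started straight from the IDE. Under spark-submit, spark.master is
  // already set (as a system property read by new SparkConf()) and is kept.
  def buildConf(appName: String): SparkConf =
    new SparkConf()
      .setAppName(appName)
      .setIfMissing("spark.master", "local[*]")

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(buildConf("portable-line-count"))
    try {
      // Read the file named on the command line, or fall back to a tiny
      // in-memory sample so the job can be tried without any input data.
      val lines: RDD[String] =
        if (args.nonEmpty) sc.textFile(args(0))
        else sc.parallelize(Seq("first line", "", "third line"))
      val nonEmpty = lines.filter(_.trim.nonEmpty).count()
      println(s"non-empty lines: $nonEmpty")
    } finally {
      sc.stop()
    }
  }
}
```

Started directly from the IDE, the job runs on `local[*]` against the built-in sample. When the packaged jar is launched with `spark-submit`, the master given by `--master` (for example, the master of a cluster started with spark_ec2) is used instead.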