Handling Avro records in Scalding

Filed under Big Data, Programming.

In this post I’ll cover how to write and read Avro records in Scalding pipelines. As a reminder, Avro is a data serialization format, and Scalding is a Scala API on top of Hadoop. If you’re not using Scalding, this post probably won’t be of much interest to you. Let’s begin by defining…

DataPhilly: Apache Pig

Filed under Programming.

I’m giving a presentation on Apache Pig at DataPhilly. For those who want to access the slides later, or those who were not able to make it in person, below is the presentation I’m using to keep my talk on track.