In this post I’ll cover how to write and read Avro records in Scalding pipelines. To begin, a reminder: Avro is a data serialization format, and Scalding is a Scala API on top of Hadoop. If you’re not using Scalding, this post probably won’t interest you much. Let’s begin by defining…
DataPhilly: Apache Pig
I’m giving a presentation on Apache Pig at DataPhilly. For those who want to access the slides later, or those who were not able to make it in person, below is the presentation I’m using to keep my talk on track.