Filed under Big Data, Programming.

Scala is statically typed, yet we rarely use the type system to our advantage. For example, say you have a function with this signature:

 def convert(a: Double, b: Double): String

Of course, the author hasn’t written a comment, and didn’t give the function or its parameters expressive names. So let’s fix that:

 def reverseGeocode(latitude: Double, longitude: Double): String

The types are the same, but the function and parameter names now say exactly what it does. Most people stop there.
But it’s easy to flip the order of the arguments, especially when they’re passed along a long chain of calls. What if we label the types explicitly with type aliases?
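To make the risk concrete, here is a minimal sketch. The body of `reverseGeocode` is a hypothetical stub (a real implementation would call a geocoding service); it just echoes its inputs so we can see which value landed in which parameter. The coordinates are illustrative.

```scala
// Hypothetical stub standing in for a real reverse-geocoding lookup:
// it only echoes the coordinates so we can see which value went where.
def reverseGeocode(latitude: Double, longitude: Double): String =
  s"lat=$latitude, lon=$longitude"

val latitude  = 51.5074 // roughly London
val longitude = -0.1278

val correct = reverseGeocode(latitude, longitude)
// Arguments flipped: both are Double, so this compiles without complaint.
val flipped = reverseGeocode(longitude, latitude)

println(correct) // lat=51.5074, lon=-0.1278
println(flipped) // lat=-0.1278, lon=51.5074
```

Named arguments at the call site, such as `reverseGeocode(latitude = latitude, longitude = longitude)`, make the intent readable, but nothing forces callers to use them.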

 type Latitude = Double
 type Longitude = Double
 def reverseGeocode(point: (Latitude, Longitude)): String
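A quick sketch of why aliases alone don’t protect us. The stub body is again hypothetical; the point is what the compiler accepts.

```scala
// Type aliases are just alternative names: both of these ARE Double.
type Latitude = Double
type Longitude = Double

// Compiles, which proves the compiler treats the two aliases as the same type.
val sameType = implicitly[Latitude =:= Longitude]

// Hypothetical stub: echoes the coordinates instead of looking them up.
def reverseGeocode(point: (Latitude, Longitude)): String =
  s"lat=${point._1}, lon=${point._2}"

val lat: Latitude = 51.5074
val lon: Longitude = -0.1278

val ok = reverseGeocode((lat, lon))
// Still compiles: a (Longitude, Latitude) tuple is just a (Double, Double).
val swapped = reverseGeocode((lon, lat))
// Any pair of doubles is accepted, even values that aren't valid coordinates.
val nonsense = reverseGeocode((1000.0, -1000.0))
```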

The aliases help with documentation, but they don’t let the compiler validate that we’re passing the correct values: an alias is just another name for Double, so any tuple of doubles is accepted as valid input. We can, of course, create a case class:

 case class GeoPoint(latitude: Double, longitude: Double)
 def reverseGeocode(point: GeoPoint): String
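A minimal sketch of the case-class version, with a hypothetical stub body. A bare tuple is now rejected, and named constructor arguments make the call site self-documenting.

```scala
// One named type for the pair: a bare tuple of doubles no longer fits.
case class GeoPoint(latitude: Double, longitude: Double)

// Hypothetical stub standing in for the real lookup.
def reverseGeocode(point: GeoPoint): String =
  s"lat=${point.latitude}, lon=${point.longitude}"

// Named arguments document which Double is which at construction time.
val london = GeoPoint(latitude = 51.5074, longitude = -0.1278)
val description = reverseGeocode(london)

// Uncommenting this gives a type mismatch: a tuple is not a GeoPoint.
// reverseGeocode((51.5074, -0.1278))
```

Note that the positional constructor `GeoPoint(-0.1278, 51.5074)` would still compile, since both fields are Double; the protection comes from the single named type at the function boundary.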

A case class works here, but not always: sometimes the arguments don’t fit neatly into a single named structure and are better kept separate. In that case we can wrap each value in its own case class:

 case class Latitude(value: Double)
 case class Longitude(value: Double)
 def reverseGeocode(lat: Latitude, long: Longitude): String
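With a separate wrapper per coordinate, swapping the arguments becomes a compile-time error. The stub body below is hypothetical; the commented-out line is the call the compiler now refuses.

```scala
// Each coordinate gets its own wrapper type, so the two can't be confused.
case class Latitude(value: Double)
case class Longitude(value: Double)

// Hypothetical stub standing in for the real lookup.
def reverseGeocode(lat: Latitude, long: Longitude): String =
  s"lat=${lat.value}, lon=${long.value}"

val lat = Latitude(51.5074)
val long = Longitude(-0.1278)

val description = reverseGeocode(lat, long)

// Uncommenting this line produces a type mismatch at compile time:
// reverseGeocode(long, lat)
```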

But now we’ve boxed each Double in an object, which costs an allocation per value and can hurt performance in a tight loop. Luckily, Scala gives us a solution in value classes:

 case class Latitude(value: Double) extends AnyVal
 case class Longitude(value: Double) extends AnyVal
 def reverseGeocode(lat: Latitude, long: Longitude): String
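A short sketch of value classes in use, again with a hypothetical stub body. The `isNorthern` helper is an illustrative addition, not from the original; it shows that defining methods on the value class is one way to cut down on repeated `.value` hops at call sites.

```scala
// Value classes: distinct types to the compiler, but in most situations
// represented at runtime as a bare Double, with no wrapper allocation.
case class Latitude(value: Double) extends AnyVal {
  // Hypothetical helper: a method on the wrapper hides the .value hop.
  def isNorthern: Boolean = value > 0
}
case class Longitude(value: Double) extends AnyVal

// Hypothetical stub standing in for the real lookup.
def reverseGeocode(lat: Latitude, long: Longitude): String =
  s"lat=${lat.value}, lon=${long.value}"

val description = reverseGeocode(Latitude(51.5074), Longitude(-0.1278))
// reverseGeocode(Longitude(-0.1278), Latitude(51.5074)) // still a compile error
```

According to the Scala documentation on value classes, an instance is still allocated in some cases, for example when it’s stored in an array, used as a generic type argument, or pattern matched, so the boxing cost doesn’t vanish everywhere.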

Value classes are a good compromise, but they do require an extra hop (.value) to get at the underlying number, which can get verbose if they’re widely used. Scalaz also offers tagged types, which make the type more explicit and compile-time safe; they’re a good alternative if your project already depends on Scalaz.

Too often in data science and analysis code I see bare primitive types. I don’t know whether they’re left that way for performance or out of laziness (most likely the latter), but assigning an explicit type catches errors at compile time and documents intent better than expressive naming alone.
