Scala is statically typed, yet we rarely use that to our advantage. For example, say you have a function with this signature:
def convert(a: Double, b: Double): String
Of course, the author hasn’t written a comment and clearly didn’t give the parameters expressive names. So let’s fix it:
def reverseGeocode(latitude: Double, longitude: Double): String
The types are the same, but the parameter names now convey what the function does. Most people stop there.
But it’s still easy to flip the order of the arguments, especially if they’re passed along a long chain of calls. What if we explicitly label the types using type aliases?
type Latitude = Double
type Longitude = Double
def reverseGeocode(point: (Latitude, Longitude)): String
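Here is a minimal sketch of how the aliases behave in practice. The body of reverseGeocode is a hypothetical stand-in that only formats its input (a real geocoder would look the point up somewhere), and the coordinates are illustrative values I picked for the example.

```scala
object AliasDemo {
  // A type alias is only another name for Double; it does not create a new type.
  type Latitude = Double
  type Longitude = Double

  // Hypothetical stand-in for a real geocoder: it just formats its input.
  def reverseGeocode(point: (Latitude, Longitude)): String =
    s"lat=${point._1}, lon=${point._2}"

  val latitude: Latitude = 51.5
  val longitude: Longitude = -0.12

  // Intended order.
  val correct: String = reverseGeocode((latitude, longitude))

  // Swapped order: this still compiles, because both aliases are plain Double.
  val swapped: String = reverseGeocode((longitude, latitude))
}
```

Both calls type-check, so the swapped one silently yields a point with the coordinates reversed.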
This helps with documentation, but doesn’t let the compiler validate that we’re passing the correct values: an alias is just another name for Double, so any tuple of doubles is accepted as valid input. We can of course create a case class:
case class GeoPoint(latitude: Double, longitude: Double)
def reverseGeocode(point: GeoPoint): String
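As a rough sketch of what the case class buys us, assuming again a hypothetical reverseGeocode body that only formats its input and made-up coordinates:

```scala
object GeoPointDemo {
  case class GeoPoint(latitude: Double, longitude: Double)

  // Hypothetical stand-in for a real geocoder: it just formats its input.
  def reverseGeocode(point: GeoPoint): String =
    s"lat=${point.latitude}, lon=${point.longitude}"

  // Named arguments make the construction site self-describing.
  val london: GeoPoint = GeoPoint(latitude = 51.5, longitude = -0.12)
  val described: String = reverseGeocode(london)

  // A bare tuple is now rejected by the compiler:
  // reverseGeocode((51.5, -0.12))   // does not compile: type mismatch
}
```

Note that the two fields are still raw Doubles, so GeoPoint(-0.12, 51.5) would compile just as happily. The case class protects the call to reverseGeocode, not the construction of the point.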
This works here, but not always: sometimes the arguments don’t fit neatly into a structure that can be given a meaningful name, and should be kept separate. We can instead wrap each value in its own case class:
case class Latitude(value: Double)
case class Longitude(value: Double)
def reverseGeocode(lat: Latitude, long: Longitude): String
But now we’ve boxed the Double, which can hurt performance if the values are used in a tight loop. Luckily, Scala gives us a solution in value classes:
case class Latitude(value: Double) extends AnyVal
case class Longitude(value: Double) extends AnyVal
def reverseGeocode(lat: Latitude, long: Longitude): String
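To make that concrete, here is a small sketch of the value classes in use. The reverseGeocode body is once more a hypothetical stand-in, and everything is placed inside an object because value classes have to be top-level or members of a statically accessible object.

```scala
object ValueClassDemo {
  // Distinct types at compile time; the compiler can usually avoid
  // allocating a wrapper object at runtime.
  case class Latitude(value: Double) extends AnyVal
  case class Longitude(value: Double) extends AnyVal

  // Hypothetical stand-in for a real geocoder: it just formats its input.
  def reverseGeocode(lat: Latitude, long: Longitude): String =
    s"lat=${lat.value}, lon=${long.value}"

  val lat: Latitude = Latitude(51.5)
  val long: Longitude = Longitude(-0.12)
  val described: String = reverseGeocode(lat, long)

  // Flipping the arguments is now a compile-time error:
  // reverseGeocode(long, lat)   // does not compile: type mismatch

  // Arithmetic on the wrapped number needs an explicit unwrap and re-wrap.
  val northward: Latitude = Latitude(lat.value + 1.0)
}
```

The wrapper is not always free: the Scala documentation lists cases, such as storing a value class in an array or using it as a type argument, where an instance is allocated after all.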
This is good, but it does require an extra hop (.value) to get at the underlying number, which can get verbose if used widely. Scalaz also offers tagged types, which make the type more explicit and compile-time safe, so they are a good alternative if your project already depends on Scalaz.
Too often I see bare primitive types in data science and analysis code. I don’t know whether they’re left that way for performance or out of laziness (most likely the latter), but assigning an explicit type catches errors and documents intent better than expressive naming alone.