over 3 years ago

I thought I would share my experience trying out Scalaz Task and then reverting to Scala Future because of a perceived limitation. I now realize that was a mistake, and I would like to share what I learned to help others avoid the same trap.

I first learnt about Scalaz Task about a year ago, while working on a project that required some special error handling. What caught my attention was the ability to decouple concurrent execution from error handling. Scalaz defines both a Task and a Future, and a Task simply adds error handling to the Future (it wraps a Future[Throwable \/ A]). This means you can add your own error handling on top of Future when required, which was exactly what we needed to do. Although I could not have named it back then, we built a monad transformer in order to combine Future, Try and Validation into a single monad.

To Cache or not to Cache

This worked out great for us at first, until we ran into an issue where some of our Tasks were being run multiple times. Let's look at some sample code for the kinds of things we were trying to express:

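import scalaz.Nondeterminism
import scalaz.concurrent.Task

val databaseCall: Task[Int] = ???

val serviceAResult = databaseCall.flatMap(callServiceA)
val serviceBResult = databaseCall.flatMap(callServiceB)

// Both branches embed databaseCall, so running resultTask runs it twice
val resultTask = Nondeterminism[Task].mapBoth(serviceAResult, serviceBResult)(combine)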

If you run this with Task, you will notice that the database call is performed twice. That is because a Task is a description of a computation rather than a computation in progress: resultTask contains databaseCall twice, once inside each branch, and running resultTask runs each occurrence. If you were to rewrite this example with Scala Future, however, the database access would only be performed once. Scala Futures follow a different programming model: a Future is generally already "running" once you hold a reference to it, and every reference shares the same eventual result.
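The difference can be reproduced without Scalaz. The sketch below uses a hypothetical LazyTask, a thunk wrapper standing in for Task (its map, flatMap and zipWith are illustrative, not the scalaz API), next to a real scala.concurrent.Future. An AtomicInteger counts how often the simulated database call runs. The value 21 and the functions `_ + 1` and `_ * 2` are arbitrary placeholders for the database result and the two service calls. It assumes Scala 2.12 or later, where Future.zipWith is available.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical, minimal stand-in for a lazy Task (NOT the scalaz API):
// it only stores a thunk, and nothing happens until run() is called.
final case class LazyTask[A](run: () => A) {
  def map[B](f: A => B): LazyTask[B] = LazyTask(() => f(run()))
  def flatMap[B](f: A => LazyTask[B]): LazyTask[B] = LazyTask(() => f(run()).run())
  // Runs both sides one after the other, to keep the sketch small.
  def zipWith[B, C](that: LazyTask[B])(f: (A, B) => C): LazyTask[C] =
    LazyTask(() => f(run(), that.run()))
}

object TaskVsFuture {
  // Returns (combined result, number of times the "database" was hit).
  def taskVersion(): (Int, Int) = {
    val dbCalls = new AtomicInteger(0)
    // A description of the call; nothing has run yet.
    val databaseCall = LazyTask(() => { dbCalls.incrementAndGet(); 21 })
    val serviceAResult = databaseCall.map(_ + 1)
    val serviceBResult = databaseCall.map(_ * 2)
    // Running the combined task runs each branch, and so databaseCall, in full.
    val result = serviceAResult.zipWith(serviceBResult)(_ + _).run()
    (result, dbCalls.get)
  }

  def futureVersion(): (Int, Int) = {
    val dbCalls = new AtomicInteger(0)
    // The Future is submitted as soon as it is created; its result is shared.
    val databaseCall = Future { dbCalls.incrementAndGet(); 21 }
    val serviceAResult = databaseCall.map(_ + 1)
    val serviceBResult = databaseCall.map(_ * 2)
    val result = Await.result(serviceAResult.zipWith(serviceBResult)(_ + _), 5.seconds)
    (result, dbCalls.get)
  }
}
```

taskVersion() reports two database hits and futureVersion() reports one, with the same combined result. Whether the two branches run sequentially (as here) or concurrently, each Task branch still runs databaseCall.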

Computations that are run concurrently are typically long-running, because the time required to perform the computation must outweigh the overhead of running it on a separate thread. One could then argue that caching the results of these computations is a very desirable property. This is why, after some reflection, we decided at the time to revert to Scala Future.

Think differently

It turns out that the appropriate way to solve this problem with Task is to construct resultTask differently. Consider the following:

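import scalaz.Nondeterminism
import scalaz.concurrent.Task

val databaseCall: Task[Int] = ???

// databaseCall appears once; both service calls reuse its result, data
val resultTask = databaseCall.flatMap { data =>
    Nondeterminism[Task].mapBoth(callServiceA(data), callServiceB(data))(combine)
}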

Constructed this way, resultTask runs databaseCall only once. I would even argue that for this specific example, the code is prettier. That said, a bigger example might involve many nested maps and flatMaps, which could become unwieldy. I suspect that is the main reason this solution did not occur to us at the time: our composite Tasks were quite large.
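To check this claim in isolation, here is a minimal sketch using a hypothetical LazyTask (a thunk wrapper standing in for scalaz's Task, not its API) and hypothetical callServiceA and callServiceB functions that return tasks. A counter records how often the simulated database call runs when resultTask is run once.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical, minimal lazy task (NOT the scalaz API): a thunk run on demand.
final case class LazyTask[A](run: () => A) {
  def map[B](f: A => B): LazyTask[B] = LazyTask(() => f(run()))
  def flatMap[B](f: A => LazyTask[B]): LazyTask[B] = LazyTask(() => f(run()).run())
  def zipWith[B, C](that: LazyTask[B])(f: (A, B) => C): LazyTask[C] =
    LazyTask(() => f(run(), that.run()))
}

object SharedResult {
  // Hypothetical service calls that each return a task.
  def callServiceA(data: Int): LazyTask[Int] = LazyTask(() => data + 1)
  def callServiceB(data: Int): LazyTask[Int] = LazyTask(() => data * 2)

  // Returns (combined result, number of database hits) for one run.
  def demo(): (Int, Int) = {
    val dbCalls = new AtomicInteger(0)
    val databaseCall = LazyTask(() => { dbCalls.incrementAndGet(); 21 })
    // databaseCall appears once in the description, so it runs once;
    // both service calls reuse the bound value `data`.
    val resultTask = databaseCall.flatMap { data =>
      callServiceA(data).zipWith(callServiceB(data))(_ + _)
    }
    val result = resultTask.run()
    (result, dbCalls.get)
  }
}
```

The sharing is scoped to a single run of the description: running resultTask a second time would query the database again, which is what you want from a value that describes a program.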

Referential transparency to the rescue

You might be thinking, okay, so both Task and Future can solve the above problem. Before coming to that conclusion, let’s consider some code that in some ways presents the opposite problem:

import scalaz.concurrent.Task

val readLine: Task[String] = ???
def writeLine(line: String): Task[Unit] = ???

val addUserNumbers: Task[Unit] = for {
  _      <- writeLine("Please input a number")
  first  <- readLine.map(_.trim.toInt)
  _      <- writeLine("Please input a second number")
  second <- readLine.map(_.trim.toInt)
  _      <- writeLine("Result is: " + (first + second))
} yield ()

In the above example, the program asks the user for two numbers and prints their sum. I think most people would expect the program to prompt the user twice, and that is indeed what happens when using Task. The situation is more subtle if we rewrite this example with Scala Future. If readLine is a function (a def) that returns a new Future each time it is called, the program behaves the same way. However, if readLine is a val holding a single Future, the user is only prompted once, and first and second refer to the same value.
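The readLine scenario above can be simulated without a real console. This sketch assumes a hypothetical FakeConsole that hands out scripted input lines, and a hypothetical LazyTask thunk wrapper in place of scalaz's Task. It compares a task-style readLine with a Future stored in a val and a Future returned by a def.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical, minimal lazy task (NOT the scalaz API): a thunk run on demand.
final case class LazyTask[A](run: () => A) {
  def map[B](f: A => B): LazyTask[B] = LazyTask(() => f(run()))
  def flatMap[B](f: A => LazyTask[B]): LazyTask[B] = LazyTask(() => f(run()).run())
}

// Hypothetical scripted input standing in for the user at the console.
final class FakeConsole(lines: String*) {
  private val input = lines.iterator
  def read(): String = synchronized(input.next())
}

object ReadTwice {
  // Task-style: readLine is a description, so each use performs a new read.
  def taskVersion(console: FakeConsole): (Int, Int) = {
    val readLine: LazyTask[Int] = LazyTask(() => console.read().trim.toInt)
    val program = for {
      first  <- readLine
      second <- readLine
    } yield (first, second)
    program.run()
  }

  // Future held in a val: it starts once, and both binds see the same value.
  def futureValVersion(console: FakeConsole): (Int, Int) = {
    val readLine: Future[Int] = Future(console.read().trim.toInt)
    val program = for {
      first  <- readLine
      second <- readLine
    } yield (first, second)
    Await.result(program, 5.seconds)
  }

  // Future returned by a def: every call creates a new Future, hence a new read.
  def futureDefVersion(console: FakeConsole): (Int, Int) = {
    def readLine: Future[Int] = Future(console.read().trim.toInt)
    val program = for {
      first  <- readLine
      second <- readLine
    } yield (first, second)
    Await.result(program, 5.seconds)
  }
}
```

With inputs "1" and "2", taskVersion and futureDefVersion return (1, 2), while futureValVersion returns (1, 1) because its single read is shared. Inlining readLine's definition at each use leaves taskVersion unchanged but makes futureValVersion behave like futureDefVersion, which is the substitution test for referential transparency.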

So the moral of the story is that Task is much more “pure” when it comes to functional programming. To put a more precise term on it, Task is referentially transparent: you can replace a call to a function that returns a Task with the Task it returns, and the program behaves the same. To change the semantics of the program, you must explicitly change its structure (as we did with flatMap above), not merely replace a function call with its value.

This does not apply only to Task: referential transparency is what makes a program purely functional, and I believe you should strive to make your code as referentially transparent as possible. (Unfortunately, Scala, unlike Haskell, does not enforce referential transparency, though to be honest, that can sometimes be a good thing.)

An interesting consequence of writing referentially transparent code is that we tend to end up writing code that describes our program instead of code that performs it. You can then run that description in a different context than the one in which it was created, persist it, and even write multiple interpreters for the same description of a computation.
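To make the idea of a program description with several interpreters concrete, the sketch below models the addUserNumbers program as plain data. ConsoleProgram and its WriteLine, ReadLine and Done constructors are hypothetical (a hand-rolled, continuation-style DSL, not part of Scalaz). runPure interprets the description against a scripted list of inputs and collects the output, which makes it easy to test. runConsole interprets the same value against the real console through scala.io.StdIn.

```scala
// Hypothetical console DSL: a program is a data structure describing its steps.
sealed trait ConsoleProgram[A]
final case class WriteLine[A](line: String, next: ConsoleProgram[A]) extends ConsoleProgram[A]
final case class ReadLine[A](next: String => ConsoleProgram[A]) extends ConsoleProgram[A]
final case class Done[A](value: A) extends ConsoleProgram[A]

object Interpreters {
  // The description only: nothing is printed or read when this value is built.
  val addUserNumbers: ConsoleProgram[Int] =
    WriteLine("Please input a number",
      ReadLine(a =>
        WriteLine("Please input a second number",
          ReadLine(b => {
            val sum = a.trim.toInt + b.trim.toInt
            WriteLine("Result is: " + sum, Done(sum))
          }))))

  // Interpreter 1: scripted input, collected output (assumes enough input lines).
  @annotation.tailrec
  def runPure[A](p: ConsoleProgram[A], input: List[String],
                 out: Vector[String] = Vector.empty): (A, Vector[String]) =
    p match {
      case WriteLine(line, next) => runPure(next, input, out :+ line)
      case ReadLine(next)        => runPure(next(input.head), input.tail, out)
      case Done(a)               => (a, out)
    }

  // Interpreter 2: the real console.
  @annotation.tailrec
  def runConsole[A](p: ConsoleProgram[A]): A = p match {
    case WriteLine(line, next) => println(line); runConsole(next)
    case ReadLine(next)        => runConsole(next(scala.io.StdIn.readLine()))
    case Done(a)               => a
  }
}
```

runPure(addUserNumbers, List("2", "3")) returns 5 together with the three lines the program would print. Because the description is an ordinary value, nothing happens until an interpreter walks it.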

Conclusion

Hopefully I have managed to convince you that Task is a useful abstraction, and one that should be favored over Scala Future. If this wasn't enough, I encourage you to read Runar's blog post “Easy performance wins with Scalaz” for a perspective with more emphasis on performance.
