over 3 years ago

I thought I would share my experience trying out scalaz Task and then reverting to scala Future because of a perceived limitation. I now realize that was a mistake and would like to share my insight in order to help prevent others from falling into the same trap.

I first learnt about scalaz Task about a year ago when I was working on a project that required some special error handling. What got my attention was the ability to decouple concurrent execution from error handling. Scalaz defines both a Task and a Future; a Task simply adds error handling on top of the Future. This means you can layer your own error handling on top of Future when required, which was exactly what we needed to do at the time. Although I could not have put it into words back then, we had effectively built a Monad transformer in order to combine Future, Try and Validation into a single Monad.
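To give a feel for what that looked like, here is a minimal sketch of layering an extra error channel on top of Future. It is purely illustrative (it uses a plain Either where we used scalaz's Validation, and ResultT is a made-up name), not our actual code:

import scala.concurrent.{ExecutionContext, Future}

// Illustrative only: a tiny transformer that adds a domain error channel on
// top of Future, roughly the shape of what we ended up building.
final case class ResultT[A](run: Future[Either[String, A]]) {
  def map[B](f: A => B)(implicit ec: ExecutionContext): ResultT[B] =
    ResultT(run.map(_.map(f)))

  def flatMap[B](f: A => ResultT[B])(implicit ec: ExecutionContext): ResultT[B] =
    ResultT(run.flatMap {
      case Left(err) => Future.successful(Left(err))
      case Right(a)  => f(a).run
    })
}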

To Cache or not to Cache

At first, this worked out great for us, that is, until we ran into an issue where some of our Tasks were being run multiple times. Let's look at some sample code for the kinds of things we were trying to express:

val databaseCall: Task[Int] = ???

// callServiceA and callServiceB each take the data and return a Task
val serviceAResult = databaseCall.flatMap(callServiceA(_))
val serviceBResult = databaseCall.flatMap(callServiceB(_))

val resultTask = serviceAResult.zipWith(serviceBResult)(combine)

Now if you try this out with Task, you will notice that the database call will be run twice. That is because a Task is an immutable description of a computation: mapping or flatMapping over it builds a new description, and each of the two resulting Tasks re-runs the database call when it is executed. However, if you were to rewrite this example with scala Future, the database access would only be performed once. Scala Futures follow a different programming model: a scala Future is generally already “running” when you have a reference to one, it memoizes its result, and all references refer to the exact same object.
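To make the contrast concrete, here is a small sketch of the Future behaviour just described (queryDatabase is a hypothetical stand-in for the real database access):

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

def queryDatabase(): Int = { println("querying"); 42 } // hypothetical stand-in

// The body starts running as soon as the Future is created and its result is
// memoized, so both maps below reuse it and "querying" is printed only once.
val databaseCallF: Future[Int] = Future(queryDatabase())

val a = databaseCallF.map(_ + 1) // reuses the memoized result
val b = databaseCallF.map(_ * 2) // reuses the memoized result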

Computations that are run concurrently are typically long running. That is because the time required to perform the computation must outweigh the overhead of running it on a separate thread. One could then argue that caching the results of these computations is quite a desirable property. This is why we decided, after some reflection, to revert back to using scala Future at the time.

Think differently

It turns out that the appropriate way to solve this problem with Task is to construct our resultTask in a different way. Consider the following:

val databaseCall: Task[Int] = ???

val resultTask = databaseCall.flatMap { data =>
  callServiceA(data).zipWith(callServiceB(data))(combine)
}

By constructing resultTask in this way, databaseCall will only be executed once. I would even argue that for this specific example, the code is prettier this way. That being said, a bigger example might involve many nested maps and flatMaps, which could become unwieldy. I believe that is probably the main reason this solution did not occur to us at the time, since our composite Tasks were quite large.
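One way to keep the nesting manageable is a for-comprehension. The sketch below assumes the same hypothetical callServiceA, callServiceB and combine as above; note that, unlike zipWith, it runs the two service calls sequentially rather than concurrently:

val resultTaskSequential = for {
  data <- databaseCall
  a    <- callServiceA(data)
  b    <- callServiceB(data)
} yield combine(a, b)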

Referential transparency to the rescue

You might be thinking, okay, so both Task and Future can solve the above problem. Before coming to that conclusion, let’s consider some code that in some ways presents the opposite problem:

val readLine: Task[String] = ???
def writeLine(line: String): Task[Unit] = ???

val addUserNumbers = for {
  _      <- writeLine("Please input a number")
  first  <- readLine.map(_.toInt)
  _      <- writeLine("Please input a second number")
  second <- readLine.map(_.toInt)
  _      <- writeLine("Result is: " + (first + second))
} yield ()

In the above example, the program will ask the user for two numbers and print the result of adding those two numbers together. I think most people would expect the program to prompt the user twice, and that is indeed what happens when using Task. The situation is a little more complicated if we rewrite this example to use scala Future. In that case, if readLine were a function that returns a new Future whenever we call it, then the program would behave the same way. However, if readLine is simply a reference to a Future, then the user will only be prompted once and first and second will refer to the same value.
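Here is a rough sketch of that Future pitfall (readLineF is a hypothetical stand-in for reading and parsing a line asynchronously). Because readLineF is a val, the read has already started and its result is memoized, so both bindings see the same value and the user is prompted only once:

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

val readLineF: Future[Int] = Future(scala.io.StdIn.readLine().toInt)

val sum = for {
  first  <- readLineF
  second <- readLineF // same Future instance, same memoized value as `first`
} yield first + second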

So the moral of the story is that Task is much more “pure” when it comes to functional programming. To put a more precise term on it, Task is referentially transparent: you can replace a call to a function that returns a Task with the Task value itself and the program behaves the same. In order to change the semantics of the program, one must explicitly modify the structure of the program, not just replace a function call with its value.

This does not just apply to Task: referential transparency is what makes a program purely functional, and I believe you should strive to make your code as referentially transparent as possible (unfortunately, Scala, unlike Haskell, does not enforce referential transparency, although to be honest, that can sometimes be a good thing).

An interesting consequence of writing referentially transparent code is that we tend to end up writing code that describes our program instead of code that performs the program. You can then run your description in a different context than the one in which it was created, you can persist your program description, and finally you can write multiple interpreters for the same description of a computation.
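As a rough, purely illustrative sketch of that idea (none of this comes from the original project), describing a program as data and interpreting the same description in more than one way can look like this:

sealed trait ConsoleOp
final case class Write(line: String)      extends ConsoleOp
final case class Prompt(question: String) extends ConsoleOp

// The "program" is just data describing what should happen.
val description: List[ConsoleOp] =
  List(Prompt("Please input a number"), Write("Thanks!"))

// One interpreter actually performs the effects...
def runLive(ops: List[ConsoleOp]): Unit = ops.foreach {
  case Write(line) =>
    println(line)
  case Prompt(question) =>
    println(question)
    scala.io.StdIn.readLine()
}

// ...another merely renders the description, e.g. for tests or documentation.
def render(ops: List[ConsoleOp]): String = ops.map {
  case Write(line)      => s"write: $line"
  case Prompt(question) => s"prompt: $question"
}.mkString("\n")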

Conclusion

Hopefully I have managed to convince you that Task is a useful abstraction and one that should be favored over scala Future. And if this wasn’t enough, I encourage you to read Runar’s blog post Easy performance wins with Scalaz for a perspective with more emphasis on performance.

 
over 3 years ago

You may or may not have heard that git rebase is evil. That's because git rebase rewrites history, typically throwing away knowledge that certain things happened in a certain way.

Today I will go through the main use cases for git rebase and argue that a hypothetical hierarchical version control system could reduce how often we need to rebase.

Rebasing to flatten many small commits

I love doing small commits! I strongly believe that each orthogonal change should be its own commit. It simplifies reverting bad changes and makes reviewing changes easier. However, I have lately noticed a trend of avoiding really small commits. The reason is that they tend to dilute the history (log) of the project and can make it more difficult to get a high-level view of the changes the project has gone through.

One way to solve this is simply to do large commits. I prefer to continue doing small commits when developing on a feature branch and then use git rebase to flatten the small commits into larger ones before merging with master. This way I continue to get many of the advantages of small commits and avoid some of the drawbacks.

Rebasing to keep branches to a minimum

In addition to a plethora of small commits, an unmanaged git history with many collaborators will tend to have many small branches. To avoid this problem, one can choose to rebase instead of merging. The downside to doing this is information loss, specifically, the fact that the two sets of changes were not based on the same original version. Although developers should perform a merge or rebase with caution and re-run all tests, that’s not always what happens. Bad merges can be a source of bugs and the ability to identify them is lost with a rebase.

Rebasing to avoid looking stupid

Lastly, git rebase can be used to rewrite history and remove some stupid thing you did that you don’t want anyone to see because it would serve no purpose but to make you look bad. Now that’s a good use case!

A Better Solution - An HDVCS

Hopefully I have convinced you that although git rebase can be beneficial, it comes at a cost. Now the question is: can we find alternatives to git rebase that solve the above scenarios in a better way? I believe we can, and the solution is a hierarchical VCS: a version control system that supports nested commits.

Many commits share a common theme or goal; a hierarchical VCS should allow us to group these smaller commits together to form larger, higher-level commits. Such a revision model would offer us the best of both worlds: small commits and a high-level view of the project history. To dig deeper into the specifics of how a large feature came about, one could explore the arbitrarily nested hierarchy of commits that make up the top-level commit.

I also believe that this model helps alleviate the merge problem. With nested commits, minor merges will tend to get hidden away inside top-level commits, whereas long-standing branches will surface at the top level.

There will always be a need to erase history in order to get rid of pointless noise (especially when it makes us look better!). However, the current state of the art forces us to drop too much information in order to produce the clean, explorable history we are looking for.