I thought I would share my experience trying out scalaz Task and then reverting to scala Future because of a perceived limitation. I now realize that was a mistake, and would like to share my insight in order to help prevent others from falling into the same trap.
I first learnt about scalaz Task about a year ago, when I was working on a project that required some special error handling. What got my attention was the ability to decouple concurrent execution from error handling. Scalaz defines both a Task and a Future; Task simply adds error handling to the scalaz Future. This means you can add your own error handling on top of Future when required. That was exactly what we needed to do at the time. Although I could not have put words on it at the time, we built a Monad transformer in order to combine Future and Validation into a single monad.
To Cache or not to Cache
At first this worked out great for us, that is, until we ran into an issue where some of our Tasks were being run multiple times. Let's look at some sample code for the kinds of things we were trying to express:
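The original snippet has not survived here, so the following is a hedged reconstruction of its shape. The `Task` below is a deliberately minimal suspended-computation sketch, not scalaz's actual implementation, and `databaseCall` and `calls` are illustrative names:

```scala
// Minimal Task-like sketch: a Task is just a suspended computation,
// re-executed on every run. NOT scalaz's implementation, only enough
// structure to show the behaviour discussed here.
final case class Task[A](run: () => A) {
  def map[B](f: A => B): Task[B] = Task(() => f(run()))
  def flatMap[B](f: A => Task[B]): Task[B] = Task(() => f(run()).run())
}

var calls = 0
val databaseCall: Task[Int] = Task(() => { calls += 1; 42 })

// The same Task reference is used twice in the composition...
val resultTask: Task[Int] =
  for {
    a <- databaseCall
    b <- databaseCall
  } yield a + b

// ...so running resultTask hits the "database" twice: calls ends up at 2.
val result = resultTask.run()
```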
Now if you try this out with Task, you will notice that the database call is run twice. That is because a Task is an immutable description of a computation, re-executed each time it is run. However, if you were to rewrite this example with scala Future, the database access would only be performed once. Scala Futures follow a different programming model: a scala Future is generally already “running” when you have a reference to one, and all references refer to the exact same object.
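For contrast, here is a hedged sketch of the same composition with scala Future (illustrative names again); a Future starts running as soon as it is created and caches its result, so reusing the reference does not repeat the work:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

var futureCalls = 0
// The Future starts running immediately and memoizes its result.
val databaseCall: Future[Int] = Future { futureCalls += 1; 42 }

// Reusing the same reference twice composes against the cached result.
val result: Future[Int] =
  for {
    a <- databaseCall
    b <- databaseCall
  } yield a + b

val sum = Await.result(result, 5.seconds)
// sum == 84, and futureCalls == 1: the database was only hit once.
```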
Computations that are run concurrently are typically long running, because the time required to perform the computation must outweigh the overhead of running it on a separate thread. One could then argue that caching the results of these computations is quite a desirable property. This is where we decided, after some reflection, to revert back to using scala Future at the time.
It turns out, the appropriate way to use Task in order to solve this problem is to construct our resultTask in a different way. Consider the following:
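The corrected construction has also been lost; the idea, sketched here with a minimal suspended-computation encoding of Task (hypothetical, not scalaz's API), is to run the Task once and bind its value instead of referring to the same Task twice:

```scala
// Minimal Task-like sketch (hypothetical, not scalaz's implementation).
final case class Task[A](run: () => A) {
  def map[B](f: A => B): Task[B] = Task(() => f(run()))
  def flatMap[B](f: A => Task[B]): Task[B] = Task(() => f(run()).run())
}

var calls = 0
val databaseCall: Task[Int] = Task(() => { calls += 1; 42 })

// Bind the result of the database call once, then reuse the bound
// value, instead of composing the same Task reference twice.
val resultTask: Task[Int] =
  for {
    a <- databaseCall
  } yield a + a

val result = resultTask.run()
// result == 84, and calls == 1: the database call executed only once.
```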
By constructing resultTask in such a way, databaseCall will only be executed once. I would even argue that for this specific example, the code is prettier this way. That being said, a bigger example might involve many nested maps and flatMaps, which could become unwieldy. I believe that is probably the main reason why this solution did not occur to us at the time, since our composite Tasks were quite large.
Referential transparency to the rescue
You might be thinking: okay, so both Task and Future can solve the above problem. Before coming to that conclusion, let's consider some code that in some ways presents the opposite problem:
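This snippet is missing too, so here is a hedged reconstruction. Input is simulated with a queue rather than a real prompt so the sketch is self-contained, and the Task is again a minimal hypothetical encoding:

```scala
// Minimal Task-like sketch (hypothetical, not scalaz's implementation).
final case class Task[A](run: () => A) {
  def map[B](f: A => B): Task[B] = Task(() => f(run()))
  def flatMap[B](f: A => Task[B]): Task[B] = Task(() => f(run()).run())
}

// Stand-in for prompting the user: each run consumes the next input.
val input = scala.collection.mutable.Queue(3, 4)
val readLine: Task[Int] = Task(() => input.dequeue())

// Ask for two numbers and add them together.
val program: Task[Int] =
  for {
    first  <- readLine
    second <- readLine
  } yield first + second

val answer = program.run()
// The "user" is prompted twice: first = 3, second = 4, answer = 7.
```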
In the above example, the program asks the user for two numbers and prints the result of adding them together. I think most people would expect the program to prompt the user twice, and that is indeed what happens when using Task. The situation is a little more complicated if we were to rewrite this example to use scala Future. In that case, if readLine were a function that returns a new Future whenever we call it, then the program would behave the same way. However, if readLine is simply a reference to a Future, then the user will only be prompted once, and first and second will refer to the same value.
So the moral of the story is that Task is much more “pure” when it comes to functional programming. To put a more precise term on it, Task is referentially transparent: you can replace a call to a function that returns a Task by the Task itself, and the program behaves the same. In order to change the semantics of the program, one must explicitly modify the structure of the program, not just replace a function call with its value.
This does not just apply to Task: referential transparency is what makes a program purely functional, and I believe you should strive to make your code as referentially transparent as possible. (Unfortunately, Scala, unlike Haskell, does not enforce referential transparency, but to be honest, that can sometimes be a good thing.)
An interesting consequence of writing referentially transparent code is that we tend to end up writing code that describes our program instead of code that performs it. You can then run your description in a different context than the one in which it was created, you can persist your program description, and you can write multiple interpreters for the same description of a computation.
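As a hedged illustration of programs-as-descriptions (illustrative types, not any particular library's API), here is a tiny description with two interpreters:

```scala
// A program described as data rather than executed directly.
sealed trait Prog
final case class Print(msg: String, next: Prog) extends Prog
case object Done extends Prog

val description: Prog = Print("hello", Print("world", Done))

// Interpreter 1: actually perform the effects.
def run(p: Prog): Unit = p match {
  case Print(m, n) => println(m); run(n)
  case Done        => ()
}

// Interpreter 2: collect what would be printed, without doing any I/O
// (handy for tests, or for persisting/inspecting the description).
def collect(p: Prog, acc: List[String] = Nil): List[String] = p match {
  case Print(m, n) => collect(n, m :: acc)
  case Done        => acc.reverse
}

val collected = collect(description)
// collected == List("hello", "world")
```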
Hopefully I have managed to convince you that Task is a useful abstraction, and one that should be favored over scala Future. And if this wasn't enough, I encourage you to read Runar's blog post, Easy performance wins with Scalaz, for a perspective with more emphasis on performance.
You may or may not have heard that git rebase is evil. That's because git rebase rewrites history, typically throwing away knowledge that certain things happened in a certain way.
Today I will go through the main use cases for git rebase, and will argue that a new, hypothetical hierarchical version control system could alleviate much of the need to rebase.
Rebasing to flatten many small commits
I love doing small commits! I strongly believe that each orthogonal change should be its own commit. It simplifies reverting bad changes. It also makes reviewing changes easier. However, I have noticed lately a trend to avoid really small commits. The reason for this is that they tend to dilute the history (log) of the project and can make it more difficult to get a high-level view of the changes the project has gone through.
One way to solve this is simply to do large commits. I prefer to continue doing small commits when developing on a feature branch, and then use git rebase to flatten the small commits into larger ones before merging with master. This way I continue to get many of the advantages of small commits and avoid some of the drawbacks.
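A minimal sketch of that workflow (hypothetical branch and file names; a soft reset is used here as a scriptable stand-in for what `git rebase -i master` does interactively):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # ensure the branch is named master
git config user.email dev@example.com
git config user.name dev

echo base > app.txt && git add app.txt && git commit -qm "initial commit"

# Small commits on a feature branch...
git checkout -qb feature
echo step1 >> app.txt && git commit -qam "wip: small step 1"
echo step2 >> app.txt && git commit -qam "wip: small step 2"

# ...flattened into one commit before merging back to master.
# (Interactively you would run `git rebase -i master` and mark the
# later commits as "squash"; the soft reset below has the same effect
# in a non-interactive script.)
git reset -q --soft master
git commit -qm "feature: one squashed commit"

git checkout -q master
git merge -q feature
git log --oneline   # two commits: the initial one and the squashed feature
```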
Rebasing to keep branches to a minimum
In addition to a plethora of small commits, an unmanaged git history with many collaborators will tend to have many small branches. To avoid this problem, one can choose to rebase instead of merging. The downside to doing this is information loss: specifically, the fact that the two sets of changes were not based on the same original version. Although developers should perform a rebase with caution and re-run all tests, that's not always what happens. Bad merges can be a source of bugs, and the ability to identify them is lost with a rebase.
Rebasing to avoid looking stupid
git rebase can be used to rewrite history and remove some stupid thing you did that you don’t want anyone to see because it would serve no purpose but to make you look bad. Now that’s a good use case!
A Better Solution - An HDVCS
Hopefully I have convinced you that although git rebase can be beneficial, it comes at a cost. Now the question is: is it possible to find alternatives to git rebase that solve the above scenarios in a better way? I believe it is, and the solution is a hierarchical VCS: a version control system that supports nested commits.
Many commits share a common theme or goal; a hierarchical VCS should allow us to group these smaller commits together to form larger, higher-level commits. Such a model for revisions would offer us the best of both worlds: small commits and a high-level view of the project history. To dig deeper into the specifics of how a large feature came about, one can explore the arbitrarily nested hierarchy of commits that are part of the top-level commit.
I also believe that this model helps alleviate the merge problem. With nested commits, minor merges will tend to be hidden away inside top-level commits, whereas long-standing branches will surface at the top level.
There will always be a need to erase history in order to get rid of pointless noise (especially when it makes us look better!), however, the current state of the art forces us to drop too much information in order to produce the clean, explorable history we are looking for.