Thursday, October 21, 2010

Switching to a new version control system is not easy

Today I read this LWN article about the pains of PostgreSQL's switch from CVS to Git. It reminded me of the lessons we learned during a similar endeavor, which happened about a year before the PostgreSQL move.

When HelenOS was about to switch to Bazaar in August 2009, its Subversion repository contained only some 4686 revisions, but our previous, rather unsuccessful, attempts to convert the repository to Mercurial made us start with a brand new repository rather than a converted one.

The fact is that we lost the continuity of our revision history, but on the other hand we did not end up with a twisted, possibly damaged, or even missing history. For our reference, we still keep the old Subversion repository around and we also have the Mercurial convert, which documents the steps needed to turn a medium-size repository of a centralized version control system with a handful of branches into a repository of a distributed version control system.

The conversion process using hg convert was very slow, tedious and error-prone. hg convert is not to blame, however, because the problems stem from the fact that there is no apparent 1:1 mapping between Subversion and Mercurial repository entities. A couple of times, I had to stop in the middle of the conversion, create synthetic history (i.e. commits that never really happened) to emulate various repository events and then resume the conversion at the next revision. The events that confused the conversion software were:
  • changes of the repository namespace, for example when we switched from the kernel/trunk naming scheme to trunk/kernel,
  • starting a new branch, and
  • merging an existing branch into trunk.
In fact, the process was so tedious that I decided to fake history only for the first type of event and threw the rest overboard. The result is that the Mercurial convert repository contains revision history for the main branch only, and it does contain some synthetic commits that mimic the various main branch renames and reshuffles that happened in the past.
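
To illustrate the first type of event, here is a minimal Python sketch of the kind of path rewriting a namespace change implies. It is not what hg convert does internally and not the procedure I actually followed; the function name and the example file paths are hypothetical, and only the kernel/trunk to trunk/kernel layout change comes from our repository.

```python
def normalize_path(path: str) -> str:
    """Map an old-layout path (<component>/trunk/...) to the
    new layout (trunk/<component>/...). Paths that already use
    the new layout are returned unchanged."""
    parts = path.split("/")
    # Old layout: the second element is "trunk" and the first is a component name.
    if len(parts) >= 2 and parts[1] == "trunk" and parts[0] != "trunk":
        return "/".join(["trunk", parts[0]] + parts[2:])
    return path


# Hypothetical file names, used only for illustration.
old_style = normalize_path("kernel/trunk/generic/main.c")
new_style = normalize_path("trunk/kernel/generic/main.c")
print(old_style)
print(new_style)
```

Both calls end up with the same path, which is the whole point: a synthetic commit that performs this rename lets the rest of the history continue under a single naming scheme.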

The convert is still useful though, because it at least serves as a backup for our old trunk history, but there is no doubt that some information was lost in translation.

The LWN article also touches on another interesting point: it takes a serious mind-quake to stop thinking in the centralized repository way and accept the distributed nature of the new system. PostgreSQL, perhaps temporarily, decided to use the repository in a rather centralized manner and does not permit merges. The same is true for the OpenSolaris Mercurial repository, where the history is artificially kept linear. In our case, we wanted to go distributed from day one, but our initial clumsiness in distributed thinking led to several merges that unfortunately swapped the left-hand side branch (i.e. mainline) with the right-hand side branch, back and forth. We learned from our mistakes though, documented the process and developed a Bazaar plug-in which prevents us or anyone else from repeating the same error.
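
The idea behind such a check can be sketched in a few lines of Python. This is only an illustration of the principle, not the code of our plug-in, and it does not use the Bazaar API: the revision graph is a plain dictionary, and all names and revision ids below are made up for the example. The assumption is that the first parent of every revision is its left-hand parent, so following first parents from the tip walks the mainline.

```python
from typing import Dict, List

# Revision id -> ordered list of parent ids.
# By convention the first parent is the left-hand (mainline) parent.
Graph = Dict[str, List[str]]


def mainline(graph: Graph, tip: str) -> List[str]:
    """Return the revisions reached from tip by following left-hand parents only."""
    line = []
    current = tip
    while current is not None:
        line.append(current)
        parents = graph.get(current, [])
        current = parents[0] if parents else None
    return line


def preserves_mainline(graph: Graph, old_tip: str, new_tip: str) -> bool:
    """True if the old mainline tip is still on the mainline of the new tip."""
    return old_tip in mainline(graph, new_tip)


# Hypothetical history: r1 and r2 are mainline, f1 is a feature branch.
graph = {
    "r1": [],
    "r2": ["r1"],
    "f1": ["r1"],
    "good": ["r2", "f1"],  # feature merged into mainline
    "bad": ["f1", "r2"],   # mainline merged into feature, sides swapped
}

print(preserves_mainline(graph, "r2", "good"))
print(preserves_mainline(graph, "r2", "bad"))
```

The "good" merge keeps r2 on the left-hand side, so the check passes; the "bad" merge puts the feature branch on the left, the old mainline tip drops off the first-parent chain, and the check fails.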

Errare humanum est, but there are also lessons to be learned from either the LWN article or, hopefully, this blog entry.