
Sin: Continuous Integration Rethought
Casper Hornstrup

Continuous Integration

We have learned that Continuous Integration can help us catch bugs early. Continuous Integration is the concept of continuously building and testing software using an automated process. I'll begin by describing how most (if not all) existing Continuous Integration systems work and the problems with that approach. Then I'll explain what Sin does differently and what benefits the user gets from doing it this way.

The usual approach

The usual approach to Continuous Integration is like this. There is a central Version Control System repository containing the source code. There are one or more machines, known as build machines, that regularly check out a copy of the latest source code from the repository, try to build it, and run tests on the resulting binaries. The output is published on a website and/or sent, using a mailing list, Instant Messenger or another communication channel, to the developers who have checked in new changes that are included in the build. Working and non-working versions of the source code are usually marked on the website with green (for working versions) and red (for non-working versions). Strategies can be formulated for resolving the integration problems that cause non-working source code to be put into the repository. One strategy could be that the developer who checked in the change that caused the source code to stop working should commit a new change that fixes the problem (or removes the bad change). A second strategy could be that the first person who discovers the problem should fix it and notify all other developers that it was fixed.
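As a minimal sketch of the build cycle described above (not taken from any particular Continuous Integration product), the following models one pass of a build machine. The names `Commit`, `batch_build` and `build_and_test` are hypothetical, and the sketch assumes that a failed build is reported to every developer with a change in the batch.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Commit:
    revision: int
    author: str


def batch_build(pending: List[Commit],
                build_and_test: Callable[[int], bool]) -> Optional[dict]:
    """Build the latest revision once and report on the whole batch."""
    if not pending:
        return None
    latest = max(c.revision for c in pending)
    ok = build_and_test(latest)  # hypothetical hook: build and run the tests
    # The batch is validated as a whole, so on failure every author in it
    # is notified; the build alone cannot tell which change is to blame.
    notify = [] if ok else sorted({c.author for c in pending})
    return {"revision": latest,
            "status": "green" if ok else "red",
            "notify": notify}
```

Calling `batch_build` with two pending commits and a failing build marks the latest revision red and lists both authors, which is the communication overhead discussed in this section.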

In short, in their most basic form these systems give us "blinking lights on a website" and a message to developers when the latest source code in the repository is not working correctly. The approach has several problems, though. Common to all resolution strategies is that they require a human response to resolve the integration problems and bring the source code in the repository back into a working state. The resolution strategy must prevent more than one developer from fixing the same problem. If it doesn't, work is duplicated and time is wasted. If it does, some communication with other developers is required and thus they are interrupted in their work. The amount of time required to resolve the integration problems varies. There are several scenarios:

  • The developer doesn't know it was his/her change that caused the integration problem, or several developers need to figure out which one of them should resolve it. Since changes are not validated independently of each other, all developers who checked in a new change included in the build will receive a mail from the Continuous Integration system. This means there is some communication overhead between these developers.

  • The developer doesn't notice that he/she caused the source code in the repository to be in a non-working state until long after the change is committed.

  • Even if the developer noticed the integration problem as soon as possible, the source code in the repository would still be in a non-working state from the time of check in until the master build was completed, the developer was notified of the problem, and the developer fixed the problem in the repository.

Until the integration problem is resolved in the repository, there is a risk that other developers update their working copies, only to find that their working copies now contain non-working source code. Again, developers are interrupted in their work since they now have to fix the problem themselves. Since human task switches are considered harmful, these interruptions are potentially big time-consumers.

How does Sin work?

Sin works a bit differently in some areas, but not in others. Sin introduces the concepts of checkin and stable branches. Usually, when you want to do parallel development, you create a branch separate from the main branch, thereby having two separate branches in the repository. Sin further subdivides each branch into a checkin and a stable branch, thereby making the total number of branches in the repository four. The layout of the repository could look like this:


Checkin branches are located inside the checkin directory and stable branches are located outside the checkin directory. Developers check in their changes to a checkin branch, but only the Sin system itself can modify stable branches. Most of the time, the contents of the two branches in a particular checkin and stable branch pair are the same. However, from the time a developer checks in a change to a checkin branch until the time the Sin system has finished processing the change, the two branches will differ.
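The pairing of checkin and stable branches can be sketched as a simple path convention. This is only an illustration: the directory name `checkin` comes from the text, while the branch paths `trunk` and `branches/1.0` and the helper names are assumptions made for the example.

```python
from pathlib import PurePosixPath

CHECKIN_DIR = "checkin"  # "the checkin directory" from the text


def is_checkin_branch(path: str) -> bool:
    """True if the repository path lies inside the checkin directory."""
    parts = PurePosixPath(path).parts
    return len(parts) > 1 and parts[0] == CHECKIN_DIR


def checkin_path(stable_branch: str) -> str:
    """Map a stable branch to its checkin counterpart."""
    return str(PurePosixPath(CHECKIN_DIR) / stable_branch)


def stable_path(checkin_branch: str) -> str:
    """Map a checkin branch back to the stable branch it feeds."""
    if not is_checkin_branch(checkin_branch):
        raise ValueError(f"not a checkin branch: {checkin_branch}")
    return str(PurePosixPath(*PurePosixPath(checkin_branch).parts[1:]))
```

Under these assumptions, a main branch and one parallel development branch yield four branches in total: `trunk`, `branches/1.0`, `checkin/trunk` and `checkin/branches/1.0`.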

Sin can be seen as being made up of two parts: the Integration Manager and the pool of Integrators. The Integration Manager is responsible for controlling the integration process of the changes. It notices new changes to checkin branches in the repository, asks available Integrators to validate the changes, and performs the merges within the repository that are needed to complete the integration of the changes. If a change can be verified, it is merged from the checkin branch to the stable branch. If a change cannot be verified, it is reverted (undone) on the checkin branch. It will be almost as if the developer had never committed the change (except that no information is ever lost in a version control system). If a change is reverted, the developer who committed it is emailed information about what was wrong with the change. The developer can then merge the change back into his working copy, fix the problem and commit it again. When asked to validate a change, an Integrator checks out the source code from the stable branch in the repository, applies the change that is to be integrated, validates the result, and reports either success or failure back to the Integration Manager, which then takes the appropriate action.
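The division of labour described above can be sketched in a few lines. This is a simplified in-memory model, not Sin's actual implementation: the class and function names are hypothetical, branches are represented as plain lists of changes, and `check` stands in for whatever build-and-test step an Integrator runs.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Change:
    revision: int
    author: str


@dataclass
class Result:
    change: Change
    ok: bool
    log: str = ""


def make_integrator(check: Callable[[List[Change]], Tuple[bool, str]]):
    """Build an Integrator that validates stable-plus-change with `check`."""
    def validate(stable: List[Change], change: Change) -> Result:
        candidate = stable + [change]  # stable checkout with the change applied
        ok, log = check(candidate)
        return Result(change, ok, log)
    return validate


@dataclass
class IntegrationManager:
    validate: Callable[[List[Change], Change], Result]
    stable: List[Change] = field(default_factory=list)    # merged changes
    reverted: List[Change] = field(default_factory=list)  # undone on checkin
    mails: List[Tuple[str, str]] = field(default_factory=list)

    def process(self, change: Change) -> Result:
        result = self.validate(list(self.stable), change)
        if result.ok:
            self.stable.append(change)    # merge checkin -> stable
        else:
            self.reverted.append(change)  # revert on the checkin branch
            self.mails.append((change.author, result.log))  # only the committer
        return result
```

Because each change is validated on top of the stable branch alone, a failing change never reaches the stable branch, and the changes that follow it are validated as if it had never been committed.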

So, we have the checkin branches which may or may not contain working source code and we have the stable branches which always contain working source code. To be guaranteed that their working copies contain working source code after an update with the latest changes to the repository, the developers must create their working copies from the stable branches. When it's time to commit a change, the working copy must be switched over to the checkin branch and then the change can be committed to the checkin branch. Since the checkin branch and the stable branch don't have many differences, switching the working copy between these two branches is usually a fast operation.
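From the developer's side, the commit step might be scripted roughly as follows. This is a sketch under the assumption that the repository is a Subversion repository and that the standard `svn switch` and `svn commit` commands are used; the helper names and the example URL are hypothetical. The functions only build the command lines, so they can be inspected before being run with something like `subprocess.run`.

```python
from typing import List


def switch_command(branch_url: str) -> List[str]:
    """Command that points the current working copy at another branch."""
    return ["svn", "switch", branch_url]


def commit_commands(checkin_url: str, message: str) -> List[List[str]]:
    """Commands to commit local changes made in a stable working copy.

    The working copy is first switched to the checkin branch, then the
    change is committed there, where the Sin system will pick it up.
    """
    return [
        switch_command(checkin_url),
        ["svn", "commit", "-m", message],
    ]


# Hypothetical repository URL, for illustration only.
for cmd in commit_commands("https://svn.example.org/repo/checkin/trunk",
                           "Fix off-by-one in parser"):
    print(" ".join(cmd))
```

Switching the working copy back to the stable branch afterwards would use `switch_command` with the stable branch URL.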

What are the benefits?

The benefits are many.

  • Less time is wasted when a change causes a (checkin) branch to be in a non-working state. Only the developer who committed the change will be notified that the change caused non-working source code in the repository. No other developer who uses the repository will be bothered with the problem. Assuming that doubling the number of developers doubles the number of integration problems, this will be of greater advantage to larger projects than to smaller projects.

  • It's easy to identify the problematic change. Since each change is validated independently of the others, it's very easy to identify the change that caused non-working source code to be in the repository and to identify who is responsible for the change (and responsible for fixing the problem).

  • Heightened developer morale. Removing the broken-build experience gives developers higher morale. This will also result in higher productivity.

Hardware requirements vs. level of protection

For large projects, a considerable amount of hardware may be required to get acceptable performance from the Sin system. If you have the required hardware for it, I'd say build it all for every change. Building everything for every change gives you several benefits:

  • Developers will never again experience broken builds.

  • If the resulting binaries for each change are made accessible, you will save a lot of time when tracking down regressions. There is no need to retrieve several old versions of the source code and build them before they can be tested to find the exact change that caused the regression.

  • It will be even easier to track down regressions if big changes are rarely committed to the repository. It's much easier to locate a known bug in a 50 line change than in a 5000 line change. A 5000 line change is also much more likely to contain a bug than a 50 line change.
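One way to use stored per-change binaries when hunting a regression is a binary search over the revisions. The sketch below is an illustration, not part of Sin: `first_bad_revision` and `is_good` are hypothetical names, and it assumes the first revision in the list is known to be good, the last is known to be bad, and every revision after the offending change is also bad.

```python
from typing import Callable, Sequence


def first_bad_revision(revisions: Sequence[int],
                       is_good: Callable[[int], bool]) -> int:
    """Return the first revision whose stored binary shows the regression.

    `revisions` must be sorted ascending, with revisions[0] known good
    and revisions[-1] known bad. `is_good` tests the already-built
    binary for a revision, so nothing has to be rebuilt.
    """
    lo, hi = 0, len(revisions) - 1  # lo is always good, hi is always bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(revisions[mid]):
            lo = mid
        else:
            hi = mid
    return revisions[hi]


# Example: the regression was introduced in revision 109.
print(first_bad_revision(list(range(100, 117)), lambda rev: rev < 109))  # 109
```

Each step only needs to test an existing binary, and if changes are small, the revision that is found points at a small diff.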

If acquiring adequate hardware is a problem, the source code can be split into smaller components and the Sin system can be set up to verify only the components that are modified by a given change. If this is done, merged changes may cause the stable branches to be non-working, and so developers may again experience broken builds, although there would likely be fewer of them. In essence, you trade protection for better performance of the Sin system on the same hardware. Another disadvantage of verifying only the components that are modified by a particular change is the higher administrative overhead. The Sin system needs to know the boundaries of the different components, and the developers provide this by setting Subversion metadata (properties) on selected directories that represent the roots of the components. A change that modifies a file or directory below a directory with a component property on it causes the Sin system to validate that component after the change is applied to it. Verifying on a component-by-component basis also adds requirements to the build system of the software. The build system must be able to receive a list of components and build and run tests for only those components and their dependencies.
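The mapping from a change to the components that need validation can be sketched as follows. This is an illustration only: the function name is hypothetical, the set of component roots is passed in directly instead of being read from Subversion properties, and the choice to pick the nearest enclosing root when roots are nested is an assumption.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def components_for_change(changed_paths: Iterable[str],
                          component_roots: Iterable[str]) -> List[str]:
    """Return the component roots affected by a change.

    A path affects a component if it lies below a directory marked as a
    component root. Paths outside every component affect none.
    """
    roots = {PurePosixPath(r) for r in component_roots}
    affected = set()
    for path in changed_paths:
        # Walk upwards from the changed path to the repository root.
        for parent in PurePosixPath(path).parents:
            if parent in roots:
                affected.add(str(parent))
                break  # assumption: only the nearest enclosing root counts
    return sorted(affected)
```

The resulting list is what would be handed to the build system, which then builds and tests those components and their dependencies.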