Source Control – Branching and Merging

Summary

In this blog post I’m going to talk about branching and merging your software in a multi-team development environment.

The Community Branch

A common mistake in branching is assuming that fewer branches are better. The logic behind this assumption is that fewer branches translate to fewer merges. Technically this is true, but each of those merges is very large and complex. A larger number of smaller branches is easier to manage than a small number of larger branches, and I’m going to show why this is true.

But first, I’m going to describe what I call the “community branch.”  The community branch is a branch that contains many different projects, shared by a large group of programmers.  The idea is that the branch is just an abstract place to put code (like a development branch) and that change sets can be moved around as though they are independent of each other.  

Here’s a simple example diagram of such a setup:


I want to stop and mention that I’m trying to explain a very complex “issue” here.  So I’m going to make this explanation as simple as possible and ignore any QA or stage testing process.  I’ll just assume that the process goes from development to deployment and demonstrate problems that occur when projects are mixed on one branch.

In the above diagram I have a main branch, which is the branch that contains the latest software deployed to the production system. The first operation (point B) is a branch for the start of development work (C). Two teams of developers are working on the software. The first team completes some of its work and checks it into the development branch at point D. Project team 2 begins work after this point (for simplicity of this example) and checks its software in at point E. Project team 2 then merges its version into the main branch and the software is deployed. Project team 1 is still working on its software and continues to check in changes (point G).

Now for the problem in this scheme.  First, points D, E, and G are change sets.  If the change sets are all merged back to the main branch in the same order that they are checked into the development branch, then everything will work correctly.  Also, if at any point the team who merges to main performs a merge of all change sets from their point to the beginning, everything will work out (assuming all software is ready to go).  

However, if project 1 requires a refactoring of common code that project 2 depends on, then there is a problem. If change set E is merged but relies on changes made in change set D, then the merge to main will be broken (even though it worked on the development branch). The only way to fix the merge to main is to manually make changes on the main branch to compensate for the common code that was modified on the development branch. Change set D cannot simply be merged to main in its entirety at this point, because project 1 is only partially complete.
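The dependency problem described above can be sketched in a few lines of Python. This is only a toy model, not part of any source control tool: the `ChangeSet` class, the `depends_on` field and the `missing_on_main` function are names made up for this illustration, and the assumption that both E and G depend on D is taken from the refactoring scenario described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeSet:
    name: str                            # label from the diagram, e.g. "D"
    project: str                         # owning project team
    depends_on: frozenset = frozenset()  # change sets this one builds on


def missing_on_main(change_set, merged_to_main):
    """Return the dependencies of change_set that main does not yet contain."""
    return sorted(change_set.depends_on - set(merged_to_main))


# Change sets on the shared development branch, in check-in order.
d = ChangeSet("D", "project 1")                    # refactors common code
e = ChangeSet("E", "project 2", frozenset({"D"}))  # written against the refactored code
g = ChangeSet("G", "project 1", frozenset({"D"}))  # more project 1 work

# Project 2 finishes first and tries to merge only E to main.
print(missing_on_main(e, merged_to_main=[]))     # ['D']

# Merging in check-in order would have avoided the problem.
print(missing_on_main(e, merged_to_main=["D"]))  # []
```

A non-empty result means the merge to main will break even though the same code worked on the development branch.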

Every community branch will experience this issue if the branch is allowed to live long enough. It’s unavoidable. The problem gets worse as more undeployed changes accumulate on the branch. Code might go undeployed because a project has a long development time, gets placed on the back-burner or gets cancelled.


Project Branches

The correct method of branching is to isolate projects on their own branches. Each team of programmers works with its own branch. If a deployment to main is made from another project (or from bug fixes, hot fixes, etc.), then main is re-merged into each open project branch. Each programming team must assign an owner to its project branch, and that owner must ensure that the branch is always up to date. Once the project (or the sprint) is complete, the branch can be merged into main.

Here’s an example:

In this example project team 1 forms its own branch, which we will refer to as the project 1 branch. Project team 2 forms its own branch, called the project 2 branch. Each branch is always formed from the main branch because that is the latest deployed software. When project team 2 completes its sprint or project, it is authorized to merge with main. The merge is made and, since there are no other changes on main, it should be a clean merge. Then the software is deployed to the production system.

After deployment, each remaining branch must be re-merged from main. In this example, project team 1 will merge changes from main back to its own branch. If the merge brings in any changes to common objects, those will need to be reconciled on the project 1 branch. In this instance, project 1 will be ready to merge clean into the main branch after point G. Any common objects changed by project 1 do not need to show up in the software created by project 2 until project 1 deploys them to main. Project team 1 is also responsible for updating all software that is touched by its refactoring of common code. Later down the line, project team 1 will merge with main and the merge will be clean.

Each time software is merged with main, all branches must re-merge from the main to obtain the latest changes that were deployed.  
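The re-merge rule can be expressed as a small bookkeeping model. The sketch below is an illustration only: the `Repo` class and its method names are invented here, and "version" simply counts deployments to main. It is not how any particular source control system tracks merges.

```python
class Repo:
    """Toy model: records which deployment of main each branch last merged in."""

    def __init__(self):
        self.main_version = 0  # bumped on every merge to main
        self.synced_at = {}    # branch name -> main version last merged in

    def create_branch(self, name):
        # Branches are always created from main.
        self.synced_at[name] = self.main_version

    def remerge_from_main(self, name):
        self.synced_at[name] = self.main_version

    def stale_branches(self):
        """Branches that have not yet re-merged the latest deployment."""
        return sorted(n for n, v in self.synced_at.items() if v < self.main_version)

    def merge_to_main(self, name):
        if self.synced_at[name] < self.main_version:
            raise RuntimeError(f"{name} must re-merge from main first")
        self.main_version += 1
        self.synced_at[name] = self.main_version


repo = Repo()
repo.create_branch("project-1")
repo.create_branch("project-2")

repo.merge_to_main("project-2")   # project 2 deploys first
print(repo.stale_branches())      # ['project-1']

repo.remerge_from_main("project-1")
repo.merge_to_main("project-1")   # clean merge, because the branch was up to date
print(repo.stale_branches())      # ['project-2']
```

Refusing the merge to main when a branch is stale is one way a branch owner could enforce the "always up to date" responsibility.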

Using a QA Branch

OK, so let’s show an example using a QA branch. First, the QA branch is created from main and lives forever. This is a special branch, and no work is to be performed directly on it. The QA branch should only contain changes merged from other branches. Once QA testing has been completed, the contents of the QA branch can be merged with main and deployed. At the point of deployment, the QA branch should equal the main branch (i.e. a comparison of the source code should show no differences, apart from any config files specific to the QA environment).

Here’s an example:

In this example, all the rules from the previous example still apply. When a deployment occurs, all live branches must re-merge from main (not from QA). In the diagram above, project 1 starts first and its branch is created from main (C and D). Project 2 starts afterwards and its branch is also created from main (E and F). Then each project in this example is completed and selected for the next deployment cycle (G and I). Both branches are merged with QA (H and J). Then a bug is detected during QA and project team 2 must fix it. The team must fix the bug in its own branch and then merge the latest change sets down to QA for re-testing (K and L). Once QA has been completed and the software is ready for release, it can be merged with main and deployed (M and N).
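The sequence above can be walked through with branches modelled as plain sets of change sets. This is a deliberately simplified sketch: the `merge` helper, the `run_qa_cycle` function and the change set names are invented for illustration, conflicts are ignored, and the mapping of comments to diagram letters follows the description above.

```python
def merge(source, target):
    """Toy merge: return target with every change set from source added."""
    return target | source


def run_qa_cycle():
    main = frozenset({"baseline"})
    qa = main                               # QA is created from main

    project_1 = main | {"p1-work"}          # branch from main, then finish (C, D, G)
    project_2 = main | {"p2-work"}          # branch from main, then finish (E, F, I)

    qa = merge(project_1, qa)               # H
    qa = merge(project_2, qa)               # J

    # A bug is found in QA. The fix is made on the project branch,
    # never directly on QA, and then merged down again.
    project_2 = project_2 | {"p2-bugfix"}   # K
    qa = merge(project_2, qa)               # L

    main = merge(qa, main)                  # M and N: merge to main and deploy
    return main, qa


main, qa = run_qa_cycle()
print(main == qa)           # True
print("p2-bugfix" in main)  # True
```

The final comparison is the invariant stated earlier: at the point of deployment, QA and main should contain the same changes.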

One other note: when a project is completed and deployed, the branch for that project should be closed. Any bugs detected after this point should be treated as new bugs and not as part of the original project. The reason for closing the branch is to reduce the maintenance required. There is no reason to keep maintaining branches that belong to completed projects.

Other Branches

I’m sure by now you can visualize adding a permanent branch called “stage” that would perform a similar function to the QA branch. In such a setup, the contents of the QA branch would be merged to the stage branch once quality checking is complete. Alternatively, software could be merged to stage right after it is merged into QA, with further updates merged continuously from QA to stage as they are made. That would provide the ability to test in a QA environment and a stage environment in parallel. The completed software would be merged from stage down to main for final deployment.

One wrench in the works is bug fixing. Bug fixes are usually made in a short period of time. Many software shops will create a bug branch, fix bugs there and merge the fixes into QA or directly to main. Bug fixes need to be bundled together to avoid making too many separate changes to main, each of which would trigger re-merges into the project branches. It’s recommended to perform bug fixes and merge them into the QA branch at the time when projects are first merged. Then QA can be performed on the bug fixes and the projects together, and everything can be merged toward main at one time. Once everything is merged into main and deployed, all changes can be re-merged back to any open project branches.
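The cost of not bundling follows directly from the re-merge rule: every deployment to main forces one re-merge per open project branch. A minimal sketch of that arithmetic, with a hypothetical function name and made-up numbers:

```python
def remerges_required(deployments_to_main, open_project_branches):
    """Each deployment to main triggers one re-merge per open project branch."""
    return deployments_to_main * open_project_branches


# Five bug fixes deployed one at a time, with three project branches open.
print(remerges_required(5, 3))  # 15

# The same five fixes bundled into a single deployment.
print(remerges_required(1, 3))  # 3
```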

Hot fixes or emergency bugs also need to be accounted for. Hot fixes can have their own special branch that lives forever. This branch would probably bypass the QA branch, since a hot fix is usually something that must go out right away. Once a hot fix is deployed, all branches, including QA, should re-merge the changes from main.

Potential Issues

One branching issue occurs when a database change must be performed. A branch that requires a database change must somehow get that change applied to the QA environment database and then to the main database(s) (in production) at the right time. I’m not going to cover this issue in this blog post because it applies to the community branch just as much as to the project branching scheme.

Automated deployments must account for constantly changing branches. The automated deployment system should be set up to allow any branch to be deployed to any environment. The most efficient setup would involve virtual development environments that can be cloned from your production environment with nearly identical configurations. This will reduce the amount of time it takes developers to fix problems related to differences between environments. It also increases the success rate of the final deployment, since configuration differences have already been eliminated at development, QA and staging time. I’m not going to go into automated deployment systems in this article, but hard-coding an environment to a branch is bad practice and should be avoided.
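One way to avoid tying an environment to a branch is to make both of them parameters of the deployment. The sketch below is only an illustration of that idea: the `Environment` class, the `ENVIRONMENTS` table, the `deployment_plan` function and the example URLs are all hypothetical and do not belong to any real deployment tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    name: str
    base_url: str


# Hypothetical environments; names and URLs are placeholders.
ENVIRONMENTS = {
    "dev": Environment("dev", "https://dev.example.internal"),
    "qa": Environment("qa", "https://qa.example.internal"),
    "stage": Environment("stage", "https://stage.example.internal"),
}


def deployment_plan(branch, environment_name):
    """Describe a deployment of any branch to any known environment."""
    if environment_name not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment_name}")
    env = ENVIRONMENTS[environment_name]
    return {"branch": branch, "environment": env.name, "base_url": env.base_url}


# The same branch can go to different environments...
print(deployment_plan("project-1", "dev")["base_url"])  # https://dev.example.internal
print(deployment_plan("project-1", "qa")["base_url"])   # https://qa.example.internal

# ...and different branches can go to the same environment.
print(deployment_plan("project-2", "qa")["branch"])     # project-2
```

Nothing in the environment table mentions a branch, so a new project branch can be deployed anywhere without touching the configuration.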
