In this article I'll run through some of the do's and don'ts of Git that I've picked up working in many teams as a developer, lead or manager.

What is a Version Control System (VCS)?

Git is a Version Control System (VCS). It is not a file sharing service, nor is it a social network. It should not be confused with GitHub (an online developer platform), GitFlow (a workflow or branching strategy) or any of the other tools and services that have sprung up around Git.

Git is a distributed VCS, but before we look at what makes it distributed, let's look at 2 other types.

Local VCS

Take any document and call it 'version 1'. Now copy it, make changes, and rename it 'version 2'. Copy that, make changes, and call it 'version 3'. At any time you can open version 1, version 2, or version 3 and continue from that point. If you continue with this system and find you have made an error in version 15, you can go back to version 14 and remove version 15.

This is a version control system. It is basic, but you have versions you can go back to at any time.

Now take a folder of documents, like a website, app or any other project, and rename the whole folder with the version instead of just 1 document. You can make changes to multiple documents for any version and can always go back to an older version if you make a mistake or don't like what’s changed.

A folder system used in this way seems to be a perfectly good version control system, so what are the downsides? First off, it is a huge waste of space. If you don't make any changes to a document, you still make an exact copy of it, so there is a duplication issue. Also, we cannot 'see' the differences between versions without looking through all the files to find what has changed, so there is a visibility issue.

To fix the duplication and visibility issues we could log only the changes instead of copying the whole folder for each version. For example, we change line 15 in 'Document A', so we log:

The line number of the change
What the line has been changed from
What the line has been changed to
The file name
The time
Some sort of unique identifier (a hash of all of the above is a good ID)

Conceptually, this is how VCSs work: they log the changes and do not duplicate unchanged files (binary files are the usual exception; these are typically stored whole). Git's internal storage is more involved, but this model is enough for this article. What we have described is a local VCS. That's fine for 1 person working locally, but what if you want to share the project with another person or a team? We could share the VCS by centralising it.
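As a rough illustration of the log entry described above, here is a minimal Python sketch. The class name, field names and the choice of SHA-1 for the ID are my own assumptions for the example; this is the simplified model from this article, not Git's actual storage format.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One logged change, holding the fields listed above (hypothetical names)."""
    file_name: str
    line_number: int
    old_line: str
    new_line: str
    timestamp: str  # ISO 8601 text, e.g. "2024-11-26T09:00:00Z"

    def change_id(self) -> str:
        # Hash every field so that any difference produces a different ID.
        payload = "\n".join([
            self.file_name,
            str(self.line_number),
            self.old_line,
            self.new_line,
            self.timestamp,
        ])
        return hashlib.sha1(payload.encode("utf-8")).hexdigest()


change = Change("Document A", 15, "color: red;", "color: blue;",
                "2024-11-26T09:00:00Z")
print(change.change_id()[:8], change.file_name, change.line_number)
```

Because the ID is a hash of the content, two identical changes always get the same ID and any edit to a field gives a new one.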

Centralised VCS

With a local VCS we have removed the duplication and visibility issues by logging only the changes (our first log entry is adding the original files), but this is local to one person. Let’s look at sharing a project.

We move the project to a 'server' and make that the master copy. We can then allow each person to 'check out' a file and make changes. No other person can check out a document while someone has it checked out. Once changes have been made, the person will 'check in' the document and the system will update the changes centrally. Anyone else can now check out that document. This lock-based approach is how older VCSs worked, and tools like SVN and Perforce can be set up to work this way, but as you can imagine this is painful when someone has multiple files checked out and you need one of them.

For example, you need file A, but another person has it. The other person needs file B, but you have it. You cannot check-in file B without your changes to file A as that will break the system and they cannot check-in file A without changes to file B because that would also break the system. Deadlock!
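The deadlock above can be reproduced with a toy model of a locking server. This is only a sketch: the class and method names are made up for the example and do not correspond to any real VCS API.

```python
class LockingRepo:
    """Toy central server: only one person can hold a file at a time."""

    def __init__(self, files):
        self.locks = {name: None for name in files}  # file -> current holder

    def checkout(self, person, name):
        holder = self.locks[name]
        if holder is not None and holder != person:
            return False  # someone else has it, so we have to wait
        self.locks[name] = person
        return True

    def checkin(self, person, name):
        if self.locks[name] != person:
            raise PermissionError(f"{person} does not hold {name}")
        self.locks[name] = None


repo = LockingRepo(["A", "B"])
repo.checkout("you", "B")
repo.checkout("them", "A")

# Each person now needs the file the other one is holding.
you_blocked = not repo.checkout("you", "A")
they_blocked = not repo.checkout("them", "B")
print("deadlock:", you_blocked and they_blocked)  # prints "deadlock: True"
```

Neither request can succeed until somebody gives up a file, which is exactly the back-out described next.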

To fix the deadlock, one person must back out their changes and check in their file so the other person can check it out. The person who backed out then waits for the other to complete their changes and check in, and only then can they check the files out and redo their own work. Chances are, while this is happening another person has checked out file C and both needed that too. When you are trying to complete a ticket, you need multiple files, so this happens all too often.

The second issue with centralised VCSs is people ‘going away’ while they have files checked out, meaning those files stay locked. On a project using Perforce this happened to me twice in a 2-week period, so it is more common than you might think. The only way to 'fix' this is for someone with privileges to 'unlock' the file, putting it back to its original state. This will almost always result in the loss of work because the person unlocking the file has no knowledge of the work; they are usually an IT person, not a developer.

A centralised VCS is useful because it allows people to share a project without duplication and with easy visibility of changes, but the 2 'locking' issues make this type of VCS painful. Enter the Decentralised VCS and Git.

Decentralised VCS

To fix the locking issue we need to allow anyone to check out any file at any time and never lock a file. To do this we first allow anyone (with visibility) to have access to the full repository and every change ever made. When we clone a Git repository, we get all the changes from the first ever made to the last, and our copy is a repository like all the others. There is no central repository because all repositories are the same. When we make changes, we 'commit' them to the repository we have locally, and when we 'push' we upload all of our changes to the repository we have dubbed the remote and merge our changes in. There can be multiple ‘remotes’ because all repositories are the same; it’s just a matter of access.

NB. Some people think of this remote repository as central; this is a mistake and can cause problems when that 1 remote repository has an issue. People worry because they don't realise they can simply recreate it from any other repository, as they are all the same.
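To make the 'all repositories are the same' point concrete, here is a toy model in Python. The Repo class, its methods and the commit IDs are invented for the illustration, and it ignores branches, ordering and merging entirely; it only shows that a clone carries the full history, so a lost remote can be rebuilt from it.

```python
class Repo:
    """Toy repository: just a set of commits keyed by ID (hypothetical model)."""

    def __init__(self, commits=None):
        self.commits = dict(commits or {})  # commit ID -> description

    def clone(self):
        # A clone receives every commit, not just the latest files.
        return Repo(self.commits)

    def commit(self, commit_id, description):
        self.commits[commit_id] = description

    def push(self, remote):
        # Upload any commits the remote does not have yet.
        for commit_id, description in self.commits.items():
            remote.commits.setdefault(commit_id, description)


origin = Repo()
origin.commit("a1", "Add original files")

laptop = origin.clone()
laptop.commit("b2", "216: Changed the button css to blue")
laptop.push(origin)

# Pretend the remote has been lost: any up-to-date clone can recreate it.
rebuilt = laptop.clone()
print(rebuilt.commits == origin.commits)  # prints "True"
```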

Now, no file is ever locked so the lock and deadlock issues have been solved, but if there is a change to the same file at the same line this causes a new issue, the dreaded merge conflict! To solve this, the 2 people who made the changes should look at the line each is trying to merge in, work out manually how the changes fit together best, and add that to the repo. Unfortunately, this does not always happen, and lazy or uninformed people turn to '--force' and 'rebase' to 'fix' the merge conflict.

Merge Conflict

Most teams work with several repositories: one for each developer working locally, and 1 repository on GitHub or another online Git hosting service that the developers use as the remote (and source of truth).

The developers create feature branches from tickets, and they change the files as needed, committing correctly as they go.

At some point 2 developers have made changes to the same line in the same file unknown to each other. They finish their tickets and the first creates a request to merge their work into the repository (a Pull Request or PR). This is tested and merged in, and all is fine up to this point.

The second developer does the same as the first and finds the merge conflict. The change log in Git says the line they are changing from (remember what is logged by the VCS) does not match what is in the repository so the VCS, Git, knows there is a merge conflict, i.e. 2 developers have changed the same code.
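Using the simplified change-log model from earlier in this article, conflict detection can be sketched like this. The function name and sample file are assumptions for the example, and real Git compares both versions against their common ancestor rather than checking a single line, but the idea is the same: the 'changed from' line no longer matches.

```python
def apply_change(lines, line_number, old_line, new_line):
    """Apply one logged change, or raise if the file no longer matches."""
    current = lines[line_number - 1]  # line numbers are 1-based
    if current != old_line:
        raise ValueError(
            f"merge conflict on line {line_number}: "
            f"expected {old_line!r}, found {current!r}"
        )
    updated = list(lines)
    updated[line_number - 1] = new_line
    return updated


repo_file = ["button {", "  color: red;", "}"]

# The first developer's change merges cleanly.
repo_file = apply_change(repo_file, 2, "  color: red;", "  color: blue;")

# The second developer also started from "color: red;",
# so their change no longer applies to what is in the repository.
try:
    apply_change(repo_file, 2, "  color: red;", "  color: green;")
    conflict = False
except ValueError as error:
    conflict = True
    print(error)
```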

To solve this, the 2 people who made the changes should look at the line each is trying to merge in and work out manually how the changes fit together best. They should then make a 3rd change that brings everything together and merge that in.

Unfortunately, some developers don’t fix the merge conflict for whatever reason (usually because they are uninformed) and turn to '--force' and 'rebase' to 'fix' the merge conflict.

Force and Rebase

When using Git, a team should never use 'git push --force' or 'git rebase'. A push --force overwrites the remote branch with your version, discarding anything others have merged in since you last pulled, and a rebase rewrites the commit history, hiding the fact the original changes ever existed.
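Here is a toy Python model of why a forced push loses work. It treats a branch as a plain list of commit messages, which is a big simplification, and the function name and ticket 217 are made up for the example. A normal push is rejected when the remote has commits you do not have; forcing it replaces the remote history with yours.

```python
def push(remote, local, force=False):
    """Return the remote history after `local` is pushed to it (toy model)."""
    fast_forward = local[:len(remote)] == remote
    if not fast_forward and not force:
        raise RuntimeError("rejected: the remote has commits you do not have")
    return list(local)  # the remote now matches the pusher exactly


remote = ["Add original files", "217: Added upsell block"]  # a teammate's merged work
mine = ["Add original files", "216: Changed the button css to blue"]

try:
    push(remote, mine)
    rejected = False
except RuntimeError:
    rejected = True  # the safe outcome: pull, fix the conflict, push again

remote = push(remote, mine, force=True)
lost_work = "217: Added upsell block" not in remote

print("rejected without force:", rejected)          # prints "rejected without force: True"
print("teammate's commit lost after force:", lost_work)  # prints "... True"
```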

I worked in a team some years ago where a junior thought they knew more about Git than they did. They used force and rebase and let other juniors know this was a good way to get around merge conflicts. For 2 weeks there were no merge conflicts in a busy development department (4 teams on 1 frontend project) and the manager started looking into why. He found what had happened. Parts of the system failed tests that had passed for years and 4 developers lost work, one of whom lost 2 weeks of work and walked out. It was a mess.

These tools can be useful when used in the correct circumstance, like an urgent bug fix to production, but within a team force and rebase should not be used.

Teams using Git

That brings us to how we should use Git with a team. I'm not going to tell you which IDE to use or which terminal is the best; that is up to the developer doing the work. But when it comes to the best way to use Git there is really only one branching strategy that works for all developers, all teams and all situations, and that is the agreed branching strategy for the repository, whatever that strategy is.

Agreed Branching strategy

There are many workflows from Trunk-based to GitFlow, and each developer has their own favourite that they would like to use, but the only thing that matters is that everyone on the same repository uses the same workflow. If one member of the team will not use the workflow chosen, they must not be allowed to work on the repository. It does not matter if they are a 'rockstar' developer, a junior or a senior with 20 years' experience; once the repository has a workflow, all people working on that repository must work with the same workflow. This workflow must be documented and should be added to as issues occur and are overcome. The document for this should really be in the repository, but some businesses have a specific place for this type of document, so it should be written in 1 place and linked from others to prevent duplication.

Trunk Strategies

This strategy is very simple: all developers simply commit to the master or trunk branch, and versions are pushed out whenever a milestone is reached. Unfortunately, this causes problems.

In this post we are talking about a Version Control System, which is a way to store our projects safely. If we do not care about storing safely, we can store the code and not use a VCS, but if we do care about safely storing code, then 'Trunk' strategies are not the ones to use because they are unsafe. Let me explain. A developer should start coding and, when the system is in a working state, commit. They should also commit at the end of the day, when they have done something difficult, before a break or at any time they feel they have completed something. Now, if a developer does this on the trunk branch, then the trunk may not be in a working state and so cannot be released to live. If a developer does commit partial, untested code, we do not know when they will ‘fix’ it.

If there are tests or peer reviews in place to stop merges that are not working, then the trunk branch is safe, but environments have been spun up and reviewers and testers may have been notified to check code, and this wastes time and money (unless you are a cloud provider or hosting company, in which case it has made you money). This leads developers to only commit when their code is 'done' rather than commit 'small and often', which is wrong. A trunk-based strategy can be fast for a small or 1-person team, where the ‘unsafeness’ can be forgiven for speed. Personally, I tend to stay away from trunk strategies for all my projects and go with ‘safer’ strategies so I can scale teams quickly as needed.

GitFlow branching strategy

GitFlow was written almost 15 years ago and has a tool that goes with it. The person who wrote it (Vincent Driessen) says it's out of date, but with a few changes it is still perfect for today. You see, nothing has really changed in that time in VCSs. Yes, the tool is not useful, but the strategy is still great for busy teams. Here are the basics:

Create your repository with master branch
Any changes to the master branch should be versioned and made live
Create a develop branch
Only the develop branch or hotfixes can merge into master
Every ticket should be a feature branch
Feature branches merge into develop

That's it. Anyone that picks up a ticket creates a branch from develop with a ticket number and a description like 216_Make-all-buttons-blue. They commit all code changes to that branch while they work on that ticket. All commits start with the ticket number then describe the change like 216: Changed the button css to blue. Once the ticket is code complete, a Pull Request (PR) is made asking for the changes to be merged into the develop branch. At this point peer reviews and automated and manual tests are completed against the code and if it fails, the code is fixed and tested again. Once this has passed the tests the code is merged into develop and the process restarts.
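Conventions like these are easy to check automatically, for example in a hook or a CI step. Below is a minimal Python sketch of such a check. The regular expressions, the function names and the assumption that hotfix branches start with 'hotfix' are mine, and the merge check encodes only the two merge rules listed above; adapt all of it to whatever your team has actually agreed.

```python
import re

# Hypothetical patterns for the naming scheme described above.
BRANCH_PATTERN = re.compile(r"^(?P<ticket>\d+)_[A-Za-z0-9-]+$")
COMMIT_PATTERN = re.compile(r"^(?P<ticket>\d+): \S.*$")


def ticket_of_branch(branch):
    """Return the ticket number of a feature branch, or None if badly named."""
    match = BRANCH_PATTERN.match(branch)
    return match.group("ticket") if match else None


def commit_matches_branch(branch, message):
    """True when the commit message starts with the branch's ticket number."""
    ticket = ticket_of_branch(branch)
    match = COMMIT_PATTERN.match(message)
    return ticket is not None and match is not None and match.group("ticket") == ticket


def merge_allowed(source, target):
    """Apply the two merge rules from the list above, and nothing else."""
    if target == "master":
        # Assumption: hotfix branches are named with a "hotfix" prefix.
        return source == "develop" or source.startswith("hotfix")
    if target == "develop":
        return ticket_of_branch(source) is not None
    return False


branch = "216_Make-all-buttons-blue"
print(commit_matches_branch(branch, "216: Changed the button css to blue"))  # prints "True"
print(merge_allowed(branch, "master"))   # prints "False"
print(merge_allowed(branch, "develop"))  # prints "True"
```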

Once the Project Manager (PM) / Product Owner (PO) is happy with all the features in the develop branch, a Pull Request (PR) is made from develop to master. Regression testing can then take place before the code is merged, versioned and put live.

Feature switches

The only issue we have left is that the Product Owner (PO) / Project Manager (PM) / client might not want all the functionality that is in a branch put live, only some of it. For example, a new hero carousel that shows the Halloween products and an upsell block have been created. The PM/PO wants the upsell block live, but not the carousel.

As things are we cannot put the code live because both features will go live.

This is an issue with any strategy: all code that is ready is released, but that does not mean all ‘ready’ code should be seen immediately. This is where feature switches come in.

We add code around each piece of functionality and have a central place to switch these on. This can be as simple as environment variables or as complex as a new system, but it should be outside the version control and should be simple to switch on and off. If done correctly this can also be used to do A/B testing or to offer a paid service.

AWS have AppConfig which can be used for this.
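For the simplest option mentioned above, environment variables, a feature switch might look something like the following Python sketch. The FEATURE_ prefix, the switch names and the page blocks are all assumptions made up for the carousel and upsell example; a real project would pick its own naming and its own place to hold the settings.

```python
import os


def feature_enabled(name, environ=None):
    """Return True when the switch FEATURE_<NAME> is set to an 'on' value."""
    environ = os.environ if environ is None else environ
    value = environ.get(f"FEATURE_{name.upper()}", "off")
    return value.strip().lower() in {"1", "true", "on", "yes"}


def render_home_page(environ=None):
    """Build the page, including only the features that are switched on."""
    blocks = ["header"]
    if feature_enabled("hero_carousel", environ):
        blocks.append("halloween hero carousel")
    if feature_enabled("upsell_block", environ):
        blocks.append("upsell block")
    blocks.append("footer")
    return blocks


# Both features are in the released code, but only the upsell is visible.
settings = {"FEATURE_UPSELL_BLOCK": "on", "FEATURE_HERO_CAROUSEL": "off"}
print(render_home_page(settings))  # prints "['header', 'upsell block', 'footer']"
```

Defaulting a missing switch to 'off' means new code stays hidden until someone deliberately turns it on.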

SemVer

When we go live, we add a version number to our code that tells everyone from developers to clients to stakeholders that there have been changes. Version 1, Version 2, etc., but how do we know if there has been a breaking change or not? How do we know if there is a new feature or whatever? This is all listed in the change log that we can publish (with modification to make it readable), but we can make things even more simple by using correct version numbers.

We use 3 numbers with dots between each like ‘2.34.3’.

In SemVer the number above is major version 2, which may not be compatible with version 1; minor version 34, which is backward compatible with all earlier minor versions of major version 2; and patch 3, which is backward compatible with the earlier patches of version 2.34. So we increment the:

MAJOR version when you make incompatible API changes
MINOR version when you add functionality in a backward compatible manner
PATCH version when you make backward compatible bug fixes
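The three rules can be sketched in a few lines of Python. The Version class and its method names are my own for this example; it handles only plain MAJOR.MINOR.PATCH numbers and ignores pre-release and build labels. Note that bumping a number resets the numbers to its right to zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Version:
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text):
        # Expects exactly three dot-separated integers, e.g. "2.34.3".
        major, minor, patch = (int(part) for part in text.split("."))
        return cls(major, minor, patch)

    def bump(self, change):
        if change == "major":   # incompatible API change
            return Version(self.major + 1, 0, 0)
        if change == "minor":   # backward compatible new functionality
            return Version(self.major, self.minor + 1, 0)
        if change == "patch":   # backward compatible bug fix
            return Version(self.major, self.minor, self.patch + 1)
        raise ValueError(f"unknown change type: {change}")

    def is_safe_upgrade_from(self, older):
        # Same major version and not older: should not break existing users.
        return self.major == older.major and self >= older

    def __str__(self):
        return f"{self.major}.{self.minor}.{self.patch}"


current = Version.parse("2.34.3")
print(current.bump("patch"))  # prints "2.34.4"
print(current.bump("minor"))  # prints "2.35.0"
print(current.bump("major"))  # prints "3.0.0"
print(current.bump("major").is_safe_upgrade_from(current))  # prints "False"
```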

SemVer is designed for use with APIs, but apps, websites, games and other software can all take advantage of the system, and most software does today.

Summary

This is a much longer article than I expected to write, but I touched on all the major points I wanted to present:

Types of VCSs
Merge conflicts and resolving them
Force and Rebase hell
Branching strategies
Feature switches
SemVer

If I’ve missed anything or messed anything up, let me know, and do let me know if this has been useful or interesting.

Using Git Version Control in a Team & Branching Strategies

Dave Slack | Tuesday, November 26, 2024