DILCISBoard.github.io

This document provides an introduction for those unfamiliar with Git and GitHub’s support for branching and versioning. It uses the development of the Common Specification for Information Packages (CSIP) to illustrate the points. We’ll start by looking at the current state of play with the CSIP.

GitHub tools and references

This is intended only as a short guide to GitHub; there are plenty of good online resources that go further. GitHub’s official guides are the best place to start.

Git / GitHub Concepts

There are four Git concepts that need to be understood: commits, tags, branches / branching, and pull requests.

Commits

Commits are checked-in units of work on a document. This is a simple model whereby an author makes changes to a file or files in the repository and then checks them into the repository. This check-in is a commit and it records:

Here’s an illustration of a small commit to the CSIP repository; it simply changes the version number and publication date:

CSIP commit

The original can be found at https://github.com/DILCISBoard/E-ARK-CSIP/commit/afaededeabb82d1b6afb0e10154e6ac9c3518a60. The long last part of the URL is the SHA-1 id of the commit itself. Work is built up as a chain of commits. Ideally an individual commit should be small, as this makes tracking changes and rolling back work easier. The name commit is derived from the act of committing a change to the permanent record by checking the work into a repository.
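To make the idea of a SHA-1 id and a chain of commits more concrete, here is a minimal Python sketch. The `git_object_id` function follows the way Git hashes an object (a short header followed by the content). The `simplified_commit` helper is a hypothetical simplification: a real commit also records a tree, an author, a committer and dates, so the ids it produces will not match those of any real repository.

```python
import hashlib
from typing import Optional


def git_object_id(kind: str, body: bytes) -> str:
    """Return the SHA-1 id Git gives an object of this kind and content."""
    # Git hashes a header of the form "<kind> <size>\0" followed by the body.
    header = f"{kind} {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


def simplified_commit(message: str, parent: Optional[str]) -> str:
    """Hash a toy commit that holds only a parent id and a message.

    Assumption for illustration: real commits carry more fields.
    """
    lines = []
    if parent is not None:
        lines.append(f"parent {parent}")
    lines.append("")  # blank line separates the headers from the message
    lines.append(message)
    return git_object_id("commit", "\n".join(lines).encode("utf-8"))


# Each commit id depends on its parent id, which is what forms the chain.
first = simplified_commit("Initial draft", parent=None)
second = simplified_commit("Update version number and publication date", parent=first)

print(first)
print(second)
```

Because the parent id is part of what gets hashed, changing an earlier commit changes the id of every commit that follows it. This is why a commit id can act as a reliable reference to one exact state of the work.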

Tags

A tag is effectively a bookmark to an individual commit and is used to record a significant state in the project repository. Released versions are given a tag and typical tag names tend to reflect this, e.g. v2.0 for the CSIP. Tags come in two flavours: lightweight and annotated. Lightweight tags are also used by authors to bookmark particular states of work. Official release tags tend to be annotated tags, which store more information, e.g. author, comment, dates, etc., to give users confidence that the state is an approved version. The v2.0-draft of the CSIP is shown in the figure below.

CSIP v2.0-draft tag
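The difference between the two kinds of tag can be sketched as a small Python model. This is an illustration only, not how Git stores tags internally: the commit ids, the `my-bookmark` tag name, and the tagger, date and message values are all hypothetical placeholders. A branch pointer is included for contrast, since a branch is also a name for a commit but one that moves as work is added.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedTag:
    """Toy stand-in for an annotated tag: a target plus descriptive metadata."""
    target: str   # id of the tagged commit
    tagger: str   # who created the tag
    date: str     # when the tag was created
    message: str  # why the tag was created


# A lightweight tag is just a name that points at one commit.
# "c1", "c2" and "c3" are placeholder commit ids.
lightweight_tags = {"my-bookmark": "c2"}

# An annotated tag carries extra information alongside the target commit.
annotated_tags = {
    "v2.0-draft": AnnotatedTag(
        target="c2",
        tagger="(placeholder tagger)",
        date="(placeholder date)",
        message="(placeholder release note)",
    )
}

# A branch also names a commit, but it advances when a new commit is made.
branches = {"integration": "c2"}
branches["integration"] = "c3"  # new work moves the branch forward

print(lightweight_tags["my-bookmark"])      # the tagged commit, unchanged
print(annotated_tags["v2.0-draft"].target)  # the tagged commit, unchanged
print(branches["integration"])              # the newest commit
```

Both tags keep pointing at `c2` after the branch has moved on to `c3`, which is what makes a tag suitable for marking a release.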

Branches and versioning

Git branches are a way of organizing different strands of work within a repository. It’s the use of branches that allows multiple authors to work on a single document simultaneously. In effect, branches are nothing more than mobile tags that record a particular state of work. To illustrate, consider the current work on the CSIP, which might be organized into the following branches:

Each repository also contains specific release branches named rel/<version-no>. These are used to:

Pull Requests

Pull requests are GitHub’s mechanism for sharing and organising work done in other branches. Once a pull request is open its contents can be reviewed by other project members and follow-on commits can be added if necessary. A pull request can be initiated by comparing any two branches for changes. Once a pull request is made it usually, depending on the repository policy, requires a review before it can be merged. The figure below shows a PR made against the CSIP repository that awaits review.

Pull request for E-ARK CSIP

GitHub provides a pull request view for each project. Here are the open pull requests for the Common Specification project: https://github.com/DILCISBoard/E-ARK-CSIP/pulls. This view shows pull requests that have been closed: https://github.com/DILCISBoard/E-ARK-CSIP/pulls?q=is%3Apr+is%3Aclosed.
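Comparing two branches for changes comes down to a line-by-line difference between their files. The sketch below uses Python’s standard `difflib` module to produce a unified diff of the kind a reviewer sees in a pull request. The file name `spec.md` and the file contents are hypothetical, and GitHub’s own comparison is more sophisticated than this.

```python
import difflib

# Hypothetical contents of the same file on two branches.
base_lines = [
    "# Common Specification",
    "Version: 2.0-draft",
    "Date: (earlier date)",
]
head_lines = [
    "# Common Specification",
    "Version: 2.0",
    "Date: (later date)",
]

# unified_diff yields header lines, a hunk marker, then the context,
# removed ("-") and added ("+") lines.
diff = list(
    difflib.unified_diff(
        base_lines,
        head_lines,
        fromfile="rel/v2.0-draft/spec.md",
        tofile="fix/v2-draft/spec.md",
        lineterm="",
    )
)

for line in diff:
    print(line)

# Count only real changes, skipping the "---" and "+++" file headers.
changed = [
    line
    for line in diff
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print(len(changed), "changed lines")
```

A reviewer reads the lines prefixed with `-` as removed and those prefixed with `+` as added, so a small change such as a new version number and date is easy to check.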

The CSIP and GitHub Workflow

There’s been a considerable amount of work to produce a draft of version 2 of the common specification. This has involved both the revision of the text and the transfer of the content from a Word document to a plain text source hosted on GitHub. The motivation behind the move to GitHub was to make change and version control simpler going forward. The Word model of track changes and comments had become overwhelmed, with multiple sets of comments and suggested changes proving difficult to untangle.

Git, and GitHub by extension, provides a more granular and nuanced approach to managing changes to text-based documents. These have traditionally been source code, but Git is also used to control supporting documentation. While Git can be used to version binary files, it’s at its best when working with plain text formats, e.g. ASCII, Unicode, etc., where it provides a set of supporting tools for analysing and approving changes.

One or two structured text formats have been developed to support documentation on source control systems. Markdown has been one of the most popular and successful. It allows basic HTML-like formatting to be provided using plain text constructs; indeed, this document is written in Markdown. Other document forms, e.g. HTML or PDF, can easily be generated from a Markdown source, while the source document itself can be transparently accessed and managed via GitHub. This is the reason that the CSIP was converted to Markdown, which is then converted to HTML for the CSIP website and is also used to generate the PDF document.

Our current state of play is that we have:

Bearing this in mind, we’ll look at how this will be handled under Git and GitHub management.

GitHub workflow

In the diagram it can be seen that integration (in purple) and master (in green) are the two consistent branches. Master always shows the latest official release and is updated from release branches, not from integration.

Following the yellow boxes for work on draft releases, it can be seen that the draft release rel/v2.0-draft is created from the integration branch. The official draft release is also pushed to master. Revisions to the draft are made in the fix/v2-draft branch, which is merged with the rel/v2.0-draft branch to create the corrected rel/v2.0 branch. In reality this work would take place in several branches as separate strands of work. The diagram excludes these for clarity.

Once version 2.0 is ready in rel/v2.0, it can be merged to master and to integration, as the content also needs to be in the working version. At the same time, work can continue in the red feat/segmented_ips branch. The author must also merge the latest work from integration into this branch so that other work, e.g. typos fixed in v2.0, is retained. Once work has finished on the segmented IP branch, it can be merged to master for publication in a future specification version, e.g. v2.1.
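The flow of work described above can be traced with a toy Python model of branches and merges. It is a sketch under stated assumptions rather than a description of how Git works internally: the commit names are invented, every merge is recorded as a new merge commit, and only the branches named in the text are modelled.

```python
from typing import Dict, List, Set

parents: Dict[str, List[str]] = {}  # commit name -> names of its parent commits
branches: Dict[str, str] = {}       # branch name -> commit at the branch tip


def commit(branch: str, name: str, extra_parents: List[str] = ()) -> None:
    """Record a new commit on `branch` and advance the branch to it."""
    tip = branches.get(branch)
    parents[name] = ([tip] if tip else []) + list(extra_parents)
    branches[branch] = name


def create_branch(name: str, source: str) -> None:
    """Start a new branch at the current tip of `source`."""
    branches[name] = branches[source]


def merge(source: str, target: str, name: str) -> None:
    """Merge `source` into `target` using a merge commit called `name`."""
    commit(target, name, extra_parents=[branches[source]])


def history(branch: str) -> Set[str]:
    """Return every commit reachable from the tip of `branch`."""
    seen: Set[str] = set()
    stack = [branches[branch]]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


# The draft release is cut from integration and also pushed to master.
commit("integration", "draft-text")
create_branch("rel/v2.0-draft", "integration")
create_branch("master", "rel/v2.0-draft")

# Revisions to the draft and feature work proceed in their own branches.
create_branch("fix/v2-draft", "rel/v2.0-draft")
commit("fix/v2-draft", "typo-fixes")
create_branch("feat/segmented_ips", "integration")
commit("feat/segmented_ips", "segmented-ips-work")

# The corrected release is merged to master and back to integration.
create_branch("rel/v2.0", "rel/v2.0-draft")
merge("fix/v2-draft", "rel/v2.0", "release-v2.0")
merge("rel/v2.0", "master", "master-v2.0")
merge("rel/v2.0", "integration", "integration-v2.0")

# The feature branch picks up the fixes by merging from integration.
merge("integration", "feat/segmented_ips", "sync-with-integration")

for branch in ("master", "integration", "feat/segmented_ips"):
    print(branch, "typo-fixes" in history(branch))
```

All three lines print `True`: once the feature branch has merged from integration, the typo fixes made for v2.0 are part of its history, while the segmented IP work stays out of master until it is merged.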