4 Git concepts and architecture

4.1 The Three Trees

  • Two-tree architecture (other VCS):

    • Repository and working copy are the two trees.

    • Directories and files can be thought of as trees

    • “checkout” copies from repository to working directory

    • make changes, and “commit” those changes back to the repository

    (Figure: two-tree architecture)
  • Three-tree architecture (Git):

    • Repository, staging index, and working directory are the three trees.

    (Figure: three-tree architecture)
    • Working directory: the files on disk, including changes that may not be tracked yet

    • Staging index: changes that we are about to commit

    • Repository: changes that have been committed and are actually being tracked

4.2 Git Workflow

Workflow for the three-tree architecture.

  • Suppose we create file.txt in our working directory (call these changes A).

  • git add file.txt stages A in the staging index.

  • git commit records A in the repository. (This is a local operation; it is unrelated to the git push command.)

  • Suppose you make changes to file.txt, call these changes B.

  • Again use add and commit to stage and commit B respectively.

  • Now the repository has two sets of changes in it, A and B. This is the typical workflow for making changes to a repository. Use git log to view these commits, each of which Git identifies by a unique hash value.
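The add/commit cycle above can be sketched with a toy model in Python. This is only an illustration of how data moves between the three trees, not how Git stores anything internally; the class `ToyRepo`, its methods, and the way the commit id is derived are all hypothetical names and simplifications made up for this sketch.

```python
import hashlib


class ToyRepo:
    """Toy model of the three trees (an assumption-laden sketch, not Git itself)."""

    def __init__(self):
        self.working = {}   # working directory: path -> content
        self.index = {}     # staging index: path -> content staged for commit
        self.commits = []   # repository: list of (commit_id, message, snapshot)

    def write(self, path, content):
        # Edit a file in the working directory only.
        self.working[path] = content

    def add(self, path):
        # Like `git add`: copy the current content into the staging index.
        self.index[path] = self.working[path]

    def commit(self, message):
        # Like `git commit`: record a snapshot of the staging index.
        snapshot = dict(self.index)
        parent = self.commits[-1][0] if self.commits else ""
        # Simplified id: Git's real commit format is different.
        payload = repr((parent, message, sorted(snapshot.items()))).encode()
        commit_id = hashlib.sha1(payload).hexdigest()
        self.commits.append((commit_id, message, snapshot))
        return commit_id

    def log(self):
        # Like `git log`: newest commit first, with abbreviated ids.
        return [(cid[:7], msg) for cid, msg, _ in reversed(self.commits)]


repo = ToyRepo()
repo.write("file.txt", "A")
repo.add("file.txt")
first = repo.commit("Add file.txt (changes A)")

repo.write("file.txt", "A + B")
repo.add("file.txt")
second = repo.commit("Update file.txt (changes B)")

for short_id, message in repo.log():
    print(short_id, message)
```

Note that in this model, as in Git, an edit made with `write` does not reach the repository until it has been staged with `add` and then recorded with `commit`.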

4.3 Hash Values (SHA-1)

  • Previously, we referred to the changes as A and B. These changes can be to a single file, to several files in a directory, or across directories.

  • Git generates a checksum for each change set.

  • Checksum algorithms convert data into a simple number.

    • The same data always produces the same checksum.

    • Data integrity is fundamental.

    • Changing the data changes the checksum.

  • Git uses the SHA-1 hash algorithm to create checksums, which is why people ask, “What’s the SHA value of that commit?”

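  • The SHA value is a 40-character hexadecimal string: SHA = f(all the data, all the changes). f is effectively one-to-one: two different inputs producing the same value is possible in principle but extremely unlikely in practice.

  • f is also a function of the parent (the SHA value of the previous snapshot), the author, and the commit message.

    • Thus integrity is built in not just for each change set but for the whole history of change sets.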
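The checksum properties listed above can be checked with Python's standard `hashlib` module. The first function follows the way Git hashes file contents as a "blob" (a short header, a NUL byte, then the data). The second function, `toy_commit_id`, is a hypothetical simplification: Git's actual commit object has a different layout, and the author name "Ann" is made up for the example. It is included only to show why hashing the parent's SHA protects the history.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    # Git hashes file content as: b"blob <size>\0" followed by the bytes.
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def toy_commit_id(parent: str, author: str, message: str, tree: str) -> str:
    # Simplified stand-in for a commit hash (NOT Git's real commit format):
    # the id depends on the content hash, the parent id, the author, and the message.
    payload = "\n".join([tree, parent, author, message]).encode()
    return hashlib.sha1(payload).hexdigest()


a = git_blob_sha1(b"hello world\n")
b = git_blob_sha1(b"hello world\n")
c = git_blob_sha1(b"hello world!\n")
print(len(a), a == b, a == c)  # 40 True False

# Build a two-commit history, then tamper with the first commit.
first = toy_commit_id("", "Ann", "changes A", a)
second = toy_commit_id(first, "Ann", "changes B", c)
tampered_first = toy_commit_id("", "Ann", "changes A (edited)", a)
tampered_second = toy_commit_id(tampered_first, "Ann", "changes B", c)

# Changing an earlier commit changes every id that follows it.
print(second != tampered_second)  # True
```

In a real repository, `git hash-object <file>` reports the blob hash of a file, which is the value `git_blob_sha1` computes from that file's bytes.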

4.4 HEAD pointer

  • Pointer to tip of current branch in repository

  • Reflects the last state of the repository, that is, what was last checked out

  • Points to the commit that will be the parent of the next commit; this is where new commits are written

  • By default, the branch we’re working on is called master. We start with our first commit. At the start, the HEAD pointer points to that commit.

  • When a new commit is made, a new SHA is created, and Git moves the HEAD pointer to this new SHA value.

  • When another commit is made, Git does the same and moves the HEAD pointer again.

  • If a new branch is created and checked out, HEAD follows the commits made on that branch.

  • cat .git/HEAD shows which reference HEAD points to, for example ref: refs/heads/master.

  • cat .git/refs/heads/master shows the SHA of the commit that HEAD currently points to.
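The two `cat` commands above amount to a two-step lookup, which can be sketched in Python. The function name `resolve_head` is hypothetical, and the sketch builds a fake `.git` layout in a temporary directory (with a placeholder SHA) so that it runs without a real repository. It assumes the branch reference exists as a plain file; real repositories may also keep references in a `packed-refs` file, which this sketch does not handle.

```python
import os
import tempfile


def resolve_head(git_dir):
    """Return (ref_name_or_None, sha) by reading HEAD and the file it names."""
    with open(os.path.join(git_dir, "HEAD")) as fh:
        head = fh.read().strip()
    if not head.startswith("ref: "):
        # Detached HEAD: the file holds a commit SHA directly.
        return None, head
    ref = head[len("ref: "):]
    # Follow the reference, e.g. refs/heads/master, to the file holding the SHA.
    with open(os.path.join(git_dir, *ref.split("/"))) as fh:
        return ref, fh.read().strip()


# Build a fake .git layout so the sketch is self-contained.
demo = tempfile.mkdtemp()
os.makedirs(os.path.join(demo, "refs", "heads"))
with open(os.path.join(demo, "HEAD"), "w") as fh:
    fh.write("ref: refs/heads/master\n")
fake_sha = "a" * 40  # placeholder value, not a real commit
with open(os.path.join(demo, "refs", "heads", "master"), "w") as fh:
    fh.write(fake_sha + "\n")

ref, sha = resolve_head(demo)
print(ref, sha)
```

Pointing `resolve_head` at the `.git` directory of a real repository should give the same answer as the two `cat` commands, provided the branch reference is stored as a loose file.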