The SHA-1 hash checksum

Every time you take a snapshot of your project, Git uses the SHA-1 cryptographic hash function to generate a 20-byte checksum, usually displayed as a 40-character hexadecimal string. This hash value serves two main purposes:

  • Uniqueness
    • Each commit, and every piece of content in the repository, is uniquely identified by its SHA-1 hash, so every change can be tracked and referenced distinctly.
  • Integrity
    • The hash provides a checksum of the content, which Git uses to detect corruption or tampering with the data. If even a single bit changes, the resulting hash will be entirely different.
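A minimal Python sketch of the integrity idea, using the standard hashlib module: it flips a single bit in some content and checks a previously recorded checksum against a freshly computed one. The helper names `sha1_hex` and `is_intact` are hypothetical. The sketch also hashes the raw bytes only; Git adds a small object header before hashing, which does not change the principle.

```python
import hashlib

def sha1_hex(data: bytes) -> str:
    # 20-byte SHA-1 digest, shown as 40 hexadecimal characters.
    return hashlib.sha1(data).hexdigest()

def is_intact(data: bytes, expected_checksum: str) -> bool:
    # Recompute the checksum and compare it with the one recorded earlier.
    return sha1_hex(data) == expected_checksum

original = b"HELLO WORLD"
checksum = sha1_hex(original)

# Flip the lowest bit of the first byte: b"H" (0x48) becomes b"I" (0x49).
tampered = bytes([original[0] ^ 0x01]) + original[1:]

print(len(hashlib.sha1(original).digest()))  # 20
print(checksum)
print(sha1_hex(tampered))                    # differs from the line above
print(is_intact(original, checksum))         # True
print(is_intact(tampered, checksum))         # False
```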

Calculate the hash

We can use the git hash-object command to calculate the SHA-1 hash of a file’s contents. The command reads the file and outputs the hash Git would assign to it as a blob object, exactly as Git computes it internally; it does not store anything unless you add the -w option.

git hash-object <file_name>
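Under the hood, git hash-object does not hash the file bytes alone: it prepends a header made of the object type, a space, the content size in bytes and a NUL byte, then hashes the result. The Python sketch below reproduces that calculation for illustration only. The function name `hash_object` is hypothetical and is not part of any Git library.

```python
import hashlib

def hash_object(content: bytes, obj_type: str = "blob") -> str:
    # Git hashes the header "<type> <size>" plus a NUL byte, followed by the content.
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()

# `echo 'test content'` writes the text followed by a trailing newline.
print(hash_object(b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
print(hash_object(b""))                # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (empty blob)
```

You can compare the first result with `echo 'test content' | git hash-object --stdin`, which should print the same id on a system where echo appends a single newline.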

Hashing in Git objects

Git uses the SHA-1 hash to create a “git hash” for each of its three core object types: the “blob” (file content), the “tree” (directory structure), and the “commit” (a snapshot of the project). This system forms the backbone of Git’s data model.

  • Blob (Binary Large Object)
    • A blob is a sequence of bytes. It contains exactly the same data as the file it was created from, but without any metadata such as the file name, permissions, or directory location. The difference is that the blob is stored in .git/objects/, while the file lives in your working directory on your computer’s filesystem.
    • Each blob is uniquely identified by the SHA-1 hash of its contents (together with a short header recording the object type and size), referred to as the “Git hash”.
  • Git tree
    • A tree object in Git represents a directory. It contains a list of file names with their corresponding blob hashes (and file modes), as well as references to other trees (subdirectories).
    • This hierarchical organisation, or “git hash tree”, allows Git to efficiently manage and navigate the project’s directory structure.
    • The tree itself is also identified by a SHA-1 hash, derived from its contents.
  • Git commit
    • When you commit your changes, Git creates a commit object that references the hash of the top-level (root) tree.
    • Each commit object has a unique SHA-1 hash within the repository; these are the hashes you see when you run the git log command.
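The Python sketch below shows how a tree’s hash is derived from its entries. The helpers `hash_object` and `hash_tree` are hypothetical, and the sketch assumes only regular files (mode 100644) and subdirectories (mode 40000), ignoring executables and symlinks. It also illustrates that a blob carries no file name: renaming a file changes the tree’s id but not the blob’s.

```python
from __future__ import annotations
import hashlib

def hash_object(content: bytes, obj_type: str) -> str:
    # Git hashes "<type> <size>", a NUL byte, then the content.
    return hashlib.sha1(f"{obj_type} {len(content)}\0".encode() + content).hexdigest()

def hash_tree(entries: list[tuple[str, str, str]]) -> str:
    """Hash a tree from (mode, name, object_id) entries.

    Each entry is serialised as "<mode> <name>", a NUL byte, and the 20 raw
    bytes of the child's SHA-1. Entries are sorted by name, with subtrees
    (mode "40000") compared as if their name ended in "/".
    """
    def sort_key(entry: tuple[str, str, str]) -> bytes:
        mode, name, _ = entry
        return (name + "/" if mode == "40000" else name).encode()

    body = b"".join(
        f"{mode} {name}\0".encode() + bytes.fromhex(oid)
        for mode, name, oid in sorted(entries, key=sort_key)
    )
    return hash_object(body, "tree")

blob_id = hash_object(b"HELLO WORLD", "blob")
docs = hash_tree([("100644", "1.txt", blob_id)])
renamed = hash_tree([("100644", "2.txt", blob_id)])  # same blob, new name
root = hash_tree([("40000", "docs", docs)])           # a tree can list another tree

print(blob_id)          # the blob id is unaffected by the rename
print(docs != renamed)  # True: the name lives in the tree, not the blob
print(root)
```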

Git internal workflow

The diagram below shows a Git tree with the hash value 841B9. It references a tree named DOCS, with the hash value CAFE7, and a blob named TEST.JS, with the hash value F00D1. The DOCS tree in turn references two more blobs, one of which (F92A0) is a .png file.

The first snapshot

Now it’s time to take a snapshot of this Git file system, storing every file that exists at that moment along with its contents. The snapshot is recorded as a commit object, which, like every other Git object, is identified by its SHA-1 hash. Every commit except the very first one has one or more parent commits (the previous snapshot or snapshots); the first commit has none.

Make changes after a snapshot

Suppose that after the snapshot we edit the file 1.txt and add an exclamation mark, changing its content from HELLO WORLD to HELLO WORLD!. Because the content changed, Git creates a new blob whose SHA-1 hash is 62E7A. A tree’s hash depends on its contents, including the hashes of the blobs it lists, so the DOCS tree that points to the changed blob also gets a new hash, 24601. In turn, the parent tree 841B9 no longer points to the tree CAFE7, so it changes as well and receives the new hash value AA281.
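To see this propagation in code, the Python sketch below models the example as nested dictionaries and recomputes every id after editing 1.txt. The file name `logo.png`, all file contents, and the helper `snapshot` are hypothetical stand-ins for what the diagram shows, and the real ids it produces differ from the shortened labels (841B9, CAFE7, and so on) used in the text.

```python
import hashlib

def hash_object(content: bytes, obj_type: str) -> str:
    # Git hashes "<type> <size>", a NUL byte, then the content.
    return hashlib.sha1(f"{obj_type} {len(content)}\0".encode() + content).hexdigest()

def snapshot(node, path="", ids=None):
    """Hash a directory given as {name: bytes or dict}, bottom-up.

    Returns (tree_id, ids), where ids maps every path ("" is the root) to its id.
    """
    if ids is None:
        ids = {}
    entries = []
    for name, value in node.items():
        child_path = f"{path}/{name}" if path else name
        if isinstance(value, dict):            # subdirectory -> tree
            child_id, _ = snapshot(value, child_path, ids)
            mode, sort_name = "40000", name + "/"
        else:                                  # file content -> blob
            child_id = hash_object(value, "blob")
            ids[child_path] = child_id
            mode, sort_name = "100644", name
        entries.append((sort_name.encode(), f"{mode} {name}\0".encode(), child_id))
    entries.sort(key=lambda entry: entry[0])
    body = b"".join(header + bytes.fromhex(oid) for _, header, oid in entries)
    ids[path] = hash_object(body, "tree")
    return ids[path], ids

before = {
    "test.js": b"console.log('test');\n",      # hypothetical content
    "docs": {
        "1.txt": b"HELLO WORLD",
        "logo.png": b"\x89PNG\r\n\x1a\n",      # hypothetical image bytes
    },
}
after = {**before, "docs": {**before["docs"], "1.txt": b"HELLO WORLD!"}}

_, old = snapshot(before)
_, new = snapshot(after)
changed = sorted(p for p in old if old[p] != new[p])
print(changed)  # ['', 'docs', 'docs/1.txt']
```

Only the edited blob and the trees on its path to the root (DOCS and the root tree, shown as "") change; test.js and logo.png keep their ids.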

A new commit

We are now almost ready to create a new commit object, but Git does not need to store everything again. The blobs F92A0 and F00D1 have not changed since the previous snapshot, and Git never stores an unchanged object twice: the new trees simply refer to the existing blobs by their hash values. Git then creates the new commit object. Because this is not the very first commit, it records its parent commit, A1337.
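Content addressing is what makes this reuse automatic: an object is stored under its own hash, so writing identical content a second time adds nothing. The toy `ObjectStore` class below is hypothetical and keeps objects in an in-memory dict rather than in Git’s compressed files under .git/objects/; it counts how many objects each snapshot actually adds, using the same hypothetical files as the example.

```python
from __future__ import annotations
import hashlib

class ObjectStore:
    """Toy content-addressable store: object id -> raw object bytes."""

    def __init__(self) -> None:
        self.objects = {}

    def write(self, obj_type: str, content: bytes) -> tuple[str, bool]:
        raw = f"{obj_type} {len(content)}\0".encode() + content
        oid = hashlib.sha1(raw).hexdigest()
        is_new = oid not in self.objects
        if is_new:
            self.objects[oid] = raw        # only content not seen before is stored
        return oid, is_new

    def write_tree(self, node: dict) -> tuple[str, int]:
        """Store a {name: bytes or dict} directory; return (tree_id, objects_added)."""
        entries, added = [], 0
        for name, value in node.items():
            if isinstance(value, dict):
                oid, n = self.write_tree(value)
                added += n
                mode, sort_name = "40000", name + "/"
            else:
                oid, is_new = self.write("blob", value)
                added += int(is_new)
                mode, sort_name = "100644", name
            entries.append((sort_name.encode(), f"{mode} {name}\0".encode(), oid))
        entries.sort(key=lambda entry: entry[0])
        body = b"".join(header + bytes.fromhex(oid) for _, header, oid in entries)
        tree_id, is_new = self.write("tree", body)
        return tree_id, added + int(is_new)

first = {
    "test.js": b"console.log('test');\n",     # hypothetical content
    "docs": {"1.txt": b"HELLO WORLD", "logo.png": b"\x89PNG\r\n\x1a\n"},
}
second = {**first, "docs": {**first["docs"], "1.txt": b"HELLO WORLD!"}}

store = ObjectStore()
_, added_first = store.write_tree(first)
_, added_second = store.write_tree(second)
print(added_first, added_second, len(store.objects))  # 5 3 8
```

The first snapshot writes five objects (three blobs and two trees); the second writes only three: the new 1.txt blob, the new DOCS tree, and the new root tree.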

Review the entire workflow

  1. Create blob object: A blob is created when you run the git add command, which copies files from the working directory into the staging area (index). If a file has changed since the last commit, its new content becomes a new blob with a different hash value; if the content is unchanged, Git does not create a new blob but references the existing one.
  2. Tree creation: When you commit, Git creates a tree for each directory, listing the hash values of the blobs (and subtrees) it contains.
  3. Create commit object: Git then creates a commit object that references the root tree’s hash and the previous commit (if any), along with metadata such as the commit message and author information.
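The three steps can be sketched end to end in Python. This is a simplified model under stated assumptions: the index is a flat dict from file name to blob id (no subdirectories), the helpers `hash_object`, `write_tree` and `commit_object` are hypothetical, and the author line (name, email, Unix timestamp and time zone) is made up. The commit layout itself (a tree line, optional parent lines, author and committer lines, a blank line, then the message) mirrors what git cat-file -p shows for a commit.

```python
from __future__ import annotations
import hashlib

def hash_object(content: bytes, obj_type: str) -> str:
    # Git hashes "<type> <size>", a NUL byte, then the content.
    return hashlib.sha1(f"{obj_type} {len(content)}\0".encode() + content).hexdigest()

def write_tree(index: dict[str, str]) -> str:
    # Flat directory of regular files only: name -> blob id.
    body = b"".join(
        f"100644 {name}\0".encode() + bytes.fromhex(oid)
        for name, oid in sorted(index.items())
    )
    return hash_object(body, "tree")

def commit_object(tree_id: str, parents: list[str], author: str, message: str) -> bytes:
    lines = [f"tree {tree_id}"]
    lines += [f"parent {p}" for p in parents]      # empty for the very first commit
    lines += [f"author {author}", f"committer {author}", "", message]
    return ("\n".join(lines) + "\n").encode()

AUTHOR = "Jane Doe <jane@example.com> 1700000000 +0000"  # hypothetical identity

index = {}
# 1. git add: the staged content becomes a blob, recorded in the index.
index["1.txt"] = hash_object(b"HELLO WORLD", "blob")
# 2. Tree creation from the index.
tree1 = write_tree(index)
# 3. Commit object pointing at the tree (no parent: first commit).
c1 = hash_object(commit_object(tree1, [], AUTHOR, "first snapshot"), "commit")

# Edit, stage and commit again, this time with c1 as the parent.
index["1.txt"] = hash_object(b"HELLO WORLD!", "blob")
tree2 = write_tree(index)
second = commit_object(tree2, [c1], AUTHOR, "add exclamation mark")
c2 = hash_object(second, "commit")
print(second.decode())
```

Because the parent id and the timestamp are part of the hashed text, changing any earlier commit would change the id of every commit that follows it.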

What is a Git conflict

A Git conflict occurs when Git encounters two different versions of the same part of a file (i.e. two different blobs) that it cannot reconcile automatically, typically because both sides changed the same lines in different ways since their common ancestor. This can happen during operations such as merging or switching branches, rebasing, and other operations where Git needs to combine changes.

Example of a conflict

Imagine two branches with different changes to the same file.

Branch-A (file.txt)

Hello, World!

Branch-B (file.txt)

Hello, Git!

If you are on Branch-A and merge Branch-B into it, Git detects a conflict because both branches changed the same line in different ways, so it cannot decide automatically which version you want.
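Git decides whether changes conflict by comparing both branches against their common ancestor (the merge base). The Python function below is a deliberately simplified, single-line version of that three-way rule. The function name `merge_line` and the base text "Hello!" are hypothetical, and real Git merges whole files hunk by hunk.

```python
from __future__ import annotations

def merge_line(base: str, ours: str, theirs: str, their_branch: str) -> tuple[str, bool]:
    """Simplified three-way merge of a single line; returns (result, conflicted)."""
    if ours == theirs:          # both sides agree (or neither changed it)
        return ours, False
    if ours == base:            # only the other branch changed the line
        return theirs, False
    if theirs == base:          # only our branch changed the line
        return ours, False
    # Both branches changed the same line differently: Git cannot choose.
    conflict = (
        "<<<<<<< HEAD\n"
        f"{ours}\n"
        "=======\n"
        f"{theirs}\n"
        f">>>>>>> {their_branch}\n"
    )
    return conflict, True

# Hypothetical common ancestor "Hello!", edited on both branches.
result, conflicted = merge_line("Hello!", "Hello, World!", "Hello, Git!", "Branch-B")
print(conflicted)  # True
print(result)
```

If Branch-A had left the line untouched, `merge_line("Hello, World!", "Hello, World!", "Hello, Git!", "Branch-B")` would simply return `("Hello, Git!", False)` with no conflict.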

Resolve a conflict

Below is the file with conflict markers, which Git inserts automatically when it encounters a conflict.

<<<<<<< HEAD
Hello, World!
=======
Hello, Git!
>>>>>>> Branch-B

The content between <<<<<<< HEAD and ======= is the version of the file from your current branch (Branch-A). The content between ======= and >>>>>>> Branch-B is the version from the branch being merged in (Branch-B). Branches are covered in more detail in Git Internals 3 - Git Branches; here we focus on resolving the Git conflict.

To resolve the conflict, edit the file manually: keep one of the two versions, or combine both changes in a way that makes sense. Then remove the conflict markers, save the file, stage it with the git add command, and run git commit to conclude the merge.
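Before running git add, it can help to confirm that no markers are left behind. The Python sketch below uses hypothetical helper names (`unresolved_lines`, `unresolved_in_file`) and only looks for the seven-character marker lines shown above, plus the ||||||| line Git writes when merge.conflictStyle is set to diff3. It returns the line numbers that still contain markers.

```python
from __future__ import annotations
import re
from pathlib import Path

# Marker lines: "<<<<<<< ...", "=======", ">>>>>>> ...", and "||||||| ..." (diff3 style).
MARKER = re.compile(r"^(?:<{7}|>{7}|\|{7})(?: |$)|^={7}$")

def unresolved_lines(text: str) -> list[int]:
    """Return the 1-based line numbers that still look like conflict markers."""
    return [n for n, line in enumerate(text.splitlines(), start=1) if MARKER.match(line)]

def unresolved_in_file(path: str) -> list[int]:
    return unresolved_lines(Path(path).read_text(encoding="utf-8"))

conflicted = "<<<<<<< HEAD\nHello, World!\n=======\nHello, Git!\n>>>>>>> Branch-B\n"
print(unresolved_lines(conflicted))         # [1, 3, 5]
print(unresolved_lines("Hello, World!\n"))  # []
```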

