The major difference between Git and other Version Control Systems (VCS) is the way Git thinks about its data. Conceptually, most other VCS store information as a list of file-based changes: they treat the data they store as a set of files plus the changes made to each file over time (commonly described as delta-based version control). The diagram above shows this model, with data stored as changes to a base version of each file. Git does not think of its data in this way. Instead, Git stores a series of snapshots.

Snapshot

Git is all about saving snapshots of your project and then working with and comparing those snapshots. A snapshot is the state of something (e.g. a text file or a folder) at a specific point in time. To be efficient, if a file has not changed, Git does not store it again; it just links to the identical file it has already stored. Git thinks about its data more like a stream of snapshots. In Git, creating a commit is taking a snapshot of your project. Every time you commit, Git computes a unique identifier (the commit hash) for that specific commit or snapshot. The actual content of a commit, including the commit message, author, timestamp, and a pointer to the tree, is stored as an object in the object database, .git/objects/. Branch and tag references are stored in the .git/refs/ directory: each branch has a file in .git/refs/heads/ that contains the commit hash of the latest commit on that branch. (You will find out more in The Git directory section.)
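The idea that unchanged files are linked rather than stored again can be sketched with a toy content-addressed store. This is only an illustration: the SnapshotStore class and its method names are hypothetical, and it hashes raw file content, whereas real Git hashes a header plus the content and compresses its objects.

```python
import hashlib


class SnapshotStore:
    """Toy content-addressed store: each distinct file content is saved once."""

    def __init__(self):
        self.objects = {}    # content hash -> file content
        self.snapshots = []  # each snapshot maps path -> content hash

    def snapshot(self, files):
        """Record the full state of `files` (path -> bytes) as one snapshot."""
        tree = {}
        for path, content in files.items():
            key = hashlib.sha1(content).hexdigest()
            # Unchanged content hashes to the same key, so it is not stored twice.
            self.objects.setdefault(key, content)
            tree[path] = key
        self.snapshots.append(tree)
        return len(self.snapshots) - 1


store = SnapshotStore()
store.snapshot({"README": b"hello\n", "main.py": b"print('v1')\n"})
store.snapshot({"README": b"hello\n", "main.py": b"print('v2')\n"})

# Two snapshots of two files each, but only three stored objects:
# README is unchanged, so the second snapshot links to the existing object.
print(len(store.snapshots), len(store.objects))  # 2 3
```

Each snapshot still lists every file, which is what makes it a snapshot rather than a delta; only the storage of identical content is shared.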

Git hashes

Git's commit structure is designed around hashing and trees: it uses the SHA-1 hash function and a hierarchical structure to store and manage project history efficiently. You can read more about Git hashes in Git Internals 2 - Git Objects.
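As a small illustration of how hashing works for the simplest object type, the sketch below computes the ID of a blob (a file's content) by hand. It assumes the default SHA-1 object format; the function name git_blob_hash is made up for this example.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes a small header ("blob <size>" plus a NUL byte) followed by the content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_hash(b""))               # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_blob_hash(b"hello world\n"))  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

The second value should match what git hash-object prints for a file containing the single line "hello world". Because the ID depends only on the content, two identical files always map to the same object.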

The three states

Git has three main states that files can reside in:

  • Modified
    • You have changed the file but have not committed it to your database yet.
  • Staged
    • You have marked a modified file in its current version for inclusion in the next commit.
  • Committed
    • The data is safely stored in your local database.

This leads to the three main sections of a Git project: the working directory (tree), the staging area, and the Git directory.

  • Working directory (tree)
    • The working tree is a single checkout of one version of the project. These files are pulled out of the compressed database in the Git directory and placed on disk for you to use and modify. This is where you edit your files.
  • Staging area (index)
    • The staging area is a file in your Git directory (.git/index) that stores information about what will go into your next commit. It lets you choose which changes you want to commit.
  • Git directory
    • Where Git stores the metadata and object database for your project. It is what is copied when you clone a repository from another computer.

A basic Git workflow goes something like this:

  1. You modify files in your working directory.
  2. You select the changes you want in your next commit, which adds only those changes to the staging area.
     git add <filename>
  3. You commit, which takes the files as they are in the staging area and stores that snapshot permanently in your Git directory.
     git commit -m "A commit msg"
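The workflow above can be modelled with a few lines of Python. This is a toy model, not how Git is implemented: the ToyRepo class, its attributes and the file name notes.txt are all hypothetical. It is only meant to show that a commit records what is in the staging area, not what is currently on disk.

```python
class ToyRepo:
    """Toy model of the working directory, staging area and Git directory."""

    def __init__(self):
        self.working = {}   # path -> content currently on disk
        self.index = {}     # path -> content staged for the next commit
        self.commits = []   # list of (message, snapshot) pairs

    def add(self, path):
        """Stage the current working-directory version of `path`."""
        self.index[path] = self.working[path]

    def commit(self, message):
        """Store a snapshot of the staging area, not of the working directory."""
        self.commits.append((message, dict(self.index)))


repo = ToyRepo()
repo.working["notes.txt"] = "first draft"    # 1. modify
repo.add("notes.txt")                        # 2. stage
repo.working["notes.txt"] = "second draft"   # edited again after staging
repo.commit("A commit msg")                  # 3. commit

print(repo.commits[-1][1]["notes.txt"])      # first draft
```

The second edit stays in the working directory only; it would need another add before a later commit could include it.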

The Git directory

The Git directory is a hidden directory named .git, located at the root of your Git repository.

Key contents of the Git directory

  • HEAD
    • This file points to the current branch reference, telling Git what the current commit is.
  • config
    • This file contains the repository’s configuration settings, such as user information and other repository-specific settings.
  • index
    • This file is the staging area, where changes are recorded before they are committed. It contains a sorted list of path names, each with permissions and the SHA-1 of a blob object.
  • objects/
    • This directory contains all the objects (commits, trees, blobs, and tags) that represent the content of the repository.
  • refs/
    • This directory contains the references to commit objects, which are used to manage branches, tags and other references.
    • It contains sub-directories such as heads/ for branches, tags/ for tags, and remotes/ for remote-tracking branches.
  • hooks/
    • This directory contains scripts that are executed by Git in response to specific events like committing and merging.
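To make the relationship between HEAD and refs/ concrete, here is a minimal sketch that follows HEAD to a commit hash. It builds a stand-in .git directory in a temporary folder rather than reading a real repository; the function name resolve_head and the hash value are invented for the example. It only handles references stored as individual files, and a real repository may also keep references in a packed-refs file, which this sketch ignores.

```python
import tempfile
from pathlib import Path


def resolve_head(git_dir: Path) -> str:
    """Return the commit hash that HEAD currently points to."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        # Symbolic ref, e.g. "ref: refs/heads/main": follow it to the branch file.
        return (git_dir / head[len("ref: "):]).read_text().strip()
    # Detached HEAD: the file holds a commit hash directly.
    return head


# Build a minimal stand-in for a .git directory (the hash value is made up).
fake_hash = "1234567890abcdef1234567890abcdef12345678"
with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
    (git_dir / "refs" / "heads" / "main").write_text(fake_hash + "\n")
    resolved = resolve_head(git_dir)

print(resolved == fake_hash)  # True
```

In a real repository, git rev-parse HEAD is the reliable way to get this value, since it also handles packed references.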

Status of your files

Each file in your working directory (tree) can be in one of two states: tracked or untracked.

Tracked files

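Tracked files are files that were in the last snapshot (commit), as well as any newly staged files. When you clone a repository, all the files in it start out tracked and unmodified. When you initialise a new repository in an existing directory, nothing is tracked until you add files with git add.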

Three states of tracked files

Any tracked file is in one of three states: unmodified, modified, or staged.

  • Unmodified
    • The file hasn’t changed since its last snapshot.
  • Modified
    • The file has been altered since its last snapshot, but hasn’t been staged for the next commit.
  • Staged
    • The file has been modified and added to the staging area, ready to be committed in the next snapshot.

Untracked files

Untracked files are everything else: any files in your working directory that were not in your last snapshot and are not in your staging area. Untracked files are typically new files that have not been added to the staging area. Files matched by a pattern in the .gitignore file are also untracked, but Git leaves them out of the git status output.
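The tracked and untracked states described above come down to comparing three views of a file: the last snapshot, the staging area, and the working directory. The sketch below expresses that comparison with plain dictionaries. The classify function and the sample data are hypothetical, and the model ignores details such as deletions, renames and .gitignore.

```python
def classify(path, head, index, working):
    """Classify `path` given three path -> content mappings.

    head: the last snapshot (commit); index: the staging area;
    working: the working directory.
    """
    if path not in head and path not in index:
        return ["untracked"]
    states = []
    if index.get(path) != head.get(path):
        states.append("staged")      # staging area differs from the last snapshot
    if working.get(path) != index.get(path):
        states.append("modified")    # working copy differs from the staging area
    return states or ["unmodified"]


head = {"a": "1", "b": "1", "c": "1"}
index = {"a": "1", "b": "1", "c": "2", "d": "1"}
working = {"a": "1", "b": "2", "c": "3", "d": "1", "e": "1"}

for name in "abcde":
    print(name, classify(name, head, index, working))
```

File c is both staged and modified: it was staged and then edited again, so the staged version and the working copy differ.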

A clean working tree

A clean working tree means that your working directory is in sync with the last snapshot on the current branch. In other words, there are no changes that haven’t been committed, and there are no untracked files.

Good to be clean

A clean working tree is often a good state to be in before performing actions like merging branches and pulling updates from a remote repository. This ensures your changes are committed and tracked, preventing potential loss of work in progress.

Characteristics of a clean working tree

  • No unstaged changes
    • All files in your working directory match the latest snapshot.
  • No staged changes
    • There are no files in the staging area waiting to be committed.
  • No untracked files
    • There are no new files in your working directory that Git isn’t already tracking (unless files are ignored via .gitignore).

Checking the status of your files

You can determine which files are in which state using the git status command.

git status

If you run this command right after a clone, you should see the message below, which indicates that you have a clean working tree.

On branch main
Your branch is up to date with 'origin/main'.
 
nothing to commit, working tree clean

Suppose you add a new file to your working directory, say a simple README file. If the file did not exist before and you run git status, you will see it listed as untracked.

On branch main
Your branch is up to date with 'origin/main'.
Untracked files:
  (use "git add <file>..." to include in what will be committed)
 
    README
 
nothing added to commit but untracked files present (use "git add" to track)

Recall that untracked basically means that Git sees a file you didn’t have in the previous snapshot (commit) and which hasn’t yet been staged. Git won’t start including it in your commit snapshots until you explicitly tell it to do so (by running git add to move it to the staging area). It does this so you don’t accidentally begin including binary files generated by your build tool or other files that you didn’t mean to include.

You can explore more scenario-based examples of the git status command in Record Changes to the Repository.

Short status

Git also has a short status flag (-s or --short) that shows your file status in a more compact way.

git status -s
git status --short

This is a sample output to the console.

$ git status -s
 M README
MM Rakefile
A  lib/git.rb
M  lib/simplegit.rb
?? LICENSE.txt
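
The output has two status columns before each path: the left column shows the status of the staging area and the right column shows the status of the working tree.

  • M in the right column only ( M): Modified in the working directory but not yet staged
  • M in the left column only (M ): Modified and staged
  • MM: Modified and staged, and then modified again
  • A: New file added to the staging area
  • ??: Untracked file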
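The two-character codes in the sample output above can be split mechanically, which helps show that the first character describes the staging area and the second the working tree. The parser below is a minimal sketch: parse_short_status is a hypothetical helper, and it does not handle renames (which print two paths) or quoted path names.

```python
def parse_short_status(output):
    """Split each `git status -s` line into (staged, unstaged, path)."""
    entries = []
    for line in output.splitlines():
        if len(line) < 4:
            continue  # skip blank or malformed lines
        # Column 1: staging area status; column 2: working tree status;
        # column 3 is a space; the path starts at column 4.
        entries.append((line[0], line[1], line[3:]))
    return entries


sample = (
    " M README\n"
    "MM Rakefile\n"
    "A  lib/git.rb\n"
    "M  lib/simplegit.rb\n"
    "?? LICENSE.txt\n"
)
for staged, unstaged, path in parse_short_status(sample):
    print(repr(staged), repr(unstaged), path)
```

An empty result from this parser corresponds to a clean working tree, since git status -s prints nothing when there is nothing to report.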
