Version Control with Git 4. Basic Git Concepts Study cards
Attribution: Version Control with Git, 2nd Edition[Book] (oreilly.com)

A Git repository is a database containing all of the information needed to retain and manage the revisions and history of a project. Within a repository, Git manages two primary data structures: the object store and the index. All of this repository data is stored at the root of your project in the .git directory.

The object store is designed to be efficiently copied during a clone. The index is private to a repository.

Object store


There are four types of objects in the object store: blobs, trees, commits, and tags. These are the four atomic data types that form all of Git's higher-level data structures.

Each version of a file is represented by a blob (which is a contraction of binary large object). Blob is a common term in computing that refers to a variable or file that can contain any data and whose internal structure is ignored by the program. A blob holds a file's data but not any metadata about it such as its name.

A tree object represents one level of directory information. It records blob identifiers, path names, and metadata for all of the files in one directory. It can recursively reference other tree objects.

A commit object holds metadata for each change introduced into the repository like author, commit date, and log message. Each commit points to a tree object that captures one complete snapshot of the repository at the time the commit was performed. The initial commit, or root commit, has no parent. Most commits have one commit parent, but it is possible for a commit to have more than one parent.

A tag object assigns an arbitrary name to an object, usually a commit.

To use disk space and network bandwidth efficiently, Git also compresses and stores the objects in pack files which are also in the object store.

Index


The index is a temporary and dynamic binary file that describes the directory structure of the entire repository at one moment in time. Git commands let you stage changes in the index, and the index also plays an important role in merges.

Content


Every object in the object store has a unique name that is generated by applying SHA1 to the contents of the object. SHA1 values are 160-bit values and are usually represented as 40-digit hexadecimal numbers. Sometimes the SHA1 is called a hash code or object ID. Since any tiny change to a file causes the SHA1 hash to change, the SHA1 hash is an effective globally unique identifier.

Git tracks content; the object store is based on hashes of the contents of its objects, not on their file names or paths. 
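
As a minimal sketch of content-based naming, the Python below computes the object ID that Git assigns to a blob. It assumes one detail that the notes above do not state: Git hashes a short header of the form "blob <size>" plus a NUL byte in front of the raw file bytes. The function name git_blob_id is made up for this example.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the 40-digit hexadecimal SHA1 ID of a blob with this content."""
    # Header: object type, a space, the content length in bytes, then a NUL byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The ID depends only on the bytes, never on a file name or path.
print(git_blob_id(b"hello world\n"))
# A tiny change in the content gives a completely different ID.
print(git_blob_id(b"hello world!\n"))
```

Two files with identical bytes get the same ID no matter what they are called, which is why the object store can be indexed by content alone.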

Pack files


Git uses pack files as an efficient storage mechanism. With pack files, Git computes the differences between similar files and stores those differences rather than a complete copy of every version.

How the objects fit together


Blobs are at the bottom of the data structure. They do not reference anything and are referenced by tree objects. Tree objects point to blobs and possibly to other trees. Any given tree can be pointed to by many commits: commits made at different times and by different contributors can result in the same content in the repo, and that content is represented by the same tree with the same SHA1 value. Each commit points to the tree that the commit introduces.
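
The relationships above can be sketched with a toy content-addressed store in Python. This is only an illustration: the serialization formats, the store dictionary, and the helper names (put, put_tree, put_commit) are invented here and are much simpler than Git's real object formats.

```python
import hashlib

store = {}  # toy object store: object ID -> serialized object


def put(kind: str, body: str) -> str:
    """Store an object under the SHA1 of its serialized form and return its ID."""
    data = f"{kind}\n{body}".encode()
    oid = hashlib.sha1(data).hexdigest()
    store[oid] = data
    return oid


def put_tree(entries: dict) -> str:
    """entries maps a name to the ID of a blob or of another tree."""
    lines = [f"{oid} {name}" for name, oid in sorted(entries.items())]
    return put("tree", "\n".join(lines))


def put_commit(tree: str, parents: list, author: str, message: str) -> str:
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents]
    lines += [f"author {author}", "", message]
    return put("commit", "\n".join(lines))


# Two contributors end up with the same content at different times.
tree_a = put_tree({"README": put("blob", "hello\n")})
c1 = put_commit(tree_a, [], "alice", "initial")
tree_b = put_tree({"README": put("blob", "hello\n")})
c2 = put_commit(tree_b, [c1], "bob", "same content again")

print(tree_a == tree_b)  # identical content, so one shared tree object
print(c1 == c2)          # the commits differ in their metadata
```

Because every ID is derived from content, the second tree is not stored again. Both commits simply point to the same tree.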
Where is all of the Git repository data stored for a project (in the filesystem)?
The term blob is a contraction of what?
What are the two main data structures that Git stores in the .git directory at the root of your project?
Of the two primary data structures managed by Git, which is designed to be efficiently copied during a clone?
Of the two primary data structures managed by Git, which is private to a repository?
What are the four types of objects in the Git object store?
What are the four atomic data types that form all of Git's higher level data structures?
Git represents every version of each file as what type of object?
What is a common term in computing that refers to a variable or file that can contain any data and whose internal structure is ignored?
What does a Git blob represent?
Does a Git blob representing a file have the file's name?
What does a Git tree object represent?
What does a Git commit point to that represents the snapshot of the repository at the time the commit was performed?
What does a Git commit represent?
What does a Git tag represent?
What Git object represents one level of directory information?
Since a Git tree object represents a directory which can contain files and directories, what two types of Git objects does a Git tree reference?
What Git object holds metadata for each change introduced into the repository like author, commit date, and log message?
Each Git commit points to what type of object that captures a complete snapshot of the repo at that commit?
What is the Git commit that has no parent?
Can a Git commit have more than one parent?
What is the Git object type that assigns an arbitrary name to another Git object?
What type of files does Git use to make use of disk space efficiently, that involve computing diffs between similar files instead of storing every version of every file?
What is the temporary and dynamic binary file used by Git that describes the entire structure of the repo, is where changes are staged, and plays an important role in merges?
What are the two primary data structures managed by Git within a Git repository?
How many bits is a SHA1 value?
SHA1 values are usually represented by hexadecimal numbers with how many digits?
Why is the SHA1 hash used by Git essentially a globally unique identifier?
What data type is at the bottom of the Git data structure and its objects do not reference anything?
What Git data types do tree objects point to?
How is it that the same Git tree can be pointed to by different commits?
Version Control with Git 5. File Management and the Index Study cards
The index can be thought of as the set of intended modifications. A commit is a two-step process: the changes are staged, and then committed.

It's All About the Index


Linus Torvalds argued on the Git mailing list that Git cannot be appreciated without understanding the purpose of the index. The state of the index can be queried at any time using git status.

File Classifications in Git


Git classifies your files into three groups: tracked, ignored, and untracked.

Tracked files are those already in the repository or staged in the index.

Ignored files are explicitly declared invisible even though they are in the working directory.

Untracked files are files not in the previous two categories.

To make Git ignore a file within a directory, simply add that file's name to the .gitignore file in that directory. Even though .gitignore is a special file to Git, it is managed by Git like any other file and needs to be staged and committed like any other.

A file that is untracked is converted to tracked by git add. git add causes the file to be copied into the object store and indexed by its resulting SHA1 value. Staging a file is also called caching a file or putting a file in the index.

Using git rm


The git rm command is the inverse of git add. To remove a file from the working directory that has not been staged, the normal rm command can be used. git rm --cached removes a file from the index and leaves it in the working directory. git rm removes the file from both the index and the working directory.

The .gitignore File


A .gitignore file can contain a list of filename patterns that specify what files to ignore. A directory name is marked by a trailing slash character (/). A pattern containing shell globbing characters such as * is expanded as a shell glob pattern. An asterisk can only match a single file or directory name. An exclamation point inverts the sense of the pattern on the rest of the line.

.gitignore files can also be present in any directory within the repository, and each will affect its directory as well as all subdirectories of that one. The rules also cascade and can be overridden by more local .gitignore files.
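
The pattern rules above can be approximated in a few lines of Python. This is a deliberately simplified sketch, not Git's actual matcher: it checks a single file or directory name against the patterns of one .gitignore file, and it assumes that later patterns override earlier ones and that lines starting with # are comments. The function name is_ignored is made up for this example.

```python
from fnmatch import fnmatchcase


def is_ignored(name: str, is_dir: bool, patterns: list) -> bool:
    """Decide whether one file or directory name is ignored (simplified)."""
    ignored = False
    for raw in patterns:
        pattern = raw.strip()
        if not pattern or pattern.startswith("#"):
            continue  # skip blank lines and comments
        negated = pattern.startswith("!")
        if negated:
            pattern = pattern[1:]  # "!" inverts the sense of the pattern
        if pattern.endswith("/"):
            if not is_dir:
                continue  # a trailing slash marks a directory name
            pattern = pattern[:-1]
        if fnmatchcase(name, pattern):
            ignored = not negated  # the last matching pattern decides
    return ignored


patterns = ["# build products", "*.o", "!keep.o", "build/"]
print(is_ignored("main.o", False, patterns))
print(is_ignored("keep.o", False, patterns))
print(is_ignored("build", True, patterns))
```

The sketch does not model cascading between .gitignore files in different directories. That would need the patterns of each enclosing directory to be applied in turn, with the most local file applied last.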

A Detailed View of Git's Object Model and Files


What the commit does: first, the virtual tree object that is the index is converted to a real tree object and placed in the object store under its SHA1 value. Second, a commit object is created with the log message, and this commit points to the new tree object and also to the previous commit. Third, the branch ref is moved from the most recent commit to the newly created commit object, which becomes the new branch HEAD.
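
The three steps can be sketched with a toy model in Python. Everything here is an assumption made for illustration: the objects, refs, and index dictionaries, the text formats, and the commit function are not Git's real data structures. In particular, real Git writes the blobs at git add time, whereas this sketch stores them during the commit to stay short.

```python
import hashlib

objects = {}                      # toy object store: ID -> content
refs = {"refs/heads/main": None}  # branch ref -> commit ID
index = {"README": "hello\n"}     # toy index: staged path -> file content


def store(text: str) -> str:
    oid = hashlib.sha1(text.encode()).hexdigest()
    objects[oid] = text
    return oid


def commit(message: str, branch: str = "refs/heads/main") -> str:
    # Step 1: convert the index into a real tree object in the object store.
    entries = [f"{store(content)} {path}" for path, content in sorted(index.items())]
    tree = store("tree\n" + "\n".join(entries))
    # Step 2: create a commit object pointing to the tree and the previous commit.
    parent = refs[branch]
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines += ["", message]
    new_commit = store("commit\n" + "\n".join(lines))
    # Step 3: move the branch ref to the newly created commit.
    refs[branch] = new_commit
    return new_commit


first = commit("initial commit")
index["README"] = "hello again\n"
second = commit("update README")
print(refs["refs/heads/main"] == second)
```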
In Git, what can be thought of as the set of intended modifications?
Linus Torvalds once argued on the Git mailing list that Git cannot be appreciated without understanding the purpose of what?
What are you querying the state of when using the git status command?
What command can be used to query the state of the Git index at any time?
What are the three categories that Git classifies your files into?
Git considers files that are already in the repository or staged in the index to be what?
Git considers files that are explicitly declared invisible to be what?
Git considers files that are not in the repository or staged, and also not being explicitly ignored, to be what?
What Git command converts an untracked file to a tracked one?
What is the special file to Git that is used to declare what files are to be ignored?
What command is the inverse of git add?
What is the Git command to remove a file from the index, but leave it in the working directory?
What is the Git command to remove a file from the index and also remove it from the working directory?
What is the command to remove a file that has not been staged from the working directory?
In a .gitignore file, what marks a directory name?
What converts the virtual tree object that is in the Git index to a real tree object placed in the object store under its SHA1 value, as well as creates a commit object pointing to the tree and previous commit(s)?
Version Control with Git 6. Commits Study cards
When a commit occurs, Git records a snapshot of the {{c1::index}} and places that snapshot in the object store. There is a one-to-one correspondence between a commit and a set of changes. A {{c1::commit}} is the only method of introducing changes to a Git repository. How you decide when to commit is up to you and your preferences or development style. Git is well-suited to frequent commits.

Absolute Commit Names

The most rigorous name for a commit is the hash ID which is an absolute name that can only refer to one commit, since it is a globally unique ID.

refs and symrefs

A {{c2::ref}} is a {{c1::SHA1}} hash ID that refers to an object within the Git object store. A {{c2::ref}} may refer to any Git object, but it is usually a {{c1::commit}}. A symbolic reference, or {{c2::symref}}, is just a {{c1::ref}} but it indirectly points to a Git object.

Git maintains several special symrefs automatically. {{c1::HEAD}} is a {{c2::symref}} that always refers to the most recent commit on the current branch. HEAD updates automatically to refer to the latest commit when you change branches. {{c1::ORIG_HEAD}} is the previous version of HEAD recorded after certain operations like merge and reset. FETCH_HEAD is a shorthand for the head of the last branch fetched and is valid only immediately after a fetch operation. {{c1::MERGE_HEAD}} is the tip of the other branch being merged when a merge is in progress. All these special symrefs can be used anywhere a commit can be used.

Relative Commit Names

All commits except for the root commit derive from one or more parent commits. Given a commit with three parents, the three parent commits are C^1, C^2, and C^3. C~1, C~2, and C~3 refer to the first parent, the first grandparent, and the first great-grandparent commits.
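
A small Python sketch can make the two notations concrete. The parents dictionary is a hypothetical history invented for this example, and the helper names nth_parent and nth_ancestor are not Git commands.

```python
# Hypothetical history: each commit maps to its list of parents, first parent first.
parents = {
    "root": [],
    "a": ["root"],
    "b": ["a"],
    "x": ["root"],
    "y": ["root"],
    "merge": ["b", "x", "y"],  # a commit with three parents
}


def nth_parent(commit: str, n: int) -> str:
    """C^n: the n-th parent of a commit, counting from 1."""
    return parents[commit][n - 1]


def nth_ancestor(commit: str, n: int) -> str:
    """C~n: follow the first parent n times."""
    for _ in range(n):
        commit = parents[commit][0]
    return commit


print([nth_parent("merge", n) for n in (1, 2, 3)])    # merge^1, merge^2, merge^3
print([nth_ancestor("merge", n) for n in (1, 2, 3)])  # merge~1, merge~2, merge~3
```

The caret selects across the parents of one commit, while the tilde walks back through generations along the first-parent line.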

Viewing Old Commits

The primary command to show the history of commits is git log.

Commit Graphs

In CS, a {{c1::graph}} is a collection of nodes and a set of edges between the nodes. There are several types of graphs with different properties. Git makes use of a special graph called a {{c1::directed acyclic graph}} to implement the history of commits. There are two important properties of a DAG. First, the edges within the graph are all directed from one node to another. Second, starting at any node, there is no path along the directed edges that leads back to the starting node. In the commit graph, each node is a commit, and all edges are directed from descendant nodes to parent nodes.

The thing to understand is that normal commits have exactly one parent, that there is usually only one commit with zero parents, and that merge commits have more than one parent. A commit with more than one child is where the history diverged into two branches.
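
The parent and child counts described above can be checked on a tiny example graph. The Python below is a sketch over a made-up history, where the parents dictionary, the classify function, and the commit names are all assumptions for illustration.

```python
from collections import Counter

# Hypothetical commit graph: edges point from each commit to its parents.
parents = {
    "root": [],
    "a": ["root"],
    "b": ["a"],
    "c": ["a"],       # a second child of "a": history diverges here
    "m": ["b", "c"],  # merge commit joining the two lines
}


def classify(commit: str) -> str:
    """Name a commit by how many parents it has."""
    count = len(parents[commit])
    if count == 0:
        return "root"
    return "normal" if count == 1 else "merge"


# Count how many children each commit has by inverting the parent edges.
child_counts = Counter(p for ps in parents.values() for p in ps)
branch_points = sorted(c for c, n in child_counts.items() if n > 1)

print({c: classify(c) for c in parents})
print(branch_points)
```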

Using git bisect

git bisect can be used to systematically isolate the commit that introduced a problem.
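
The idea behind the search can be sketched as a binary search in Python. This is not how git bisect is implemented: the sketch assumes a strictly linear history with a known good first commit and a known bad last commit, and the names first_bad and is_bad are invented for the example.

```python
def first_bad(commits: list, is_bad) -> str:
    """Return the first bad commit.

    commits is ordered oldest to newest; commits[0] is assumed good
    and commits[-1] is assumed bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # the problem was introduced at mid or earlier
        else:
            good = mid  # the problem was introduced after mid
    return commits[bad]


history = [f"c{i}" for i in range(10)]
tested = []


def is_bad(commit: str) -> bool:
    tested.append(commit)
    return int(commit[1:]) >= 6  # pretend the bug arrived in c6


print(first_bad(history, is_bad), tested)
```

Each test halves the remaining range, so only a handful of commits need to be checked even in a long history.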

Using git blame

git blame tells you who last modified each line of a file and which commit introduced the change.

Using Pickaxe

The -S option to git log searches through the history of a file's diffs for the given string (git log -Sstring). The -S option is called pickaxe.
Where does Git place a snapshot of the index when a commit occurs?
What is the only method of introducing changes to a Git repo?
What operation causes Git to record a snapshot of the index and place that snapshot in the object store as a tree?
What is the only means to introduce changes into a Git repository?
What is the most rigorous name for a Git commit?
What term refers to an SHA1 hash ID that refers to an object within the Git object store?
What type of ref is a name that indirectly points to a Git object?
What are refs that indirectly point to Git objects (rather than directly) and include HEAD, ORIG_HEAD, etc.?
HEAD, ORIG_HEAD, and FETCH_HEAD are examples of what type of ref that indirectly points to Git objects, usually commits?
What is the special Git symref that updates automatically to refer to the latest commit when you change branches?
What is the special symref maintained by Git automatically that refers to the latest commit on the current branch?
If a Git commit has three parents, how can the three parent commits be referred to (relative commit names)?
How can the parent, grandparent, and great-grandparent commits of a Git commit be referred to (relative commit names)?
In Computer Science, what is a collection of nodes and a set of edges between the nodes?
What is the special type of graph that is used by Git, where all edges are directed, and no path along the directed edges leads back to the starting node?
What is the type of graph that Git uses to implement the history of commits?
What are the two important properties of a directed acyclic graph noted in the Version Control with Git book?
What is the git command that can be used to systematically isolate and determine a commit that introduced a problem using a clever divide and conquer algorithm?
[...] is the tip of the other branch being merged when a merge is in progress.
A [...] may refer to any Git object, but it is usually a [...].
In CS, a [...] is a collection of nodes and a set of edges between the nodes.
[...] is the previous version of HEAD recorded after certain operations like merge and reset.
A symbolic reference, or [...], is just a [...] but it indirectly points to a Git object.
A [...] is the only method of introducing changes to a Git repository.
[...] is a [...] that always refers to the most recent commit on the current branch.
When a commit occurs, Git records a snapshot of the [...] and places that snapshot in the object store.
A [...] is a [...] hash ID that refers to an object within the Git object store.
Git makes use of a special graph called a [...] to implement the history of commits.
Version Control with Git 7. Branches Study cards
The fundamental means of launching a separate line of development in a software project is a branch. A branch can be used to keep an old version of the project alive. A branch can be used to encapsulate a development phase or the development of a single feature or bug fix.

Branch names


Branch names are largely arbitrary, but some characters are not allowed at all and others are not allowed at the beginning or end of a name.

Using Branches


There can only be one current branch within the git repository at any given time. The current branch determines which files are checked out in the working directory. It is also often an implicit operand in Git commands (such as the target of a merge).

The most recent commit on a branch is called the tip or {{c1::head}} of the branch. A branch/branch name can be thought of as a pointer to a particular {{c1::commit}}. To introduce a new branch to the repo without checking it out as the current branch, use the git branch command. When no branch name argument is passed, this command instead lists the branch names found in the repo. The git show-branch command can also be used to view the branches and provides more detailed output than git branch.

Checking out Branches


The git checkout command changes the current branch. If you have local changes that the checkout would overwrite, Git will issue an error message and let you know that you need to commit your changes first (adding them to the index is not sufficient). You can use the -f option to tell Git to check out the other branch anyway, which will overwrite your local changes.

Detached HEAD Branches


If you check out a random commit, Git creates a sort of anonymous branch for you called a {{c1::detached HEAD}}. It also does this when starting a git bisect operation or running the git submodule update command.

Deleting Branches


The git branch command with the -d or -D option can be used to delete a branch. If deleting the branch would lose commits that have not been merged, Git issues an error when using -d, but the -D option overrides this safety check. When a branch is deleted, the branch name is gone. It may be possible to recover the commits (git reflog is the command to start with), but commits with no references to them will eventually be collected as garbage.
What is the fundamental means of launching a separate line of development in a software project?
What determines what files are checked out in the working directory of a Git repository?
How many current branches can there be in a Git repository at any one time?
What is the most recent commit on a branch called?
What is the command to introduce a new branch to the repo?
What does the git branch command do when used with no arguments?
What git command can also be used to view the branches (like git branch does) but provides more detailed output?
What is the Git command that changes the current branch?
What is the anonymous branch Git creates for you when you check out a random commit called?
What options can be used with the git branch command to delete a branch?
If you delete a branch with the -D option and then realize you want to get a commit that is gone back, where should you turn?
A branch/branch name can be thought of as a pointer to a particular [...].
If you check out a random commit, Git creates a sort of anonymous branch for you called a [...].
The most recent commit on a branch is called the tip or [...] of the branch.
Version Control with Git 9. Merges Study cards
A merge unifies two or more commit history branches. Most often, it merges two branches, but Git supports merging more than two branches at once. When doing a merge, the currently checked out branch will always be the target branch that receives the merge commit.

If there are no conflicts between branches, Git computes a merge result and creates a new commit that reflects the unified state. If there were changes to the same line of the same file in both branches, then Git marks the contentious changes as "unmerged" in the index and leaves reconciliation to you.

A good rule to follow that makes your life easier is to always do a merge with a clean working directory and index (do not have any modified files or staged changes).

A useful alternative to graphical tools such as gitk for dumb terminals is the git log --graph command.

Although a Git merge is a symmetrical operation, it makes sense to say "I merged this branch into this one" because only one branch gets the merge commit.

A merge with a conflict


When you need to resolve a merge conflict, once you are happy with the resolution, you git add the file to the index and then commit the merge using git commit. Git will open an editor with a template message to alter the commit message, and when you exit the editor, Git indicates the successful creation of a new merge commit.

Merge strategies


The git merge-base command can be used to find the merge base between branches.

The two common degenerate scenarios that lead to merges are called already up-to-date and fast-forward. Neither of these scenarios introduces a new merge commit.

Already up-to-date means that all the commits from the other branch are already present in the target branch.

Fast-forward happens when your branch is already fully present and represented in the other branch. Git simply tacks on the new commits to HEAD and then moves HEAD to point to the final, new commit. This case is common on tracking branches.
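
Both degenerate scenarios come down to an ancestry check, which can be sketched in Python. The parents dictionary is a hypothetical history, and the helper names ancestors and merge_kind are invented for this example. It illustrates the definitions above rather than Git's own implementation.

```python
def ancestors(commit: str, parents: dict) -> set:
    """All commits reachable from commit by following parent edges, itself included."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def merge_kind(current: str, other: str, parents: dict) -> str:
    """Classify a merge of the tip `other` into the tip `current`."""
    if other in ancestors(current, parents):
        return "already up-to-date"  # nothing new on the other branch
    if current in ancestors(other, parents):
        return "fast-forward"        # the other branch already contains ours
    return "true merge"              # histories diverged: a merge commit is needed


# Hypothetical history: a <- b, then c and d both branch off b.
parents = {"a": [], "b": ["a"], "c": ["b"], "d": ["b"]}
print(merge_kind("c", "b", parents))
print(merge_kind("b", "c", parents))
print(merge_kind("c", "d", parents))
```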

Resolve, recursive, and octopus are merge strategies that all produce a final commit added to your branch that represents the combined state of the merge.

Resolve joins two branches. It locates the common ancestor as the merge base, then applies the changes made between the merge base and the tip of the other branch onto the current branch.

Recursive also only joins two branches but handles the scenario when there is more than one merge base. It forms a temporary merge of all the common merge bases and uses that as the base from which to derive the resulting merge using the same algorithm as resolve.

Octopus is for merging more than two branches and internally uses the recursive strategy once for each branch being merged. It cannot handle any merge that requires any form of conflict resolution.

The two special merge strategies are ours and subtree.

Ours merges in any number of branches, but discards changes from all branches but the current branch. The result of the merge is identical to the current HEAD.

Subtree merges a branch in but everything in it is merged into a particular subtree of the current tree, which is determined automatically by Git.

How Git Thinks About Merges


In most VCSs, a commit can only have one parent. In Git, the merge yields a new tree object with the merged files, and it also introduces a new commit object, but on only the target branch. The merged tree object symmetrically represents both source branches equally. The branch onto which the merge happened gets a commit that has a parent commit from each branch in the merge.

Squash merges


With Git, since both branches being merged are considered equal, it does not make much sense to squash the commits from one branch. Git can still do it if you want, using the --squash option to git merge or git pull.

Why Not Just Merge Each Change One by One?


Why is it not preferable to have a simple, linear history? An important observation about Git commit histories is that each revision is real. If you apply a sequence of commits on top, this creates a series of entirely new versions, and these new intermediate versions never actually existed. Having states that never really existed loses the reason for having a detailed history in the first place. If you do want Git to work like this, it can do that; the process is called rebasing.
What Git operation unifies two or more branches?
Does Git support merging more than two branches at the same time?
When a Git merge happens, what branch gets the merge commit?
What should you always make sure before you do a merge with Git?
What option can you pass to git log to see a nice visual that is an alternative to graphical tools such as gitk?
What Git command can find the merge base between branches?
What are the two degenerate merge scenarios that do not introduce a new merge commit?
What is the degenerate Git merge scenario where all the commits from the other branch are already present in the target branch?
What is the degenerate Git merge scenario where your branch is already fully present in the other branch and so the new commits are tacked on to the current branch?
What are the three normal merge strategies that produce a merge commit added to the currently checked out branch?
What is the normal merge strategy that locates the merge base and applies the changes from the merge base to the tip of the other branch onto the current branch?
What is the normal merge strategy that merges two branches but can work when there is more than one possible merge base?
What is the normal merge strategy that temporarily merges the possible merge bases to use as the merge base to derive the resulting merge?
What is the normal merge strategy that internally uses the recursive strategy but can merge more than two branches at once?
When can the octopus Git merge strategy not work?
What are the two special merge strategies discussed in the Version Control with Git book?
Does a merge commit represent any branch more than any other branch when they are merged together?
What is a reason why it is probably preferable to do a merge rather than create a linear history (rebase) according to the Version Control with Git book?
Version Control with Git 10. Altering Commits Study cards
There are different schools of thought when it comes to manipulating the development history. One might be that every commit is retained and nothing is altered (realistic history). One might be a fine-grained realistic history where you commit every change ASAP. Didactic realistic history might be to take your time and commit your best work only at convenient and suitable moments.

There can be a lot of value in the full, fine-grained realistic history. It can provide archaeological details that provide insight into the introduction of a bug or how the developers work and how the process can be improved. But a cleaner history with well-defined steps can be a joy to read and a pleasure to work with.

Caution About Altering History


As a general guideline, you should feel free to alter commits in a branch as long as no other developer has a copy of the branch.

Using git reset


The git reset command changes your repository and working directory to a known state. git reset adjusts the HEAD ref to a given commit and updates the index to match that commit. git reset can also modify the working directory to mirror the revision of your project represented by the given commit.

It has three main options: --soft, --mixed, and --hard.
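
As a rough sketch of how the three options differ, the Python below models a repository as three pieces of state: HEAD, the index, and the working directory. The snapshots and repo dictionaries and the reset function are assumptions invented for this illustration. The behavior shown is that --soft moves only HEAD, --mixed (the default) also resets the index, and --hard resets the working directory as well.

```python
# Hypothetical file contents recorded by two commits.
snapshots = {"c1": {"f.txt": "v1"}, "c2": {"f.txt": "v2"}}


def reset(repo: dict, commit: str, mode: str = "mixed") -> dict:
    """Return the repository state after a toy `git reset --<mode> <commit>`."""
    if mode not in ("soft", "mixed", "hard"):
        raise ValueError(f"unknown mode: {mode}")
    new = dict(repo)
    new["HEAD"] = commit  # all three modes move HEAD
    if mode in ("mixed", "hard"):
        new["index"] = dict(snapshots[commit])     # index now matches the commit
    if mode == "hard":
        new["worktree"] = dict(snapshots[commit])  # working directory matches too
    return new


repo = {"HEAD": "c2", "index": {"f.txt": "edited"}, "worktree": {"f.txt": "edited"}}
for mode in ("soft", "mixed", "hard"):
    print(mode, reset(repo, "c1", mode))
```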




What does this command do? git reset --soft commit
What does this command do? git reset --hard commit
Version Control with Git 11. The Stash and the Reflog Study cards

The Stash


git stash save saves the current index and working directory state as an independent commit accessible through the ref refs/stash. The git stash pop command restores the context saved by a previous save operation on top of your current working directory and index. The pop operation takes the stash content and merges those changes into the current state. These two basic commands implement a stack of stash states.

If conflict resolution is needed when doing a git stash pop, Git will not automatically drop the stash state, so you should also run git stash drop once the conflict is resolved.
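
The stack behavior can be sketched in a few lines of Python. This toy model is an assumption made for illustration: it treats uncommitted work as a dictionary of changed files, uses made-up function names, and does not model conflicts or the refs/stash commit.

```python
stash_stack = []  # most recent stash state is at the end of the list


def stash_save(changes: dict) -> None:
    """Push the current uncommitted changes and leave a clean working state."""
    stash_stack.append(dict(changes))
    changes.clear()


def stash_pop(changes: dict) -> None:
    """Restore the most recently saved state on top of the current changes."""
    changes.update(stash_stack.pop())


work = {"a.txt": "edit 1"}
stash_save(work)            # the stack now holds the a.txt edit
work["b.txt"] = "edit 2"
stash_save(work)            # the stack now holds a.txt, then b.txt
stash_pop(work)             # last in, first out: the b.txt edit comes back
print(work, stash_stack)
```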

git stash also provides a quick way around the situation where a git pull is refused because it would overwrite your local changes: run git stash save, then git pull, then git stash pop.

There is also git stash branch which can be used when stashed work cannot apply cleanly to the current branch.

The Reflog


Git's reflog has you covered when you are confused about what just happened or have just done an operation that you shouldn't have done. Using the reflog, you can see that operations happened as you expected on the branches that you intended, and can also recover lost commits in case something went astray.

The reflog is a record of changes to the tips of branches within nonbare repositories. Every time an update is made to any ref, including HEAD, the reflog is updated to record how that ref has changed. It can be thought of as a trail of bread crumbs showing where you and your refs have been. Any Git operation that modifies a ref or changes the tip of a branch is recorded.

git reflog show displays the transactions for only one ref at a time. The default ref is HEAD. Since branch names are also refs, the git reflog command can be passed a branch name to display its changes.

Each line of the reflog shows an individual transaction from the history of the ref.

The interesting aspect of the reflog is that each of the sequentially numbered names like HEAD@{1} can be used as symbolic names for any Git command that takes a commit.
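
The numbered names can be sketched with a tiny Python model. The RefLog class, the commit names, and the log messages below are assumptions invented for illustration and do not reflect how Git stores reflogs on disk. Entry 0 is the current value of the ref and entry 1 is the value before the most recent update.

```python
class RefLog:
    """Toy reflog for one ref: newest entry first."""

    def __init__(self):
        self.entries = []

    def update(self, commit: str, message: str) -> None:
        # Every update to the ref records where it now points and why.
        self.entries.insert(0, (commit, message))

    def at(self, n: int) -> str:
        """Resolve the name REF@{n} to a commit."""
        return self.entries[n][0]


head = RefLog()
head.update("c1", "commit (initial): first")
head.update("c2", "commit: second")
head.update("c1", "reset: moving to c1")  # c2 is no longer the tip

print(head.at(0))  # HEAD@{0}: where HEAD points now
print(head.at(1))  # HEAD@{1}: the commit that the reset left behind
```

In this model the commit left behind by the reset is still reachable as HEAD@{1}, which is the basis for recovering lost commits.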

The reflog doesn't become huge because Git runs a garbage collection process occasionally. During this, some of the older reflog entries are expired and dropped.
What is the Git command to save the current index and working directory as an independent commit accessible through the ref refs/stash?
What is the Git command that restores the context saved by a git stash save command?
What common data structure is closely related to the git stash save and git stash pop commands?
What Git concept is a record of changes that is updated any time an update is made to any ref?
How many refs does the reflog show you relevant operations for at one time?
When using the git reflog show command, what is the default ref?
What does Git have that comes in handy when you are confused about what just happened or when you just did some operation you regret?