Git maintains two primary data structures: the object store and the index.
To use disk space and network bandwidth efficiently, Git compresses objects and stores them in pack files, which are also placed in the .git/objects directory.
Put simply, just remember the following.
SHA-1 values are 160 bits (20 bytes), written as 40 hexadecimal characters.
Git uses the SHA-1 hash of the content as the file name. The hash is 40 characters long, so the first 2 characters become the folder name and the remaining 38 characters become the file name in the .git/objects/ directory. For example, if 8ab686eafeb1f44702738c8b0f24f2567c36da6d is the hash of the content, the object is stored as .git/objects/8a/b686eafeb1f44702738c8b0f24f2567c36da6d. The hash is considered globally unique because there are 2^160, or about 10^48, possible SHA-1 hashes (roughly a 1 with 48 zeros after it).
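The folder/file split described above can be sketched in a few lines of Python. The helper name `loose_object_path` is made up for illustration and is not part of Git.

```python
from pathlib import PurePosixPath


def loose_object_path(sha1_hex: str) -> PurePosixPath:
    """Return where a loose object with this hash lives, relative to the repo root."""
    if len(sha1_hex) != 40:
        raise ValueError("expected a 40-character SHA-1 hex digest")
    # First 2 hex characters name the folder, the remaining 38 name the file.
    return PurePosixPath(".git/objects") / sha1_hex[:2] / sha1_hex[2:]


path = loose_object_path("8ab686eafeb1f44702738c8b0f24f2567c36da6d")
print(path)  # .git/objects/8a/b686eafeb1f44702738c8b0f24f2567c36da6d

# 2**160 is a 49-digit number, i.e. a little over 10**48 possible hashes.
digits = len(str(2**160))
print(digits)  # 49
```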
An important characteristic of SHA-1 is that it always computes the same hash for identical content, regardless of where the content lives. In other words, the same file content in different directories, and even on different machines, yields exactly the same SHA-1 hash ID. Thus, the SHA-1 hash ID of a file is a globally unique identifier.
Any change to the file changes its SHA-1 hash, which creates a new version of the file.
A collision is very rare but possible (one would expect to hash about 2^80 random blobs before seeing one).
A SHA-1 hash can point to a blob, a tree, a commit or a tag object.
    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/test1 (master)
    $ echo "Hello, World!" | git hash-object --stdin
    8ab686eafeb1f44702738c8b0f24f2567c36da6d
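Strictly speaking, Git does not hash the raw content alone: for a blob it hashes a small header (`blob`, a space, the size in bytes, a NUL byte) followed by the content. A minimal Python sketch of that computation is below; the function name `git_blob_hash` is made up for illustration. Keep in mind that `echo` appends a newline, so the bytes being hashed end in `\n`.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Compute the object ID Git assigns to a blob with this content."""
    # Header is "blob <size>\0", then the raw bytes follow.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Same bytes as `echo 'test content'` produces, trailing newline included.
print(git_blob_hash(b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4

# Same bytes as `echo "Hello, World!"`; this should match the session above
# as long as the input bytes are identical (14 bytes, LF line ending).
print(git_blob_hash(b"Hello, World!\n"))
```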
Git is a simple key-value data store: any content you add to a Git repo gets a unique key in return, and that object/content can later be retrieved using the key.
Git has 4 types of objects: blob, tree, commit and tag.
Overall this will be the structure of git internal objects
For testing purposes, we can either create a new file or create an object as shown below.
    $ echo 'test content' | git hash-object -w --stdin
    d670460b4b4aece5915caf5c68d12f560a9fe3e4
git hash-object takes the content you hand to it and merely returns the unique key that would be used to store it in your Git database. The -w option tells the command to not simply return the key, but to write that object to the database. The --stdin option tells git hash-object to read the content to be processed from stdin; otherwise, the command expects a filename argument at the end of the command containing the content to be used.
    $ git hash-object -w test.txt
    83baae61804e65cc73a7201a7252750c76066a30
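To make the effect of -w more concrete, here is a rough Python sketch of writing a loose object: prepend the header, hash it, zlib-compress it, and save it under the two-character folder. It writes into a temporary directory rather than a real repository, and `write_loose_object` is a hypothetical helper, not Git's own code.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose_object(objects_dir: Path, content: bytes, obj_type: str = "blob") -> str:
    """Store content the way a loose object is laid out and return its key."""
    store = obj_type.encode("ascii") + b" " + str(len(content)).encode("ascii") + b"\0" + content
    sha1_hex = hashlib.sha1(store).hexdigest()
    path = objects_dir / sha1_hex[:2] / sha1_hex[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    # Loose objects are stored zlib-compressed (header included).
    path.write_bytes(zlib.compress(store))
    return sha1_hex


with tempfile.TemporaryDirectory() as tmp:
    objects_dir = Path(tmp) / "objects"  # stand-in for .git/objects
    key = write_loose_object(objects_dir, b"test content\n")
    existed = (objects_dir / key[:2] / key[2:]).exists()

print(key)      # d670460b4b4aece5915caf5c68d12f560a9fe3e4
print(existed)  # True
```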
All Git objects can be found in the .git/objects folder. Below, we are in a new repo, so it's empty.
    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/Internals (master)
    $ find .git/objects
    .git/objects
    .git/objects/info
    .git/objects/pack
Create a new file with content and commit it.
    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/Internals (master)
    $ echo "Hello, World!" > HW.txt

    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/Internals (master)
    $ git add .
    warning: LF will be replaced by CRLF in HW.txt.
    The file will have its original line endings in your working directory.

    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/Internals (master)
    $ git commit -m "Initial Commit"
    [master (root-commit) 003c678] Initial Commit
     1 file changed, 1 insertion(+)
     create mode 100644 HW.txt

    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/Internals (master)
    $ git log
    commit 003c6781e4475888b59f248e5e76d3334d278f99 (HEAD -> master)
    Author: Sushanth Bobby Lloyds <bobby.dreamer@gmail.com>
    Date:   Mon Oct 12 22:34:54 2020 +0530

        Initial Commit
Note: In the commit output above, the code after create mode is the file mode. Here, 100644 means a regular, non-executable file.
Now the objects directory has 3 files, i.e. 3 objects: a commit, a tree and a blob.
    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/Internals (master)
    $ find .git/objects
    .git/objects
    .git/objects/00
    .git/objects/00/3c6781e4475888b59f248e5e76d3334d278f99
    .git/objects/8a
    .git/objects/8a/b686eafeb1f44702738c8b0f24f2567c36da6d
    .git/objects/ee
    .git/objects/ee/929cd9cd862b204986cf94ab23853b4c98cb97
    .git/objects/info
    .git/objects/pack
Now let's map what is what.
From git log we know the commit hash is 003c6781e4475888b59f248e5e76d3334d278f99
    .git/objects/00
    .git/objects/00/3c6781e4475888b59f248e5e76d3334d278f99
Using the command git ls-files -s, we can find the hash of each file.
    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/Internals (master)
    $ git ls-files -s
    100644 8ab686eafeb1f44702738c8b0f24f2567c36da6d 0	HW.txt
Now we can say that the object below is the file.
    .git/objects/8a
    .git/objects/8a/b686eafeb1f44702738c8b0f24f2567c36da6d
Now we can easily guess that the remaining one has to be the tree.
    .git/objects/ee
    .git/objects/ee/929cd9cd862b204986cf94ab23853b4c98cb97
Instead of guessing, we can use git cat-file to find the type, content and size of an object from its hash.
    git cat-file -t <hash>    (type)
    git cat-file -p <hash>    (pretty-print the content)
    git cat-file -s <hash>    (size in bytes)
    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/Internals (master)
    $ git cat-file -t ee929
    tree

    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/Internals (master)
    $ git cat-file -t 8ab68
    blob

    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/Internals (master)
    $ git cat-file -t 003c6
    commit
    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/Internals (master)
    $ git cat-file -p ee929
    100644 blob 8ab686eafeb1f44702738c8b0f24f2567c36da6d	HW.txt

    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/Internals (master)
    $ git cat-file -p 8ab68
    Hello, World!

    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/Internals (master)
    $ git cat-file -p 003c6
    tree ee929cd9cd862b204986cf94ab23853b4c98cb97
    author Sushanth Bobby Lloyds <bobby.dreamer@gmail.com> 1602522294 +0530
    committer Sushanth Bobby Lloyds <bobby.dreamer@gmail.com> 1602522294 +0530

    Initial Commit
    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/Internals (master)
    $ git cat-file -s ee929
    34

    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/Internals (master)
    $ git cat-file -s 8ab68
    14

    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/Internals (master)
    $ git cat-file -s 003c6
    209
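The 34 bytes reported for the tree make sense once you know how a tree is stored: each entry is the mode, a space, the file name, a NUL byte and then the 20 raw bytes of the hash (not the 40-character hex text that git cat-file -p prints). The Python sketch below builds and parses such a body for the single HW.txt entry. The helper names are made up, and the plain sort by name is a simplification of Git's real ordering rules for directories.

```python
def build_tree_body(entries):
    """entries: iterable of (mode, name, sha1_hex) tuples."""
    body = b""
    # Simplified ordering: plain byte-wise sort by name.
    for mode, name, sha1_hex in sorted(entries, key=lambda entry: entry[1].encode("utf-8")):
        body += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0"
        body += bytes.fromhex(sha1_hex)  # 20 raw bytes
    return body


def parse_tree_body(body):
    """Inverse of build_tree_body: return a list of (mode, name, sha1_hex)."""
    entries = []
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\0", space)
        mode = body[pos:space].decode("ascii")
        name = body[space + 1:nul].decode("utf-8")
        sha1_hex = body[nul + 1:nul + 21].hex()
        entries.append((mode, name, sha1_hex))
        pos = nul + 21
    return entries


tree_body = build_tree_body([("100644", "HW.txt", "8ab686eafeb1f44702738c8b0f24f2567c36da6d")])
print(len(tree_body))  # 34
print(parse_tree_body(tree_body))
```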
Note: You cannot use the cat command to print the contents of these files because they are zlib-compressed.
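As a rough illustration of what git cat-file has to do first, the Python sketch below decompresses a loose object and splits the header from the content. To stay self-contained it compresses the bytes in memory instead of reading a real file under .git/objects; `parse_loose_object` is a hypothetical helper name.

```python
import zlib


def parse_loose_object(compressed: bytes):
    """Return (type, size, content) for the compressed bytes of a loose object."""
    raw = zlib.decompress(compressed)
    header, _, content = raw.partition(b"\0")
    obj_type, size_text = header.decode("ascii").split(" ")
    size = int(size_text)
    if size != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, size, content


# Stand-in for the bytes on disk of the blob holding "Hello, World!".
compressed = zlib.compress(b"blob 14\0Hello, World!\n")
obj_type, size, content = parse_loose_object(compressed)
print(obj_type, size, content)  # blob 14 b'Hello, World!\n'
```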
Let's look at the basic difference between a lightweight tag and an annotated tag. In the example below, v1.0 is a lightweight tag and v2.0 is an annotated tag.
    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/test1 (master)
    $ git cat-file -t v1.0
    commit

    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/test1 (master)
    $ git cat-file -t v2.0
    tag
Let's pretty-print v1.0 and v2.0. Here you can see what's in both tags.
    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/test1 (master)
    $ git lol
    * 83ce55e - (HEAD -> master) Adding g.txt (1 year, 10 months ago) <Sushanth Bobby Lloyds>
    * 804e1db - Added f.txt - Tag Testing (1 year, 10 months ago) <Sushanth Bobby Lloyds>
    * 25dc023 - (tag: v2.0) Revert "f7.txt Update 1" (1 year, 11 months ago) <Sushanth Bobby Lloyds>
    * d9e798a - f7.txt Update 2 (1 year, 11 months ago) <Sushanth Bobby Lloyds>
    * 2f161e1 - f7.txt Update 1 (1 year, 11 months ago) <Sushanth Bobby Lloyds>
    * 431be32 - f7.txt Initial (1 year, 11 months ago) <Sushanth Bobby Lloyds>
    * 4d60b51 - Adding e.txt (1 year, 11 months ago) <Sushanth Bobby Lloyds>
    * 34013d4 - Adding d.txt (1 year, 11 months ago) <Sushanth Bobby Lloyds>
    * f891fb4 - Adding c.txt (1 year, 11 months ago) <Sushanth Bobby Lloyds>
    * 3cee413 - Adding b.txt (1 year, 11 months ago) <Sushanth Bobby Lloyds>
    * 080f76f - Adding a.txt (1 year, 11 months ago) <Sushanth Bobby Lloyds>
    * 4fd2b57 - Revert "Adding f5.txt" (1 year, 11 months ago) <Sushanth Bobby Lloyds>
    * 0500b45 - (tag: v1.0) Adding f6.txt (1 year, 11 months ago) <Sushanth Bobby Lloyds>
    ...

    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/test1 (master)
    $ git cat-file -p v1.0
    tree 10aa603d8807f825e542e351421d82784119b542
    parent 5e01aa2fd80af3f7ac30013f41df6fee105f9c90
    author Sushanth Bobby Lloyds <bobby.dreamer@gmail.com> 1542990955 +0530
    committer Sushanth Bobby Lloyds <bobby.dreamer@gmail.com> 1542990955 +0530

    Adding f6.txt

    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/test1 (master)
    $ git cat-file -p v2.0
    object 25dc023c91c8a2ae63b2f7d92f93b094347e9bec
    type commit
    tag v2.0
    tagger Sushanth Bobby Lloyds <bobby.dreamer@gmail.com> 1543898924 +0530

    O.O Version 2
To confirm that v1.0 just refers to the commit, we pretty-print hash 0500b45 below.
    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/test1 (master)
    $ git cat-file -p 0500b45
    tree 10aa603d8807f825e542e351421d82784119b542
    parent 5e01aa2fd80af3f7ac30013f41df6fee105f9c90
    author Sushanth Bobby Lloyds <bobby.dreamer@gmail.com> 1542990955 +0530
    committer Sushanth Bobby Lloyds <bobby.dreamer@gmail.com> 1542990955 +0530

    Adding f6.txt
The output is exactly the same. So, for ease of use, you can reference a commit by a lightweight tag instead of its commit hash.
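One way to picture the difference: a lightweight tag is a reference that holds a commit hash directly, while an annotated tag is a reference to a separate tag object whose first line names the commit. The Python sketch below models this with plain dictionaries. The `refs` and `objects` tables are hypothetical in-memory stand-ins (the hashes are the ones printed for this repo elsewhere in the post), and `peel_to_commit` is a made-up helper, not a Git API.

```python
# Hypothetical stand-ins for .git/refs/tags and the object database.
refs = {
    "refs/tags/v1.0": "0500b45503db6409bc2dc2d2c27a8d09a86150f8",  # lightweight: points at a commit
    "refs/tags/v2.0": "8a128853cc7f76c0243331b49aed36f8100cbabf",  # annotated: points at a tag object
}
objects = {
    "0500b45503db6409bc2dc2d2c27a8d09a86150f8": ("commit", "tree 10aa603d8807f825e542e351421d82784119b542\n"),
    "8a128853cc7f76c0243331b49aed36f8100cbabf": (
        "tag",
        "object 25dc023c91c8a2ae63b2f7d92f93b094347e9bec\ntype commit\ntag v2.0\n",
    ),
    # Body of this commit is not shown in the post, so a placeholder is used.
    "25dc023c91c8a2ae63b2f7d92f93b094347e9bec": ("commit", "(body omitted)\n"),
}


def peel_to_commit(ref_name: str) -> str:
    """Follow a tag reference until a non-tag object is reached."""
    sha = refs[ref_name]
    obj_type, body = objects[sha]
    while obj_type == "tag":
        first_line = body.split("\n", 1)[0]  # "object <sha>"
        sha = first_line.split(" ", 1)[1]
        obj_type, body = objects[sha]
    return sha


print(peel_to_commit("refs/tags/v1.0"))  # 0500b45503db6409bc2dc2d2c27a8d09a86150f8
print(peel_to_commit("refs/tags/v2.0"))  # 25dc023c91c8a2ae63b2f7d92f93b094347e9bec
```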
Let's look at this Git graph.
    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/test3 (master)
    $ git lol
    * 335604e - (HEAD -> master) Merge branches 'add' and 'sub' (8 days ago) <Sushanth Bobby Lloyds>
    |\
    | * f608a17 - (sub) Update sub() (8 days ago) <Sushanth Bobby Lloyds>
    * | b1988f4 - (add) Updated add() (8 days ago) <Sushanth Bobby Lloyds>
    |/
    * c35ab3e - Added both add and sub (8 days ago) <Sushanth Bobby Lloyds>
    |\
    | * 447bd6a - Added sub feature (8 days ago) <Sushanth Bobby Lloyds>
    * | 8671031 - Added add feature (8 days ago) <Sushanth Bobby Lloyds>
    |/
    * 216acda - Initial commit (8 days ago) <Sushanth Bobby Lloyds>
Let's pretty-print commit 335604e (or HEAD) and confirm its parents.
    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/test3 (master)
    $ git cat-file -p HEAD
    tree ec0367d3524177d7b5350200217a227677b5b9e0
    parent b1988f45e9ef5f1717762df1d39ea409eb63cb4d
    parent f608a17f5bbccf7bc5b5154ec0c8299e03933364
    author Sushanth Bobby Lloyds <bobby.dreamer@gmail.com> 1601826795 +0530
    committer Sushanth Bobby Lloyds <bobby.dreamer@gmail.com> 1601826795 +0530

    Merge branches 'add' and 'sub'
git merge-base
    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/test3 (master)
    $ git merge-base add sub
    c35ab3e9d7611576ac203473d5c75946b336810b
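Conceptually, a merge base is a common ancestor of both commits that is not itself an ancestor of some other common ancestor. The Python sketch below applies that definition to the graph from this example. The `parents` table is reconstructed by hand from the git lol output (short hashes only), and this brute-force approach is an illustration, not how Git actually implements it.

```python
def ancestors(commit, parents):
    """All commits reachable from `commit` through parent links, itself included."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def merge_bases(a, b, parents):
    """Common ancestors of a and b that are not ancestors of another common ancestor."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return sorted(
        c for c in common
        if not any(c != other and c in ancestors(other, parents) for other in common)
    )


# Parent links read off the graph above (assumed first-parent order).
parents = {
    "335604e": ["b1988f4", "f608a17"],
    "b1988f4": ["c35ab3e"],
    "f608a17": ["c35ab3e"],
    "c35ab3e": ["8671031", "447bd6a"],
    "8671031": ["216acda"],
    "447bd6a": ["216acda"],
    "216acda": [],
}

# add -> b1988f4, sub -> f608a17
print(merge_bases("b1988f4", "f608a17", parents))  # ['c35ab3e']
```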
Let's take the example below for our rev-parse test.
    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/test1 (master)
    $ git lol
    * 83ce55e - (HEAD -> master) Adding g.txt (1 year, 10 months ago) <Sushanth Bobby Lloyds>
    * 804e1db - Added f.txt - Tag Testing (1 year, 10 months ago) <Sushanth Bobby Lloyds>
    * 25dc023 - (tag: v2.0) Revert "f7.txt Update 1" (1 year, 11 months ago) <Sushanth Bobby Lloyds>
    ...
    * 0500b45 - (tag: v1.0) Adding f6.txt (1 year, 11 months ago) <Sushanth Bobby Lloyds>
    ...
`git rev-parse` takes a short hash and converts it to the full hash.
    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/test1 (master)
    $ git rev-parse 83ce
    83ce55e716284a03e8bcf20d732f3df90799d77c
Here we are rev-parsing tags. When we rev-parse the lightweight tag v1.0 we get the commit hash itself, whereas the annotated tag v2.0 gives the hash of the tag object.
    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/test1 (master)
    $ git rev-parse v1.0
    0500b45503db6409bc2dc2d2c27a8d09a86150f8

    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/test1 (master)
    $ git rev-parse v2.0
    8a128853cc7f76c0243331b49aed36f8100cbabf

    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/test1 (master)
    $ git cat-file -t 8a1288
    tag

    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/test1 (master)
    $ git cat-file -p 8a1288
    object 25dc023c91c8a2ae63b2f7d92f93b094347e9bec
    type commit
    tag v2.0
    tagger Sushanth Bobby Lloyds <bobby.dreamer@gmail.com> 1543898924 +0530

    O.O Version 2
Knowing rev-parse, we can easily get the hash of a commit or tree with git rev-parse commit-ish^{type}
    git rev-parse HEAD^{tree}
    git rev-parse HEAD^{commit}
    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/Internals (master)
    $ git lol
    * 003c678 - (HEAD -> master) Initial Commit (76 minutes ago) <Sushanth Bobby Lloyds>

    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/Internals (master)
    $ git rev-parse master^{commit}
    003c6781e4475888b59f248e5e76d3334d278f99

    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/Internals (master)
    $ git rev-parse master^{tree}
    ee929cd9cd862b204986cf94ab23853b4c98cb97

    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/Internals (master)
    $ git cat-file -p ee92
    100644 blob 8ab686eafeb1f44702738c8b0f24f2567c36da6d	HW.txt

    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/Internals (master)
    $ git ls-files -s
    100644 8ab686eafeb1f44702738c8b0f24f2567c36da6d 0	HW.txt
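Going from a commit to its tree, as master^{tree} does, amounts to reading the tree line in the commit's header. Here is a small Python sketch that parses the commit body printed earlier for 003c678. The function name `parse_commit` is made up, and real commit objects can carry extra headers (for example signatures) that this sketch does not handle.

```python
def parse_commit(body: str):
    """Split a commit body into a {header: [values]} dict and the message."""
    header_text, _, message = body.partition("\n\n")
    headers = {}
    for line in header_text.split("\n"):
        key, _, value = line.partition(" ")
        # A merge commit has several "parent" lines, so collect values in lists.
        headers.setdefault(key, []).append(value)
    return headers, message


commit_body = (
    "tree ee929cd9cd862b204986cf94ab23853b4c98cb97\n"
    "author Sushanth Bobby Lloyds <bobby.dreamer@gmail.com> 1602522294 +0530\n"
    "committer Sushanth Bobby Lloyds <bobby.dreamer@gmail.com> 1602522294 +0530\n"
    "\n"
    "Initial Commit\n"
)

headers, message = parse_commit(commit_body)
print(headers["tree"][0])         # ee929cd9cd862b204986cf94ab23853b4c98cb97
print(headers.get("parent", []))  # [] because this is the root commit
```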
git fsck
Typical output looks like this
    Sushanth@Sushanth-VAIO MINGW64 /d/GITs/git (master)
    $ git fsck
    Checking object directories: 100% (256/256), done.
    dangling commit 1972cb1c728ed3c120ed4ea41b1ff421d9eb7604
    dangling blob 2ccc9d4b364a7f69544839b78e223c482508919f
    dangling tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
    dangling commit 5060f314ceef6ed57af5a9c0bc96eee39d925e04
    dangling commit 69643d82b2951bbe46e4c613446fe5424319cb5a
    dangling blob 9686686012038d0e769708df79668e4a83afdccb
    dangling blob 9fec48973ca630bb0869b04f42957f23f40e3d2e
    dangling commit a34ce3582f54e198d8b050871ef5bab970a0c9c4
    dangling blob d62cb1ccedd70cd9b1c1ce0a19962389f0d2b4a5
    dangling tag 2725b352a7d635e37761ddb0e6070bb9ce5f40c0
    dangling tag 4129fcbafa2cc682fc9fecef0304bedce94f7bbe
    dangling commit 8e1f25a9ada69c9ae4b5d56c16722e5ffe2d8fb7
    dangling tag 9a19fca3bb272e12b138ab3b43bbf45d5516eaa8
    dangling commit 9a21716ed23c6f9049e90dfa1d86838fa3a22d44
    dangling tag b3abe249b2b1e7d0cf65d77c276a3c77556db162
    dangling commit f0871d07baf443ce8915d28e3cbdf1d658fec211
Dangling blob: A change that made it to the staging area/index but never got committed. One amazing thing about Git is that once content has been added to the staging area, you can get it back by its hash (until it is pruned by garbage collection), because blobs are stored by hash just like commits.
Dangling commit/tag: A commit that is not associated with any reference, i.e. there is no way to reach it. For example, if we delete the branch featureX without merging its changes, the commit in featureX becomes a dangling commit because no reference is associated with it. Had it been merged into master, it would have been reachable from the HEAD and master references and would not be dangling, even if we deleted featureX.
You can think of branches (master/main, featureX) and HEAD as just references to specific commits. The featureX and master labels refer to the latest commits on their respective branches. HEAD generally refers to the tip of the currently checked-out branch (master in this case).
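The featureX scenario can be simulated with a toy graph: anything you cannot reach by walking from some reference is a candidate for being reported as dangling. The commit names `c1`, `c2` and `f1` below are invented for illustration, and the sketch ignores the reflog, which in a real repository also keeps objects reachable.

```python
def reachable(tips, edges):
    """All objects reachable from the given starting points."""
    seen = set()
    stack = list(tips)
    while stack:
        obj = stack.pop()
        if obj not in seen:
            seen.add(obj)
            stack.extend(edges.get(obj, []))
    return seen


# Hypothetical history: c1 <- c2 on master, c1 <- f1 on featureX.
edges = {"c1": [], "c2": ["c1"], "f1": ["c1"]}
refs = {"HEAD": "c2", "master": "c2", "featureX": "f1"}
all_objects = set(edges)

before = sorted(all_objects - reachable(refs.values(), edges))
print(before)  # []

# Delete the branch without merging it: f1 is no longer reachable.
del refs["featureX"]
after = sorted(all_objects - reachable(refs.values(), edges))
print(after)  # ['f1']
```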
git gc
The command below can be used to remove dangling objects from the repository (objects still referenced by the reflog are kept).
    git gc --prune=now
Commands | Description |
---|---|
git clean -n | to list what files would be removed(dry run) |
git clean -f | to remove untracked files |
git clean -dfx | (d): remove untracked folders, (f): force, (x): remove ignored files as well |
Caution: Be careful with git clean -dfx. People usually ignore key files and folders (for example via .gitignore); this command deletes them too, and they cannot be recovered.
    rm -rf .git .gitignore