Mini Git was born out of curiosity. Like most developers, I used Git daily—commit, push, clone—without really knowing what those commands did under the hood. I wanted to change that. I set out to peel back the layers, to understand how Git organizes data, stores history, and transfers objects. This project isn’t about reinventing Git; it’s about learning by building.
Each part of Mini Git was a new challenge that deepened my understanding. From initializing a repository to cloning one over HTTP, I explored Git’s internals step by step. The process was humbling, frustrating, and extremely rewarding. Along the way, I faced everything from object formats to delta compression and binary protocols.
Git is often a black box. We run commands and trust that they work. But what actually happens when we run git commit or git clone? I wanted to find out by reimplementing Git’s core features. This meant dealing with binary file formats, zlib compression, and the Smart HTTP protocol.
My goal wasn’t to create a production-ready tool, but to get a hands-on understanding of Git’s internals—its object model, file formats, and protocols. This was about learning by doing, and challenging myself to go beyond surface-level usage.
The project is structured around core Git features. Each challenge builds on the last:
I started with git init, setting up .git/ with its core directories and files—objects/, refs/, HEAD, index. It gave me a basic understanding of Git’s internal structure.
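As a rough illustration, a minimal init step in Python might look something like this. The layout follows Git's conventions, but the function name and the exact set of directories are my own simplification:

```python
import os

def init(repo_path="."):
    """Create a bare-bones .git/ layout: object store, refs, and a HEAD file."""
    git_dir = os.path.join(repo_path, ".git")
    for d in ("objects", "refs/heads", "refs/tags"):
        os.makedirs(os.path.join(git_dir, d), exist_ok=True)
    # HEAD is a plain text file holding a symbolic ref to the current branch.
    with open(os.path.join(git_dir, "HEAD"), "w") as f:
        f.write("ref: refs/heads/main\n")
```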
Next, I tackled git cat-file. I parsed zlib-compressed blobs stored under .git/objects/, discovering how Git uses SHA-1 hashes to identify content.
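A loose object is just a zlib-compressed stream whose decompressed form starts with a header like "blob 14" followed by a NUL byte. Here is a small sketch of reading one back, assuming the usual two-character directory plus remaining-hex filename layout under .git/objects/:

```python
import zlib

def read_object(sha, git_dir=".git"):
    """Read a loose object by its hex SHA-1 and return (type, content_bytes)."""
    path = f"{git_dir}/objects/{sha[:2]}/{sha[2:]}"
    with open(path, "rb") as f:
        raw = zlib.decompress(f.read())
    # The decompressed form is "<type> <size>\0" followed by the raw content.
    header, content = raw.split(b"\0", 1)
    obj_type, size = header.decode().split(" ")
    assert int(size) == len(content)
    return obj_type, content
```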
With git hash-object, I created my own blobs by hashing content and storing them in the correct path. This taught me about Git’s content-addressable storage model.
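Writing a blob is the mirror image of reading one: prepend a header, SHA-1 the whole thing, and store the zlib-compressed bytes at a path derived from the hash. A sketch under those assumptions (hash_object is my own helper name):

```python
import hashlib
import os
import zlib

def hash_object(data, obj_type="blob", git_dir=".git", write=True):
    """Compute an object's SHA-1 and optionally store it as a loose object."""
    store = f"{obj_type} {len(data)}".encode() + b"\0" + data
    sha = hashlib.sha1(store).hexdigest()
    if write:
        # The first two hex characters become the directory, the rest the filename.
        path = os.path.join(git_dir, "objects", sha[:2], sha[2:])
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(zlib.compress(store))
    return sha
```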
Tree objects model directories. I learned to parse their binary structure: <mode> <name>\0<sha>, where the SHA is stored as 20 raw bytes rather than hex. This showed me how Git stores directory snapshots efficiently.
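Parsing a tree means walking that layout entry by entry: the mode up to a space, the name up to a NUL byte, then exactly 20 raw SHA-1 bytes. Something like this sketch:

```python
def parse_tree(content):
    """Parse raw tree object content into a list of (mode, name, hex_sha) entries."""
    entries = []
    i = 0
    while i < len(content):
        space = content.index(b" ", i)
        nul = content.index(b"\0", space)
        mode = content[i:space].decode()
        name = content[space + 1:nul].decode()
        sha = content[nul + 1:nul + 21].hex()  # 20 raw bytes, shown as hex
        entries.append((mode, name, sha))
        i = nul + 21
    return entries
```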
Using git write-tree, I built trees from the index. This required bottom-up tree construction: grouping entries by directory and avoiding redundant trees. I used a DFS-style traversal for this part.
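A simplified sketch of that bottom-up idea, reusing the hash_object helper from above and glossing over the real index format by taking a plain path-to-SHA mapping. Git's exact entry sorting (which treats directory names as if they ended in a slash) is also simplified here:

```python
def write_tree(entries, prefix=""):
    """Build tree objects bottom-up from {path: (mode, hex_sha)}; return the root tree's SHA."""
    tree_items = []
    subdirs = {}
    for path, (mode, sha) in sorted(entries.items()):
        rel = path[len(prefix):]
        if "/" in rel:
            # Defer files in subdirectories; their child trees are written first (DFS).
            subdir = rel.split("/", 1)[0]
            subdirs.setdefault(subdir, {})[path] = (mode, sha)
        else:
            tree_items.append((mode, rel, sha))
    for name, sub_entries in sorted(subdirs.items()):
        sub_sha = write_tree(sub_entries, prefix + name + "/")
        tree_items.append(("40000", name, sub_sha))  # mode 40000 marks a subtree
    body = b"".join(
        f"{mode} {name}".encode() + b"\0" + bytes.fromhex(sha)
        for mode, name, sha in sorted(tree_items, key=lambda e: e[1])
    )
    return hash_object(body, "tree")
```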
A commit object links a tree snapshot to metadata and parent commits. Building one by hand taught me how Git tracks history with immutable snapshots.
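A commit object is surprisingly plain text: a tree line, optional parent lines, author and committer lines with a timestamp, a blank line, then the message. A rough sketch, again reusing the hash_object helper and simplifying the timezone handling:

```python
import time

def commit_tree(tree_sha, message, parent=None,
                author="Mini Git <minigit@example.com>"):
    """Assemble a commit object linking a tree snapshot to its metadata, and store it."""
    offset = time.strftime("%z") or "+0000"
    timestamp = f"{int(time.time())} {offset}"
    lines = [f"tree {tree_sha}"]
    if parent:
        lines.append(f"parent {parent}")
    lines.append(f"author {author} {timestamp}")
    lines.append(f"committer {author} {timestamp}")
    lines.append("")           # blank line separates headers from the message
    lines.append(message)
    return hash_object("\n".join(lines).encode() + b"\n", "commit")
```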
The toughest challenge was git clone. I reverse-engineered Git’s Smart HTTP protocol, parsed server responses, and processed .pack files. The pack file format was the hardest part—Git crams so much data into such a tight structure. I had to deal with variable-length integers, big-endian encoding, and delta compression (such as REF_DELTA), all while parsing a binary stream byte by byte. Decompressing objects and reconstructing the working directory felt like assembling a puzzle with no picture to guide me. But when it worked, it was a rush—I’d cloned a repository from scratch!
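To give a flavor of that byte-by-byte parsing, here is a sketch of decoding a single pack entry header, assuming the whole pack has already been read into a bytes object: the object type sits in bits 4-6 of the first byte, and the size is a variable-length integer whose chunks keep coming as long as the high bit is set.

```python
OBJ_TYPES = {1: "commit", 2: "tree", 3: "blob", 4: "tag", 6: "ofs_delta", 7: "ref_delta"}

def read_object_header(data, offset):
    """Decode a pack entry header: 3-bit type plus a variable-length size."""
    byte = data[offset]
    obj_type = (byte >> 4) & 0x7      # bits 4-6 hold the object type
    size = byte & 0x0F                # low 4 bits are the first chunk of the size
    shift = 4
    offset += 1
    while byte & 0x80:                # high bit set -> another size byte follows
        byte = data[offset]
        size |= (byte & 0x7F) << shift
        shift += 7
        offset += 1
    return OBJ_TYPES[obj_type], size, offset
```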
Mini Git gave me a deep appreciation for Git’s internals. I now understand how .pack files and delta compression optimize storage. Git isn’t magic—it’s a clever combination of simple parts. By rebuilding it from scratch, I learned how those parts fit together.
If you've ever wondered what happens when you type git clone, this is a deep dive into that process. It’s not about replacing Git but about appreciating the engineering behind it. I hope Mini Git inspires other devs to dig deeper and explore the tools they use every day.