Mini Git was born out of curiosity. Like most developers, I used Git daily—commit, push, clone—without really knowing what those commands did under the hood. I wanted to change that. I set out to peel back the layers, to understand how Git organizes data, stores history, and transfers objects. This project isn’t about reinventing Git; it’s about learning by building.
Each part of Mini Git was a new challenge that deepened my understanding. From initializing a repository to cloning one over HTTP, I explored Git’s internals step by step. The process was humbling, frustrating, and extremely rewarding. Along the way, I faced everything from object formats to delta compression and binary protocols.
Git is often a black box. We run commands and trust that they work. But what actually happens when we run git commit or git clone? I wanted to find out by reimplementing Git’s core features. This meant dealing with binary file formats, zlib compression, and the Smart HTTP protocol.
My goal wasn’t to create a production-ready tool, but to get a hands-on understanding of Git’s internals—its object model, file formats, and protocols. This was about learning by doing, and challenging myself to go beyond surface-level usage.
The project is structured around core Git features. Each challenge builds on the last:
I started with git init, setting up .git/ with its core directories and files—objects/, refs/, HEAD, index. It gave me a basic understanding of Git’s internal structure.
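As a rough illustration, a minimal init step in Python might look something like this. The layout follows Git's conventions, but the function name and the exact set of directories are my own simplification:

```python
import os

def init(repo_path="."):
    """Create a bare-bones .git/ layout: object store, refs, and a HEAD file."""
    git_dir = os.path.join(repo_path, ".git")
    for d in ("objects", "refs/heads", "refs/tags"):
        os.makedirs(os.path.join(git_dir, d), exist_ok=True)
    # HEAD is a plain text file holding a symbolic ref to the current branch.
    with open(os.path.join(git_dir, "HEAD"), "w") as f:
        f.write("ref: refs/heads/main\n")
```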
Next, I tackled git cat-file. I parsed zlib-compressed blobs stored under .git/objects/, discovering how Git uses SHA-1 hashes to identify content.
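A loose object is just a zlib-compressed stream whose decompressed form starts with a header like "blob 14" followed by a NUL byte. Here is a small sketch of reading one back, assuming the usual two-character directory plus remaining-hex filename layout under .git/objects/:

```python
import zlib

def read_object(sha, git_dir=".git"):
    """Read a loose object by its hex SHA-1 and return (type, content_bytes)."""
    path = f"{git_dir}/objects/{sha[:2]}/{sha[2:]}"
    with open(path, "rb") as f:
        raw = zlib.decompress(f.read())
    # The decompressed form is "<type> <size>\0" followed by the raw content.
    header, content = raw.split(b"\0", 1)
    obj_type, size = header.decode().split(" ")
    assert int(size) == len(content)
    return obj_type, content
```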
With git hash-object, I created my own blobs by hashing content and storing them in the correct path. This taught me about Git’s content-addressable storage model.
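Writing a blob is the mirror image of reading one: prepend a header, SHA-1 the whole thing, and store the zlib-compressed bytes at a path derived from the hash. A sketch under those assumptions (hash_object is my own helper name):

```python
import hashlib
import os
import zlib

def hash_object(data, obj_type="blob", git_dir=".git", write=True):
    """Compute an object's SHA-1 and optionally store it as a loose object."""
    store = f"{obj_type} {len(data)}".encode() + b"\0" + data
    sha = hashlib.sha1(store).hexdigest()
    if write:
        # The first two hex characters become the directory, the rest the filename.
        path = os.path.join(git_dir, "objects", sha[:2], sha[2:])
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(zlib.compress(store))
    return sha
```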
Tree objects model directories. I learned to parse their binary structure: <mode> <name>\0<sha>, where the SHA is stored as 20 raw bytes rather than hex. This showed me how Git stores directory snapshots efficiently.
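Parsing a tree means walking that layout entry by entry: the mode up to a space, the name up to a NUL byte, then exactly 20 raw SHA-1 bytes. Something like this sketch:

```python
def parse_tree(content):
    """Parse raw tree object content into a list of (mode, name, hex_sha) entries."""
    entries = []
    i = 0
    while i < len(content):
        space = content.index(b" ", i)
        nul = content.index(b"\0", space)
        mode = content[i:space].decode()
        name = content[space + 1:nul].decode()
        sha = content[nul + 1:nul + 21].hex()  # 20 raw bytes, shown as hex
        entries.append((mode, name, sha))
        i = nul + 21
    return entries
```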
Using git write-tree, I built trees from the index. This required bottom-up tree construction: grouping entries by directory and avoiding redundant trees. I used a DFS-style traversal for this part.
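A simplified sketch of that bottom-up idea, reusing the hash_object helper from above and glossing over the real index format by taking a plain path-to-SHA mapping. Git's exact entry sorting (which treats directory names as if they ended in a slash) is also simplified here:

```python
def write_tree(entries, prefix=""):
    """Build tree objects bottom-up from {path: (mode, hex_sha)}; return the root tree's SHA."""
    tree_items = []
    subdirs = {}
    for path, (mode, sha) in sorted(entries.items()):
        rel = path[len(prefix):]
        if "/" in rel:
            # Defer files in subdirectories; their child trees are written first (DFS).
            subdir = rel.split("/", 1)[0]
            subdirs.setdefault(subdir, {})[path] = (mode, sha)
        else:
            tree_items.append((mode, rel, sha))
    for name, sub_entries in sorted(subdirs.items()):
        sub_sha = write_tree(sub_entries, prefix + name + "/")
        tree_items.append(("40000", name, sub_sha))  # mode 40000 marks a subtree
    body = b"".join(
        f"{mode} {name}".encode() + b"\0" + bytes.fromhex(sha)
        for mode, name, sha in sorted(tree_items, key=lambda e: e[1])
    )
    return hash_object(body, "tree")
```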
A commit object links a tree snapshot to metadata and parent commits. Building one by hand taught me how Git tracks history with immutable snapshots.
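A commit object is surprisingly plain text: a tree line, optional parent lines, author and committer lines with a timestamp, a blank line, then the message. A rough sketch, again reusing the hash_object helper and simplifying the timezone handling:

```python
import time

def commit_tree(tree_sha, message, parent=None,
                author="Mini Git <minigit@example.com>"):
    """Assemble a commit object linking a tree snapshot to its metadata, and store it."""
    offset = time.strftime("%z") or "+0000"
    timestamp = f"{int(time.time())} {offset}"
    lines = [f"tree {tree_sha}"]
    if parent:
        lines.append(f"parent {parent}")
    lines.append(f"author {author} {timestamp}")
    lines.append(f"committer {author} {timestamp}")
    lines.append("")           # blank line separates headers from the message
    lines.append(message)
    return hash_object("\n".join(lines).encode() + b"\n", "commit")
```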
The toughest challenge was git clone. I reverse-engineered Git’s Smart HTTP protocol, parsed server responses, and processed .pack files. The pack file format was the hardest part—Git crams so much data into such a tight structure. I had to deal with variable-length integers, big-endian encoding, and delta compression (such as REF_DELTA), all while parsing a binary stream byte by byte. Decompressing objects and reconstructing the working directory felt like assembling a puzzle with no picture to guide me. But when it worked, it was a rush—I’d cloned a repository from scratch!
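To give a flavor of that byte-by-byte parsing, here is a sketch of decoding a single pack entry header, assuming the whole pack has already been read into a bytes object: the object type sits in bits 4-6 of the first byte, and the size is a variable-length integer whose chunks keep coming as long as the high bit is set.

```python
OBJ_TYPES = {1: "commit", 2: "tree", 3: "blob", 4: "tag", 6: "ofs_delta", 7: "ref_delta"}

def read_object_header(data, offset):
    """Decode a pack entry header: 3-bit type plus a variable-length size."""
    byte = data[offset]
    obj_type = (byte >> 4) & 0x7      # bits 4-6 hold the object type
    size = byte & 0x0F                # low 4 bits are the first chunk of the size
    shift = 4
    offset += 1
    while byte & 0x80:                # high bit set -> another size byte follows
        byte = data[offset]
        size |= (byte & 0x7F) << shift
        shift += 7
        offset += 1
    return OBJ_TYPES[obj_type], size, offset
```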
Mini Git gave me a deep appreciation for Git’s internals. I now understand how .pack files and delta compression optimize storage. Git isn’t magic—it’s a clever combination of simple parts. By rebuilding it from scratch, I learned how those parts fit together.
If you've ever wondered what happens when you type git clone, this is a deep dive into that process. It’s not about replacing Git but about appreciating the engineering behind it. I hope Mini Git inspires other devs to dig deeper and explore the tools they use every day.