Database/Data storage implementation

Subreddit: /r/databasedevelopment

Build your own....

B-Tree Implementation
Building BerkeleyDB
Building a NoSQL database from zero
Build a NoSQL database from scratch in 1000 lines of code (in Go)
Mini-LSM: Build a simple key-value storage engine in a week. Extend it in the second and third weeks.
LibraDB: "... a simple, persistent key/value store written in pure Go. The project aims to provide a working yet simple example of a working database."
simpledb: A simple database built from scratch that has some of the basic RDBMS features (SQL query parser, transactions, query optimizer)
"How to build a relational database from scratch" (Medium members only)
C: Let's Build a Simple Database: Build a clone of sqlite from scratch (Source)
C++: Build Your Own Redis from Scratch
C++: Implementation of a B-Tree Class
C#: Build Your Own Database (Source)
Clojure: An Archaeology-Inspired Database
Crystal: Why you should build your own NoSQL Database (Source, Archived)
Go: Build Your Own Database from Scratch: Persistence, Indexing, Concurrency
Go: Build Your Own Redis from Scratch
Go: gosqldb: A key-value persistent database that supports SQL queries over B+ and LSM trees
Java: Electric's B-Tree (Source JAR file)
Java: JDBM3: Work was paused and redirected to JDBM4, which was renamed MapDB
Java: MapDB: MapDB provides concurrent Maps, Sets and Queues backed by disk storage or off-heap-memory. It is a fast and easy to use embedded Java database engine. (Source)
Java: The SimpleDB Data System: "... a multi-user transactional database server written in Java, which interacts with Java client programs via JDBC. The system is intended for pedagogical use only. The code is clean and compact. The APIs are straightforward. The learning curve is relatively small. Everything about it is geared towards improving the experience of a database system internals course. Consequently, the system is intentionally bare-bones. It implements only a small fraction of SQL and JDBC, and does little or no error checking. The SimpleDB code is an integral part of my textbook Database Design and Implementation, published by Springer."
JavaScript: Dagoba: an in-memory graph database
Python: DBDB: Dog Bed Database
Python: Write your own miniature Redis with Python
Ruby: Build your own fast, persistent KV store in Ruby
Rust: Build your own Redis client and server
Rust: YourSQL
Rust: OxidSQL
Rust: erdb: An educational relational database

Reading

Transactions

"Decomposing Transactional Systems": "Every transactional system does four things:
- It executes transactions.
- It orders transactions.
- It validates transactions.
- It persists transactions.
Executing a transaction means evaluating the body of the transaction to produce the intended reads and writes. There is still notable variety across systems as to how the body of a transaction is executed. Writes might be applied to storage during this phase, or they might be buffered locally and submitted as a batch at the end. A transaction might be executed more than once for different purposes.

Ordering a transaction means assigning the transaction some notion of a time at which it occurred. This could be a version, a timestamp, a log sequence number, or a more complex description of transaction IDs it happened before or after. MVCC databases may assign two versions: an initial read version, and a final commit version. In this case, we’re mainly focused on the specific point at which the commit version is chosen — the time at which the database claims all reads and writes occurred atomically.

Validating a transaction means enforcing concurrency control, or more rarely, domain-specific semantics. If a transaction is going to be rejected for a system defined reason, such as having serializability conflicts with existing transactions, it will happen here. When validation happens after ordering, it checks to see if the assigned order is valid. When validation happens before ordering, it provides a range of acceptable commit versions, and the ordering step chooses one of them.

Persisting a transaction means making it durable, generally to disk. Sometimes writes are incrementally made durable during transaction execution, but the key point in persistence is when all writes and the commit record marking the transaction as committed are durable. Often this is noting how the system performs replication and persists the outcome of its atomic commitment protocol. (And sometimes the lines between those two aren’t very clear.)

All four of these things must be done before the system may acknowledge a transaction’s result to a client. However, these steps can be done in any order. They can be done concurrently. Different systems achieve different tradeoffs by reordering these steps. ... A classic optimistic concurrency control database will execute a transaction, and record the read and write sets. Once execution finishes, a commit version is allocated, and concurrent transactions are checked for conflicts. If no conflicts were found, the transaction is made durable, and acknowledged as committed. ... A classic pessimistic concurrency control database executes a transaction and acquires locks as it runs to exclude conflicting transactions. When the transaction finishes, it acquires a commit version, and then is persisted to disk. Then it releases the locks."
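
A minimal sketch of the optimistic flow described in the quote above (execute, then order, validate, persist), assuming a single-threaded, in-memory toy store. The names Store, Txn, begin, and commit are hypothetical and do not come from the article, and the list-based log only stands in for durable storage; a real system would write and sync a commit record to disk.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Toy key-value store (hypothetical, for illustration only)."""
    data: dict = field(default_factory=dict)        # key -> latest committed value
    last_write: dict = field(default_factory=dict)  # key -> commit version of last write
    version: int = 0                                # most recently allocated commit version
    log: list = field(default_factory=list)         # stand-in for a durable commit log


@dataclass
class Txn:
    read_version: int                               # store version when the txn began
    read_set: set = field(default_factory=set)
    write_set: dict = field(default_factory=dict)

    def read(self, store, key):
        # Execute: record the key in the read set; see our own buffered writes first.
        self.read_set.add(key)
        if key in self.write_set:
            return self.write_set[key]
        return store.data.get(key)

    def write(self, key, value):
        # Execute: buffer the write locally; nothing reaches the store yet.
        self.write_set[key] = value


def begin(store):
    return Txn(read_version=store.version)


def commit(store, txn):
    """Return the commit version, or None if validation rejects the transaction."""
    # Order: allocate a commit version once execution has finished.
    store.version += 1
    commit_version = store.version

    # Validate: conflict if any key we read was written after we began.
    for key in txn.read_set:
        if store.last_write.get(key, 0) > txn.read_version:
            return None

    # Persist: append a commit record (stand-in for disk), then apply the writes.
    store.log.append((commit_version, dict(txn.write_set)))
    for key, value in txn.write_set.items():
        store.data[key] = value
        store.last_write[key] = commit_version
    return commit_version


if __name__ == "__main__":
    store = Store()
    t1, t2 = begin(store), begin(store)
    t1.read(store, "x")
    t1.write("x", 1)
    t2.read(store, "x")
    t2.write("x", 2)
    print(commit(store, t1))  # t1 commits first
    print(commit(store, t2))  # t2 read "x" before t1's write, so it is rejected
```

In this sketch a rejected transaction still consumes a commit version, which follows the quoted order of allocating the version before checking for conflicts. A pessimistic variant would instead take locks inside read and write and release them after the persist step.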

Tags: storage

Last modified 17 July 2025

Resources on how to build a database
