Storage is typically about data, but the degree of structure is flexible. Unstructured storage is essentially a filesystem (although even filesystems have some structure to them). Structured storage is often called a "database system" of some particular shape.
Various data sources
Link sites
Conceptual thoughts
Feels like a database implementation is made up of three primary components:
- a model/"shape" for its data (relational, document, etc)
- a "storage" (storage engine) where the raw data lives (disk, in-memory, distributed disk)
- a topology (network, embedded, hosting, distributed/clustered)
Models/"Shapes" of data
Storage
- In-memory: disappears whenever the database process(es) go away
- Disk: data is written to the underlying disk
- Distributed disk: data is written to one of a number of disks, perhaps redundantly (for resiliency)
- Cloud: data is written to a cloud host (?)
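The following is a minimal sketch of the split between the data model and the storage engine. It assumes a toy key-value model and shows the same get/put API sitting on top of two different engines. The class names `InMemoryEngine` and `DiskEngine` are hypothetical. The disk engine uses an append-only JSON-lines log that it replays at startup; this is one of many possible on-disk layouts, chosen here only for brevity.

```python
import json
import os
import tempfile


class InMemoryEngine:
    """Hypothetical engine: data lives in a dict and vanishes with the process/object."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class DiskEngine:
    """Hypothetical engine: every write is appended to a JSON-lines log on disk;
    the log is replayed on startup, so later writes win (last-write-wins)."""

    def __init__(self, path):
        self._path = path
        self._data = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                for line in f:
                    record = json.loads(line)
                    self._data[record["k"]] = record["v"]

    def put(self, key, value):
        # Persist first (flush + fsync), then update the in-memory index.
        with open(self._path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"k": key, "v": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "kv.log")
    mem, disk = InMemoryEngine(), DiskEngine(path)
    for engine in (mem, disk):
        engine.put("greeting", "hello")
    # Simulate a process restart by constructing fresh engine objects.
    mem, disk = InMemoryEngine(), DiskEngine(path)
    print(mem.get("greeting"))   # None
    print(disk.get("greeting"))  # hello
```

A real disk engine would also compact or checkpoint the log so that it does not grow without bound. A distributed-disk engine would additionally replicate each write to other nodes.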
Topology options
- Network access: One makes network calls to access the storage engine. Most storage engines follow this model, whether within the same network (a la "on-prem") or in the cloud.
- Embedded: The storage engine is accessed in-process, inside the program that uses it. It often cannot be accessed by other running programs, often manages files directly, and shuts down when the host process does. Excellent for standalone, self-contained installations that have no external dependencies beyond the filesystem. The fastest of all the topologies, with the possible exception of code hosted inside the database (a la stored procedures).
- Code hosting: Some storage engines also allow for code hosting, in which code executes inside the same process(es) as the storage engine itself, a la "stored procedures". The difference between this and embedding is simply which starts up first: the hosting program or the database.
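The sketch below illustrates the embedded topology with Python's standard `sqlite3` module. The SQLite engine runs inside the Python process and works directly on an ordinary file, with no server to start or connect to. As a small analogue of code hosting, it also registers a Python function with `create_function` so that the engine calls it while evaluating SQL. This is a user-defined function, not a stored procedure proper. Because SQLite is embedded, host and database share a single process here anyway. The table name `readings` and the function `to_fahrenheit` are hypothetical.

```python
import os
import sqlite3
import tempfile

# Embedded: the engine lives in this process and manages the file directly.
db_path = os.path.join(tempfile.mkdtemp(), "embedded.db")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE readings (sensor TEXT, celsius REAL)")  # hypothetical table
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 20.0), ("b", 25.0), ("a", 22.5)],
)
conn.commit()


# Code hosting, in miniature: a hypothetical Python function that the SQL
# engine itself invokes, row by row, while executing the query below.
def to_fahrenheit(c):
    return c * 9 / 5 + 32


conn.create_function("to_fahrenheit", 1, to_fahrenheit)

rows = conn.execute(
    "SELECT sensor, to_fahrenheit(celsius) FROM readings ORDER BY rowid"
).fetchall()
print(rows)  # [('a', 68.0), ('b', 77.0), ('a', 72.5)]
conn.close()
```

The data outlives the process because it sits in `embedded.db`. Passing `":memory:"` to `sqlite3.connect` instead gives the in-memory storage option, where everything disappears when the connection closes.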
Automation
- "Database Gyms": "In the past decade, academia and industry have embraced machine learning (ML) for database management system (DBMS) automation. These efforts have focused on designing ML models that predict DBMS behavior to support picking actions (e.g., building indexes) that improve the system’s performance. Recent developments in ML have created automated methods for finding good models. Such advances shift the bottleneck from DBMS model design to obtaining the training data necessary for building these models. But generating good training data is challenging and requires encoding subject matter expertise into DBMS instrumentation.
"Existing methods for training data collection are bespoke to individual DBMS components and do not account for (1) how workload trends affect the system and (2) the subtle interactions between internal system components. Consequently, the models created from this data do not support holistic tuning across subsystems and require frequent retraining to boost their accuracy.
"This paper presents the architecture of a database gym, an integrated environment that provides a unified API of pluggable components for obtaining high-quality training data. The goal of a database gym is to simplify ML model training and evaluation to accelerate autonomous DBMS research. But unlike gyms in other domains that rely on custom simulators, a database gym uses the DBMS itself to create simulation environments for ML training. Thus, we discuss and prescribe methods for overcoming challenges in DBMS simulation, which include demanding requirements for performance, simulation fidelity, and DBMS-generated hints for guiding training processes."
Datamining
Information Retrieval
Storage and retrieval
DBaaS: Database-as-a-Service
Articles
- "15 Databases, 15 Use Cases--Stop Using the Wrong Database for the Right Problem":
- Relational
- Wide Column (Cassandra)
- Time-Series (InfluxDB, Prometheus, Kdb+, etc)
- Ledger (Amazon QLDB)
- Graph (Neo4j, ArangoDB, Amazon Neptune, etc)
- OODBMS (ObjectDB, db4o, etc) (Sadly these are more or less extinct at this point)
- Hierarchical (IMS, Windows Registry, Filesystems, etc)
- Document (MongoDB, ArangoDB, CouchDB)
- Key-Value (Couchbase, DataStax, Redis)
- Blob (Amazon S3)
- In-Memory (Redis, Memcached, Apache Ignite, Aerospike, Hazelcast)
- Text Search (Elasticsearch)
- Spatial (PostGIS, Oracle Spatial, SpatiaLite)
- Vector (Pinecone, Chroma)
- Embedded (SQLite, RocksDB, BerkeleyDB)
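To make a few of these shapes concrete, here is a minimal sketch using the Python standard library. It stores the same hypothetical customer-and-orders data three ways. The relational version uses an in-memory SQLite database with normalized tables joined at query time. The document version is one nested, self-contained record. The key-value version holds an opaque blob that only the application knows how to decode. All table, field, and key names are made up for illustration.

```python
import json
import sqlite3

# Hypothetical data: one customer ("Ada") with two orders.

# 1. Relational: normalized tables linked by a foreign key; joined at query time.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id),
                         total REAL);
    INSERT INTO customers VALUES (1, 'Ada');
    INSERT INTO orders VALUES (10, 1, 12.5), (11, 1, 30.0);
""")
relational_total = db.execute(
    "SELECT SUM(o.total) FROM customers c "
    "JOIN orders o ON o.customer_id = c.id WHERE c.name = 'Ada'"
).fetchone()[0]

# 2. Document: the customer and its orders nested in one record; no join needed.
document = {
    "_id": 1,
    "name": "Ada",
    "orders": [{"id": 10, "total": 12.5}, {"id": 11, "total": 30.0}],
}
document_total = sum(o["total"] for o in document["orders"])

# 3. Key-value: the store only understands keys; the value is opaque bytes
#    that the application must decode itself before it can look inside.
kv = {"customer:1": json.dumps(document).encode("utf-8")}
kv_total = sum(o["total"] for o in json.loads(kv["customer:1"])["orders"])

print(relational_total, document_total, kv_total)  # 42.5 42.5 42.5
```

Roughly, the trade-off looks like this. The document shape answers "Ada's orders" without a join. The relational shape makes it easier to ask new questions across customers and orders. The key-value shape can only fetch by key unless the application builds its own indexes.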
Books
Detail Pages:
- 4store RDF database
- Actian Actian NoSQL (formerly Versant) object technology enables software developers to handle database requirements for extremely complex object models with ease and is used by the world’s largest companies for applications with very large scale data management requirements. Actian NoSQL doesn’t need mapping code to store or retrieve objects, so schema modifications can be handled without application downtime. Fault tolerance, synchronous and asynchronous replication, high availability and excellent scalability make Actian NoSQL ready for the enterprise.
- Adama RDF database
- Akavache An asynchronous, persistent key-value store created for writing desktop and mobile applications, based on SQLite3.
- Amazon Dynamo A fully managed NoSQL database service that provides fast and predictable performance with seamless scalability. DynamoDB lets you offload the administrative burdens of operating and scaling a distributed database so that you don't have to worry about hardware provisioning, setup and configuration, replication, software patching, or cluster scaling.
- Amazon Ion A richly-typed, self-describing, hierarchical data serialization format offering interchangeable binary and text representations.
- Amazon S3 (and Glacier) Storage for the internet.
- Apache.org The Apache site is a collection of numerous open-source projects, in all stages of life (incubating, maintained, archived).
- Apache Derby Full SQL-compliant RDBMS written in Java.
- Apache Drill Extensible distributed query engine that supports a variety of NoSQL databases and file systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS and local files.
- Apache Jena A free and open source Java framework for building Semantic Web and Linked Data applications.
- Apache TinkerPop A graph computing framework for both graph databases (OLTP) and graph analytic systems (OLAP).
- Appwrite An open-source platform that provides web and mobile developers with a set of easy-to-use and integrate REST APIs to manage their core backend needs.
- ArangoDB Multi-model database
- ArcadeDB A conceptual fork of OrientDB: a multi-model database, one DBMS that supports SQL, Cypher, Gremlin, HTTP/JSON, MongoDB and Redis; also supports vector embeddings.
- Arrow The universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics.
- Avro A data serialization system.
- Awesome (lists) (Project Awesome) A curated collection of lists.
- BBolt An embedded key/value database for Go.
- BerkeleyDB Embedded file-based storage system. Sort of a SQLite before SQLite was popular.
- Bolt An embedded key/value database for Go.
- Bolt protocol A highly efficient, lightweight client-server protocol designed for database applications.
- Bond An open-source, cross-platform framework for working with schematized data, supporting cross-language serialization/deserialization and powerful generic mechanisms for efficiently manipulating data.
- BrightstarDB A native RDF database for the .NET platform.
- Build Your Own Redis Learn network programming and data structures by coding from scratch.
- Build-Your-Own-X A collection of links on how to build various things as a learning exercise.
- Byzer A low-code open-source programming language for data pipeline, analytics and AI.
- Cassandra An open source NoSQL distributed database trusted by thousands of companies for scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data.
- Castled Data synchronization framework focused on external apps.
- Cayley An open-source database for Linked Data inspired by the graph database behind Google's Knowledge Graph (formerly Freebase).
- Citus All the greatness of Postgres, plus the superpower of distributed tables. 100% open source.
- Clipper (aka xBase) SQL-inspired language from dBase days.
- Cloudboost Cloud platform for your app; similar to Parse + Firebase + Algolia + Iron.io all combined into one.
- Cloudflare Durable Objects (D1) Serverless SQL databases to query from your Workers and Pages projects.
- CockroachDB A prime example of NewSQL, with the ACID guarantees of SQL and the scale of NoSQL. CockroachDB is open source and freely available for cloud-native and hybrid deployments. SQL. Geo-location of data. Horizontal scale. ACID transactions.
- Coherence A scalable, fault-tolerant, cloud-ready, distributed platform for building grid-based applications and reliably storing data.
- CosmosDB Microsoft's proprietary globally-distributed, multi-model database service "for managing data at planet-scale" launched in May 2017. It is schema-agnostic, horizontally scalable and generally classified as a NoSQL database.
- Couchbase A full-featured, multi-service, multimodel database.
- CouchDB A document-oriented database with seamless multi-master sync and a native HTTP/JSON API, with an emphasis on reliability and concurrency.
- CQL (Cassandra Query Language) A query language, very close to SQL, that rides on top of the Cassandra storage system.
- CrateDB Open-source, distributed SQL database.
- Cypher A graph database query language first popularized by Neo4j.
- D (Tutorial D) D is a set of prescriptions for what Christopher J. Date and Hugh Darwen believe a relational database management system ought to be like.
- Dapper A simple object mapper for .NET.
- Database/Data storage implementation Resources on how to build a database
- Data concurrency and parallel programming Links about different ways to store data at high levels of concurrency
- Data Format Description Language (DFDL) A language for describing text and binary data formats.
- Datasets A collection of data sets to use for teaching purposes.
- db4o An object-oriented database using "native queries" (queries in the source language).
- DBeaver Free universal database tool and SQL client.
- DBOS A Database-Oriented Operating System.
- DBOS A TypeScript framework built on the database that helps you develop transactional backend applications.
- Designing Data-Intensive Applications How to think about building data-centric applications.
- DGraph Native GraphQL Database with graph backend.
- DiceDB An open source, redis-compliant, reactive, scalable, highly available, unified cache optimized for modern hardware.
- Directus Headless CMS & API for Custom Databases.
- DocumentDB An open-source document database platform and the engine powering the vCore-based Azure Cosmos DB for MongoDB, built on PostgreSQL.
- Document storage model Thoughts/articles on the document model of storage.
- Dolt It's Git for Data.
- dotnetRDF An open-source CLR library for RDF.
- DragonflyDB An in-memory datastore.
- DuckDB An in-process SQL OLAP database management system.
- DyBase A very simple object oriented embedded database for languages with dynamic type checking.
- dynomite A generic dynamo implementation for different k-v storage engines.
- EdgeDB Open source, graph-relational database.
- Entity Framework A framework for retrieving and storing data to relational systems.
- esProc SPL A scripting language for data processing, executed in a Java program through JDBC.
- Event-sourcing storage model Thoughts and links.
- EventStoreDB Persist your application data as streams of events with an open-source database, the best data storage solution for event-sourced systems.
- eXistdb A high-performance open source native NoSQL/XML database and application platform built entirely around XML technologies.
- Exograph A powerful data modeling language and high-performance query engine that offers dynamically generated APIs.
- Extensible-Storage-Engine (ESE, aka "Jet") An embedded/ISAM-based database engine that provides rudimentary table and indexed access.
- eXtremeDB A hybrid persistent and in-memory, highly scalable, distributed client/server database.
- FaunaDB A global serverless database that gives you ubiquitous, low latency access to app data, without sacrificing data correctness and scale.
- Filestash A file manager that lets you manage your data wherever it is located.
- Firebase A (pseudo)real-time NoSQL database that also has platform capabilities.
- Firebird Open-source relational database originally developed by Borland.
- FlatBuffers An efficient OSS cross platform serialization library for a lot of mainstream programming languages. It was originally created at Google for game development and other performance-critical applications.
- Fluid Framework A collection of client libraries for distributing and synchronizing shared state, allowing multiple clients to simultaneously create and operate on shared data structures using coding patterns similar to those used to work with local data.
- Flyway Relational(ish) database evolution and migration tool.
- Free Programming Books A collection of free learning resources (books).
- GDL (GNU Data Language) A domain-specific data analysis and visualization programming language and a data analysis environment.
- Geode Apache Geode is a data management platform that provides real-time, consistent access to data-intensive applications throughout widely distributed cloud architectures. (Formerly GemStone GemFire.)
- GraphDB(Lite) Fully Featured RDF Database for Massive Data and Moderate Query Loads.
- GraphEngine A distributed in-memory data processing engine, underpinned by a strongly-typed RAM store and a general distributed computation engine.
- GraphQL A query language that APIs can support to enable SQL-like querying across APIs; supports both queries and mutations, and can define complex entity descriptions.
- GraphQL Mesh Allows use of GraphQL query language to access data in remote APIs that don't run GraphQL (and also ones that do run GraphQL); can be used as a gateway to other services, or run as a local GraphQL schema that aggregates data from remote APIs.
- Graph storage model Thoughts/articles on the graph model of storage.
- Grouparoo Data synchronization framework.
- Harbour Portable, xBase compatible programming language and environment.
- HBase Datastore for Apache "big data" projects.
- Hierarchical storage model Thoughts and links.
- HyperGraphDB A general purpose, extensible, portable, distributed, embeddable, open-source data storage mechanism. It is a graph database designed specifically for artificial intelligence and semantic web projects; it can also be used as an embedded object-oriented database for projects of all sizes.
- HyperSQL Relational database written in Java.
- iBoxDB Fast ACID Table-Style Document NoSQL Database.
- IndexedDB A low-level API for client-side storage of significant amounts of structured data, including files/blobs. This API uses indexes to enable high-performance searches of this data.
- InfiniteGraph Advanced Graph Database and Analytics for Enterprises and Government.
- InfluxDB Platform for building time series applications.
- InnoDB A storage engine for the MySQL and MariaDB database management systems.
- JanusGraph A scalable graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster.
- JavaScript Object Notation (JSON) Object literal syntax from ECMAScript that's since taken off as a data storage and wire transfer format.
- JSDB (JavaScript for Databases) JSDB is JavaScript for databases, a scripting language for data-driven, network-centric programming on Windows, Mac, Linux, and SunOS; it works with databases, XML, the web, and email, as a JavaScript shell, to run CGI programs, or as a web server.
- JsonDB An open-source, Java-based database.
- JunoDB PayPal's home-grown secure, consistent and highly available key-value store providing low, single-digit-millisecond latency at any scale.
- JVM (Platform) Data-related Data storage and access on the JVM.
- KDB+ A timeseries columnar database and language (Q).
- Keyv A consistent interface for key-value storage across multiple backends via storage adapters. It supports TTL based expiry, making it suitable as a cache or a persistent key-value store.
- Key-value storage model Thoughts and links.
- Kinto A generic JSON document store with sharing and synchronisation capabilities.
- Knowledge Graph Language (KGL) A query language for exploring knowledge graphs.
- Kusto (query language) Query language for use with Azure Data Explorer.
- kvass A personal key-value data store.
- LevelDB A very fast and lightweight, embedded database; part of the Chrome browser (exposed as IndexedDB).
- libSQL A next-gen fork of SQLite.
- Limbo A SQLite rewrite in Rust.
- linq2db LINQ to database provider.
- LiteDB An open source MongoDB-like database with zero configuration - mobile ready.
- Logica A logic programming language that compiles to StandardSQL and runs on Google BigQuery.
- LokiJS Super fast in-memory JavaScript document-oriented database.
- LowDB Simple and fast JSON database.
- M (MUMPS: Massachusetts General Hospital Utility Multi-Programming System) A procedural language with a built-in NoSQL database; or, it’s a database with an integrated language optimized for accessing and manipulating that database.
- MapDB Provides concurrent Maps, Sets and Queues backed by disk storage or off-heap-memory. It is a fast and easy to use embedded Java database engine.
- MariaDB Open-source relational database.
- Marten Transactional Document DB and Event Store on PostgreSQL.
- MassiveJS A data mapper for Node.js that goes all in on PostgreSQL and embraces the power and flexibility of SQL and the relational model.
- Memgraph A streaming graph application platform that helps you wrangle your streaming data and build sophisticated models that you can query in real time.
- MessagePack An efficient binary serialization format.
- Metadata-Connect A massively scalable metadata (data about the data) management platform hosted in the cloud.
- Microsoft SQL Server Microsoft's relational database implementation.
- MinIO S3-compatible object storage.
- MongoDB Document-oriented network storage.
- Multi-Dimensional and Hierarchical Database Toolkit A Linux-based, open sourced, C/C++ toolkit of portable software that supports fast, flexible, multi-dimensional and hierarchical storage, retrieval and manipulation of information in data bases ranging in size up to 256 terabytes.
- MySQL An open-source relational database.
- NDatabase OODBMS native and transparent persistence for .NET.
- Nebula Graph The graph database built for super large-scale graphs with milliseconds of latency.
- NeDB The JavaScript Database, for Node.js, nw.js, electron and the browser.
- Neo4j The industry standard for graph-oriented databases.
- Nextcloud A personal cloud which runs on your own server.
- Nitrite An open source nosql embedded document store written in Java with MongoDB like API that supports both in-memory and single file based persistent store.
- NMemory A lightweight non-persistent in-memory relational database engine that is purely written in C# and can be hosted by .NET applications.
- Nocobase An open source and free no-code development platform.
- NocoDB The Open Source Airtable alternative.
- Noms The versioned, forkable, syncable database.
- NosDB A 100% native .NET Open Source NoSQL Database, extremely fast and linearly scalable and allows your .NET applications to handle extreme transaction loads (XTP).
- NuoDB Distributed SQL database.
- ObjectBox A high-speed database that securely stores your data privately on-device and syncs it seamlessly with millions of devices on-premise and optionally with any cloud.
- ObjectDB A powerful Object-Oriented Database Management System (ODBMS) whose native API is the Java Persistence API (JPA).
- Objectivity/DB A scalable, high performance, distributed Object Database (ODBMS).
- Object storage model Thoughts and links.
- Optimized Row Columnar (ORC) A highly efficient way to store Hive data. It was designed to overcome limitations of the other Hive file formats.
- OrientDB Multi-model open-source database.
- OrioleDB A new storage engine for PostgreSQL, bringing a modern approach to database capacity, capabilities and performance to the world's most-loved database platform.
- Orly An open-source graph database
- Outerbase An AI-powered database platform.
- Owncloud A personal cloud which runs on your own server.
- Papers We Love A collection of academic papers gathered by a popular "user group" dedicated to reading through them.
- Parquet An open source, column-oriented data file format designed for efficient data storage and retrieval.
- Perst Open source, dual license, object-oriented embedded database system (ODBMS).
- PingCAP NewSQL database that supports HTAP workloads.
- PlanetScale A serverless MySQL platform based on the Vitess horizontal scaling MySQL technology.
- PostgreSQL Open-source relational database.
- PouchDB Open-source JavaScript database inspired by CouchDB.
- Presto (PrestoDB, PrestoSQL) Open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes.
- Prisma An auto-generated and fully type-safe database client providing a simplistic yet extremely powerful API.
- Probase The goal of Probase is to make machines “aware” of the mental world of human beings, so that machines can better understand human communication. We do this by giving certain general knowledge or certain common sense to machines.
- PRQL (Pipelined Relational Query Language) A modern language for transforming data — a simple, powerful, pipelined SQL replacement.
- Puter Cloud-hosted OS-like platform.
- QuestDB Database designed to process time series data.
- Quick.Ref Cheat Sheets A collection of "cheat sheets"/quick-reference-guides for various tools and platforms and languages and ....
- RavenDB Document-oriented database in .NET.
- Raw storage model Thoughts and links.
- Realm (Realm Database, Realm Object Server) Object database intended as a replacement for SQLite and other mobile data storage systems.
- Redis An in-memory database that persists on disk. The data model is key-value, but many different kinds of values are supported: Strings, Lists, Sets, Sorted Sets, Hashes, Streams, HyperLogLogs, Bitmaps.
- RedisGraph A graph database module for Redis.
- Reflow A language and runtime for distributed, incremental data processing in the cloud.
- Rel A desktop database management system that implements the "Tutorial D" database language.
- Relational storage model Thoughts/articles on the relational model of storage.
- Replicache An in-browser persistent key-value store that is git-like under the hood.
- Reshape A zero-downtime schema migration tool for Postgres.
- RestDB.io Simple online NoSQL database backend with automatic APIs and low code javascript hooks.
- RethinkDB Distributed document database.
- Riak Highly available, operationally simple, distributed database.
- Robomongo / Robo 3T Shell for SQL and MongoDB.
- RocksDB A library that provides an embeddable, persistent key-value store for fast storage.
- Rowy An open-source Airtable-like UI for your database and to build serverless cloud functions visually in the context of your data.
- RxDB A fast, offline-first, reactive database for JavaScript Applications.
- SchemaCrawler A relational database schema discovery and comprehension tool.
- ScrollSets All tabular knowledge can be stored in a single long plain text file.
- Semantic Web A model of the web based more around its founder's original intent (hyperlinks and full-state "documents").
- sharedb Realtime database backend based on Operational Transformation (OT).
- SimpleDB A "teaching" database implementation.
- Slashbase The open-source modern database IDE.
- Smile (data format) A binary data format that defines a binary equivalent of standard JSON data format.
- SML# A programming language in the ML-family; seamlessly integrates (currently a subset of) SQL.
- sones An object-oriented graph data storage for large amounts of highly connected, semi-structured data in a distributed environment.
- Soul A SQLite REST and Realtime server.
- SPARQL A query language designed for semantic web data retrieval and update.
- Spatial storage model Thoughts and links.
- SQL.js (SQLite in JavaScript) A JavaScript library to run SQLite on the web.
- SQL (Structured Query Language) Standard language for querying (and updating) a relational database, originally intended for ad-hoc usage from an interactive prompt.
- SQLC Compile SQL to type-safe code; catch failures before they happen.
- SQLite A C-language library that implements a small, fast, self-contained, high-reliability, full-featured, SQL database engine.
- Starcounter "Fused" ACID memory centric database engine and C# VM.
- Storj Decentralized cloud storage.
- Structured and Unstructured Query Language (SUQL) Conversational Search over Structured and Unstructured Data with LLMs
- SubZero The All-in-One library suite for internal tools development with integrated authentication in your language of choice.
- Supabase Backend server with REST APIs to manage core backend needs.
- SurrealDB A single easy-to-use API for relational, document and graph databases.
- Swarm JavaScript replicated model (M of MVC) library.
- SymmetricDS Open-source database replication software that focuses on features and cross-platform compatibility.
- Tables A microlang for data science.
- TaffyDB JavaScript database.
- Taxi A language for describing APIs and their models.
- TDengine Database designed to process time series data.
- Teach Yourself Computer Science A collection of links for learning CS for those who didn't study it at school.
- TerminusDB Knowledge graph and document store.
- TextBundle A file format for bundling text (Markdown) and binary (images, etc) in one file.
- Text storage model Thoughts and links.
- TimescaleDB Database designed to process time series data.
- Time-series storage model Thoughts and links on time-series models.
- TingoDB Embedded JavaScript MongoDB-compatible database for Node.js and node-webkit
- TinyBase A JavaScript library for structured state.
- Titan A scalable graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster.
- Trino Fast distributed SQL query engine for big data analytics that helps you explore your data universe.
- TypeDB A database with a rich and logical type system. TypeDB empowers you to solve complex problems, using TypeQL as its query language.
- UnQLite Embedded C-bindings document and key/value store.
- Vector storage model Thoughts and links.
- Vedis An embeddable datastore C library with over 70 commands, similar in concept to Redis but without the networking layer, since Vedis runs in the same process as the host application.
- VelocityDB A C# .NET NoSQL object database; its extension as a graph database is VelocityGraph.
- VeloxDB A fast, object-oriented, open source in-memory database for C# with an emphasis on correctness.
- Visidata Data exploration at your fingertips.
- Vitess A scalable open-source database developed by Google to accommodate billions of YouTube users.
- VoltDB A data platform built to make your entire tech stack leaner, faster, and less expensive, so that your applications (and your company) can scale seamlessly to meet the ultra-low latency SLAs of 5G, IoT, edge computing, and whatever comes next.
- Vyne A single API, which automatically connects your services, databases, lambdas and queues.
- Weaver A scalable, fast, consistent graph store.
- Web Storage Mechanisms by which browsers can store key/value pairs, in a much more intuitive fashion than using cookies.
- WhiteDB A lightweight NoSQL database library written in C, operating fully in main memory.
- Wide-column/columnar storage model Thoughts and links.
- Xata Serverless database platform powered by PostgreSQL.
- YugabyteDB An open-source Postgres database loaded with all Postgres native features in a cloud-native environment.
Last modified 05 May 2025