Storage is typically data, but the degree of structure is flexible. Unstructured storage is essentially a filesystem (although even then filesystems have structure to them). Structured storage is often called a "database system" in some particular shape.
Various data sources
Link sites
"Shapes" to data
Built, more or less, on the Codd model of relationships between tuples of data.
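As a rough illustration of the tuple-based model, the sketch below treats a relation as a plain Python set of tuples and writes a projection and a join by hand. The relation names, attributes, and sample rows are all invented for the example, and a real relational engine does far more (constraints, indexes, query planning).

```python
from typing import NamedTuple


class Person(NamedTuple):
    person_id: int
    name: str


class Phone(NamedTuple):
    person_id: int
    number: str


# A relation is a set of tuples that all share the same attributes.
# (Hypothetical sample data.)
people = {Person(1, "Ada"), Person(2, "Grace")}
phones = {Phone(1, "555-0100"), Phone(1, "555-0101"), Phone(2, "555-0199")}


def project_names(rel):
    """Projection: keep only the 'name' attribute; duplicates collapse."""
    return {row.name for row in rel}


def join_on_person_id(left, right):
    """Join: pair up tuples whose person_id values match."""
    return {
        (p.person_id, p.name, ph.number)
        for p in left
        for ph in right
        if p.person_id == ph.person_id
    }


joined = join_on_person_id(people, phones)
```

Because the result of each operation is again a set of tuples, operations compose, which is the property SQL queries build on.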
Some interesting relational-oriented sites
Structure is conceptually "star"-like, with minimal (or no) relationships outside of the document recognized by the storage system. (Developers can, and usually will, store unique data elements across documents as a way of putting structure in at the application level, but this is typically unrecognized by the storage system itself.)
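A minimal sketch of that application-level structure, assuming two hypothetical collections held in plain dictionaries (no particular document database's API is implied): each post stores an `author_id`, and only the application code knows that the value refers to another document.

```python
# Two "collections" of documents, keyed by document id (hypothetical data).
authors = {
    "a1": {"name": "Ada", "languages": ["English", "French"]},
}
posts = {
    "p1": {"title": "On Engines", "author_id": "a1", "tags": ["history"]},
    "p2": {"title": "Notes", "author_id": "a1", "tags": []},
}


def post_with_author(post_id):
    """Application-level 'join': the store itself never sees this link."""
    post = dict(posts[post_id])  # shallow copy so the stored document is untouched
    post["author"] = authors.get(post["author_id"])  # None if the reference dangles
    return post
```

Nothing stops a post from carrying an `author_id` that matches no author; enforcing that is left to the application, which is the trade-off the paragraph above describes.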
Wikipedia
"A graph is a structure composed of vertices and edges. Both vertices and edges can have an arbitrary number of key/value-pairs called properties. Vertices denote discrete objects such as a person, a place, or an event. Edges denote relationships between vertices. For instance, a person may know another person, have been involved in an event, and/or have recently been at a particular place. Properties express non-relational information about the vertices and edges. Example properties include a vertex having a name and an age, and an edge having a timestamp and/or a weight.
"If a user's domain is composed of a heterogeneous set of objects (vertices) that can be related to one another in a multitude of ways (edges), then a graph may be the right representation to use. In a graph, each vertex is seen as an atomic entity (not simply a "row in a table") that can be linked to any other vertex or have properties added or removed at will. This empowers the data modeler to think in terms of actors within a world of complex relations as opposed to, in relational databases, statically-typed tables joined in aggregate. Once a domain is modeled, that model must then be exploited in order to yield novel, differentiating information. Graph computing has a rich history that includes not only query languages devoid of table-join semantics, but also algorithms that support complex reasoning: path analysis, vertex clustering and ranking, subgraph identification, and more. The world of applied graph computing offers a flexible, intuitive data structure along with a host of algorithms able to effectively leverage that structure." -- from Apache TinkerPop
List of graph dbs to add (from TinkerPop page):
- Alibaba Graph Database - A real-time, reliable, cloud-native graph database service that supports property graph model.
- Amazon Neptune - Fully-managed graph database service.
- Bitsy - A small, fast, embeddable, durable in-memory graph database.
- Blazegraph - RDF graph database with OLTP support.
- ChronoGraph - A versioned graph database.
- DSEGraph - DataStax graph database with OLTP and OLAP support.
- GRAKN.AI - Distributed OLTP/OLAP knowledge graph system.
- Hadoop (Spark) - OLAP graph processor using Spark.
- HGraphDB - OLTP graph database running on Apache HBase.
- Huawei Graph Engine Service - Fully-managed, distributed, at-scale graph query and analysis service that provides a visualized interactive analytics platform.
- IBM Graph - OLTP graph database as a service.
- neo4j-gremlin-bolt - OLTP graph database (using Bolt Protocol).
- Apache S2Graph - OLTP graph database running on Apache HBase.
- Sqlg - OLTP implementation on SQL databases.
- Stardog - RDF graph database with OLTP and OLAP support.
- TinkerGraph - In-memory OLTP and OLAP reference implementation.
- Unipop - OLTP Elasticsearch and JDBC backed graph.
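The property-graph description quoted above can be sketched in a few lines of plain Python. This is not the TinkerPop or Gremlin API; the dictionary layout, the sample vertices and edges, and the helper names are assumptions made for illustration only.

```python
from collections import deque

# Vertices and edges both carry arbitrary key/value properties.
vertices = {
    1: {"label": "person", "name": "marko", "age": 29},
    2: {"label": "person", "name": "vadas", "age": 27},
    3: {"label": "place", "name": "Santa Fe"},
}
edges = [
    {"out": 1, "in": 2, "label": "knows", "weight": 0.5},
    {"out": 2, "in": 3, "label": "visited", "timestamp": 1700000000},
]


def neighbors(vertex_id, label=None):
    """Follow outgoing edges from a vertex, optionally filtered by edge label."""
    return [
        e["in"]
        for e in edges
        if e["out"] == vertex_id and (label is None or e["label"] == label)
    ]


def shortest_path(start, goal):
    """Breadth-first search over outgoing edges; returns vertex ids or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbors(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

Here `neighbors(1, "knows")` answers "whom does this person know?" without any table join, and `shortest_path` is a small instance of the path analysis the quote mentions.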
Topology options
One makes network calls to access the storage engine. Most storage engines follow this model, whether inside of the same network (a la "on-prem") or cloud.
The storage engine is accessed in-process, inside the program that uses it. It often cannot be accessed by other running programs, often manages files directly, and shuts down when the host process does. Excellent for standalone, self-contained installations that have no external dependencies beyond the filesystem. Fastest of all these arrangements, with the possible exception of code hosted inside the database (a la stored procedures).
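One way to see the in-process arrangement is with SQLite through Python's standard `sqlite3` module: there is no server to connect to, and the engine lives and dies with the host process. The table name and rows below are made up for the sketch.

```python
import sqlite3

# ":memory:" keeps the database in this process's RAM; passing a file path
# instead would have the engine manage that file directly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO notes (body) VALUES (?)", [("first",), ("second",)])
conn.commit()

# Queries are ordinary function calls into the engine; no network is involved.
rows = conn.execute("SELECT id, body FROM notes ORDER BY id").fetchall()

# Closing the connection (or exiting the process) shuts the engine down.
conn.close()
```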
Some storage engines also allow for code-hosting, in which code executes inside the same process(es) as the storage engine itself, a la "stored procedures".
Automation
- "Database Gyms": "In the past decade, academia and industry have embraced machine learning (ML) for database management system (DBMS) automation. These efforts have focused on designing ML models that predict DBMS behavior to support picking actions (e.g., building indexes) that improve the system’s performance. Recent developments in ML have created automated methods for finding good models. Such advances shift the bottleneck from DBMS model design to obtaining the training data necessary for building these models. But generating good training data is challenging and requires encoding subject matter expertise into DBMS instrumentation.
"Existing methods for training data collection are bespoke to individual DBMS components and do not account for (1) how workload trends affect the system and (2) the subtle interactions between internal system components. Consequently, the models created from this data do not support holistic tuning across subsystems and require frequent retraining to boost their accuracy.
"This paper presents the architecture of a database gym, an integrated environment that provides a unified API of pluggable components for obtaining high-quality training data. The goal of a database gym is to simplify ML model training and evaluation to accelerate autonomous DBMS research. But unlike gyms in other domains that rely on custom simulators, a database gym uses the DBMS itself to create simulation environments for ML training. Thus, we discuss and prescribe methods for overcoming challenges in DBMS simulation, which include demanding requirements for performance, simulation fidelity, and DBMS-generated hints for guiding training processes."
Datamining
Information Retrieval
Storage and retrieval
"Don't use your ORM entities for everything--embrace the SQL!":
Implementation
- Mini-LSM: Build a simple key-value storage engine in a week. Extend it in the second and third weeks.
- LibraDB: "... a simple, persistent key/value store written in pure Go. The project aims to provide a working yet simple example of a working database."
- simpledb: A simple database built from scratch that has some of the basic RDBMS features (SQL query parser, transactions, query optimizer)
- C: Let's Write a Database (a SQLite clone in C)
- C++: Build Your Own Redis from Scratch
- C#: Build Your Own Database
- Clojure: An Archaeology-Inspired Database
- Crystal: Why you should build your own NoSQL Database
- Go: Build Your Own Database from Scratch: Persistence, Indexing, Concurrency
- Go: Build Your Own Redis from Scratch
- Go: gosqldb: A key-value persistent database that supports SQL queries over B+ and LSM trees
- JavaScript: Dagoba: an in-memory graph database
- Python: DBDB: Dog Bed Database
- Python: Write your own miniature Redis with Python
- Ruby: Build your own fast, persistent KV store in Ruby
- Rust: Build your own Redis client and server
- Rust: YourSQL
- Rust: OxidSQL
- Rust: erdb: An educational relational database
- Subreddit: /r/databasedevelopment
- B-Tree Implementation
- The SimpleDB Data System: "... a multi-user transactional database server written in Java, which interacts with Java client programs via JDBC. The system is intended for pedagogical use only. The code is clean and compact. The APIs are straightforward. The learning curve is relatively small. Everything about it is geared towards improving the experience of a database system internals course. Consequently, the system is intentionally bare-bones. It implements only a small fraction of SQL and JDBC, and does little or no error checking. The SimpleDB code is an integral part of my textbook Database Design and Implementation, published by Springer."
- Building a NoSQL database from zero
- "How to build a relational database from scratch" (Medium members only)
- Build a NoSQL database from scratch in 1000 lines of code (in Go)
- Building BerkeleyDB
Detail Pages:
- 4store RDF database
- Actian Actian NoSQL (formerly Versant) object technology enables software developers to handle database requirements for extremely complex object models with ease and is used by the world’s largest companies for applications with very large scale data management requirements. Actian NoSQL doesn’t need mapping code to store or retrieve objects, so schema modifications can be handled without application downtime. Fault tolerance, synchronous and asynchronous replication, high availability and excellent scalability make Actian NoSQL ready for the enterprise.
- Adama RDF database
- Akavache An asynchronous, persistent key-value store created for writing desktop and mobile applications, based on SQLite3.
- Amazon Dynamo A fully managed NoSQL database service that provides fast and predictable performance with seamless scalability. DynamoDB lets you offload the administrative burdens of operating and scaling a distributed database so that you don't have to worry about hardware provisioning, setup and configuration, replication, software patching, or cluster scaling.
- Amazon S3 (and Glacier) Storage for the internet.
- Apache.org The Apache site is a collection of numerous open-source projects, in all stages of life (incubating, maintained, archived).
- Apache Derby Full SQL-compliant RDBMS written in Java.
- Apache Jena A free and open source Java framework for building Semantic Web and Linked Data applications.
- Apache TinkerPop A graph computing framework for both graph databases (OLTP) and graph analytic systems (OLAP).
- Appwrite An open-source platform that provides web and mobile developers with a set of easy-to-use and integrate REST APIs to manage their core backend needs.
- ArangoDB Multi-model database
- ArcadeDB A conceptual fork of OrientDB: a multi-model DBMS that supports SQL, Cypher, Gremlin, HTTP/JSON, MongoDB and Redis; also supports vector embeddings.
- Awesome (lists) (Project Awesome) A curated collection of lists.
- BBolt An embedded key/value database for Go.
- BerkeleyDB Embedded file-based storage system. Sort of a SQLite before SQLite was popular.
- Bolt An embedded key/value database for Go.
- Bolt protocol A highly efficient, lightweight client-server protocol designed for database applications.
- Bond An open-source, cross-platform framework for working with schematized data, supporting cross-language serialization/deserialization and powerful generic mechanisms for efficiently manipulating data.
- BrightstarDB A native RDF database for the .NET platform.
- Build Your Own redis Learn network programming and data structures by coding from scratch.
- Build-Your-Own-X A collection of links on how to build various things as a learning exercise.
- Byzer A low-code open-source programming language for data pipeline, analytics and AI.
- Cassandra An open source NoSQL distributed database trusted by thousands of companies for scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data.
- Castled Data synchronization framework focused on external apps.
- Cayley An open-source database for Linked Data inspired by the graph database behind Google's Knowledge Graph (formerly Freebase).
- Citus All the greatness of Postgres, plus the superpower of distributed tables. 100% open source.
- Clipper (aka xBase) SQL-inspired language from dBase days.
- Cloudboost Cloud platform for your app; similar to Parse + Firebase + Algolia + Iron.io all combined into one.
- Cloudflare Durable Objects (D1) Serverless SQL databases to query from your Workers and Pages projects.
- CockroachDB A prime example of NewSQL, with the ACID guarantees of SQL and the scale of NoSQL. CockroachDB is open source and freely available for cloud-native and hybrid deployments. SQL. Geo-location of data. Horizontal scale. ACID transactions.
- Coherence A scalable, fault-tolerant, cloud-ready, distributed platform for building grid-based applications and reliably storing data.
- CosmosDB Microsoft's proprietary globally-distributed, multi-model database service "for managing data at planet-scale" launched in May 2017. It is schema-agnostic, horizontally scalable and generally classified as a NoSQL database.
- Couchbase A full-featured, multi-service, multimodel database.
- CouchDB Document-oriented database offering seamless multi-master sync and a native HTTP/JSON API, with an emphasis on reliability and concurrency.
- CQL (Cassandra Query Language) Query language that's almost SQL that rides on top of the Cassandra storage system.
- CrateDB Open-source, distributed SQL database.
- Cypher A graph database query language first popularized by Neo4j.
- D (Tutorial D) D is a set of prescriptions for what Christopher J. Date and Hugh Darwen believe a relational database management system ought to be like.
- Dapper A simple object mapper for .NET.
- Database/Data storage implementation Resources on how to build a database
- Data concurrency and parallel programming Links about different ways to store data at high levels of concurrency
- Data Format Description Language (DFDL) A language for describing text and binary data formats.
- Datasets A collection of data sets to use for teaching purposes.
- db4o An object-oriented database using "native queries" (queries in the source language).
- DBeaver Free universal database tool and SQL client.
- DBOS A Database-Oriented Operating System.
- DBOS A Typescript framework built on the database that helps you develop transactional backend applications.
- Designing Data-Intensive Applications How to think about building data-centric applications.
- DGraph Native GraphQL Database with graph backend.
- Directus Headless CMS & API for Custom Databases.
- Dolt It's Git for Data.
- dotnetRDF An open-source CLR library for RDF.
- DragonflyDB An in-memory datastore.
- DuckDB An in-process SQL OLAP database management system.
- DyBase A very simple object oriented embedded database for languages with dynamic type checking.
- dynomite A generic dynamo implementation for different k-v storage engines.
- EdgeDB Open source, graph-relational database.
- Entity Framework A framework for retrieving and storing data to relational systems.
- esProc SPL A scripting language for data processing, executed in a Java program through JDBC.
- EventStoreDB Persist your application data as streams of events with an open-source database, the best data storage solution for event-sourced systems.
- Exograph A powerful data modeling language and high-performance query engine that offers dynamically generated APIs.
- Extensible-Storage-Engine (ESE, aka "Jet") An embedded/ISAM-based database engine, that provides rudimentary table and indexed access.
- eXtremeDB Hybrid Persistent & In-memory Highly Scalable Distributed Client / Server
- FaunaDB A global serverless database that gives you ubiquitous, low latency access to app data, without sacrificing data correctness and scale.
- Filestash A file manager that lets you manage your data wherever it is located.
- Firebase A (pseudo)real-time NoSQL that also has platform capabilities.
- Fluid Framework A collection of client libraries for distributing and synchronizing shared state, allowing multiple clients to simultaneously create and operate on shared data structures using coding patterns similar to those used to work with local data.
- Flyway Relational(ish) database evolution and migration tool.
- Free Programming Books A collection of free learning resources (books).
- GDL (GNU Data Language) A domain-specific data analysis and visualization programming language and a data analysis environment.
- Geode Apache Geode is a data management platform that provides real-time, consistent access to data-intensive applications throughout widely distributed cloud architectures. (Formerly GemStone GemFire.)
- GraphDB(Lite) Fully Featured RDF Database for Massive Data and Moderate Query Loads.
- GraphEngine A distributed in-memory data processing engine, underpinned by a strongly-typed RAM store and a general distributed computation engine.
- GraphQL A query language APIs can support to enable SQL-like behavior across APIs; supports both query and mutation, and can define complex entity descriptions.
- GraphQL Mesh Allows use of GraphQL query language to access data in remote APIs that don't run GraphQL (and also ones that do run GraphQL); can be used as a gateway to other services, or run as a local GraphQL schema that aggregates data from remote APIs.
- Grouparoo Data synchronization framework.
- Harbour Portable, xBase compatible programming language and environment.
- HBase Datastore for Apache "big data" projects.
- HyperGraphDB A general purpose, extensible, portable, distributed, embeddable, open-source data storage mechanism. It is a graph database designed specifically for artificial intelligence and semantic web projects, it can also be used as an embedded object-oriented database for projects of all sizes.
- HyperSQL Relational database written in Java.
- iBoxDB Fast ACID Table-Style Document NoSQL Database.
- IndexedDB A low-level API for client-side storage of significant amounts of structured data, including files/blobs. This API uses indexes to enable high-performance searches of this data.
- InfiniteGraph Advanced Graph Database and Analytics for Enterprises and Government.
- InfluxDB Platform for building time series applications.
- InnoDB A storage engine for the database management system MySQL and MariaDB.
- JanusGraph A scalable graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster.
- JavaScript Object Notation (JSON) Object literal syntax from ECMAScript that's since taken off as a data storage and wire transfer format.
- JSDB (JavaScript for Databases) JSDB is JavaScript for databases, a scripting language for data-driven, network-centric programming on Windows, Mac, Linux, and SunOS; it works with databases, XML, the web, and email, as a JavaScript shell, to run CGI programs, or as a web server.
- JsonDB An open-source, Java-based database.
- JunoDB PayPal's home-grown secure, consistent and highly available key-value store providing low, single digit millisecond, latency at any scale.
- JVM (Platform) Data-related Data storage and access on the JVM.
- Keyv A consistent interface for key-value storage across multiple backends via storage adapters. It supports TTL based expiry, making it suitable as a cache or a persistent key-value store.
- Kinto A generic JSON document store with sharing and synchronisation capabilities.
- Knowledge Graph Language (KGL) A query language for exploring knowledge graphs.
- Kusto (query language) Query language for use with Azure Data Explorer.
- kvass A personal key-value data store.
- LevelDB A very fast and lightweight, embedded database; part of the Chrome browser (exposed as IndexedDB).
- libSQL A next-gen fork of SQLite.
- linq2db LINQ to database provider.
- LiteDB An open source MongoDB-like database with zero configuration - mobile ready.
- Logica A logic programming language that compiles to StandardSQL and runs on Google BigQuery.
- LokiJS Super fast in-memory JavaScript document-oriented database.
- M (MUMPS: Massachusetts General Hospital Utility Multi-Programming System) A procedural language with a built-in NoSQL database; or, it’s a database with an integrated language optimized for accessing and manipulating that database.
- MapDB Provides concurrent Maps, Sets and Queues backed by disk storage or off-heap-memory. It is a fast and easy to use embedded Java database engine.
- MariaDB Open-source relational database.
- Marten Transactional Document DB and Event Store on PostgreSQL.
- MassiveJS A data mapper for Node.js that goes all in on PostgreSQL and embraces the power and flexibility of SQL and the relational model.
- Memgraph A streaming graph application platform that helps you wrangle your streaming data, build sophisticated models that you can query in real-time.
- Metadata-Connect A massively scalable metadata (data about the data) management platform hosted in the cloud.
- Microsoft SQL Server Microsoft's relational database implementation.
- Minio S3 compatible object storage.
- MongoDB Document-oriented network storage.
- Multi-Dimensional and Hierarchical Database Toolkit A Linux-based, open sourced, C/C++ toolkit of portable software that supports fast, flexible, multi-dimensional and hierarchical storage, retrieval and manipulation of information in data bases ranging in size up to 256 terabytes.
- NDatabase OODBMS native and transparent persistence for .NET.
- Nebula Graph The graph database built for super large-scale graphs with milliseconds of latency.
- NeDB The JavaScript Database, for Node.js, nw.js, electron and the browser.
- Neo4J Industry standard for graph-oriented database.
- Nextcloud A personal cloud which runs on your own server.
- Nitrite An open source nosql embedded document store written in Java with MongoDB like API that supports both in-memory and single file based persistent store.
- NMemory A lightweight non-persistent in-memory relational database engine that is purely written in C# and can be hosted by .NET applications.
- Nocobase An open source and free no-code development platform.
- NocoDB The Open Source Airtable alternative.
- Noms The versioned, forkable, syncable database.
- NosDB A 100% native .NET Open Source NoSQL Database, extremely fast and linearly scalable and allows your .NET applications to handle extreme transaction loads (XTP).
- NuoDB Distributed SQL database.
- ObjectDB A powerful Object-Oriented Database Management System (ODBMS) whose native API is the Java Persistence API (JPA).
- Objectivity/DB A scalable, high performance, distributed Object Database (ODBMS).
- OrientDB Multidimensional open-source database.
- OrioleDB A new storage engine for PostgreSQL, bringing a modern approach to database capacity, capabilities and performance to the world's most-loved database platform.
- Orly An open-source graph database
- Owncloud A personal cloud which runs on your own server.
- Papers We Love A collection of academic papers gathered by a popular "user group" dedicated to reading through them.
- Perst Open source, dual license, object-oriented embedded database system (ODBMS).
- PingCAP NewSQL database that supports HTAP workloads.
- PlanetScale A serverless MySQL platform based on the Vitess horizontal scaling MySQL technology.
- PostgreSQL Open-source relational database.
- PouchDB Open-source JavaScript database inspired by CouchDB.
- Presto (PrestoDB, PrestoSQL) Open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes.
- Prisma An auto-generated and fully type-safe database client providing a simplistic yet extremely powerful API.
- Probase The goal of Probase is to make machines “aware” of the mental world of human beings, so that machines can better understand human communication. We do this by giving certain general knowledge or certain common sense to machines.
- PRQL (Pipelined Relational Query Language) A modern language for transforming data — a simple, powerful, pipelined SQL replacement.
- Puter Cloud-hosted OS-like platform.
- QuestDB Database designed to process time series data.
- Quick.Ref Cheat Sheets A collection of "cheat sheets"/quick-reference-guides for various tools and platforms and languages and ....
- RavenDB Document-oriented database in .NET.
- Realm (Realm Database, Realm Object Server) Object database intended as a replacement for SQLite and other mobile data storage systems.
- Redis An in-memory database that persists on disk. The data model is key-value, but many different kinds of values are supported: Strings, Lists, Sets, Sorted Sets, Hashes, Streams, HyperLogLogs, Bitmaps.
- RedisGraph A graph database module for Redis.
- Reflow A language and runtime for distributed, incremental data processing in the cloud.
- Rel A desktop database management system that implements the "Tutorial D" database language.
- Relational Model Thoughts/articles on the relational model of storage.
- Replicache An in-browser persistent key-value store that is git-like under the hood.
- Reshape A zero-downtime schema migration tool for Postgres.
- RestDB.io Simple online NoSQL database backend with automatic APIs and low code javascript hooks.
- RethinkDB Distributed document database.
- Riak Highly available, operationally simple, distributed database.
- Robomongo / Robo 3T Shell for SQL and MongoDB.
- RocksDB A library that provides an embeddable, persistent key-value store for fast storage.
- Rowy An open-source Airtable-like UI for your database and to build serverless cloud functions visually in the context of your data.
- RxDB A fast, offline-first, reactive database for JavaScript Applications.
- SchemaCrawler A relational database schema discovery and comprehension tool.
- ScrollSets All tabular knowledge can be stored in a single long plain text file.
- Semantic Web A model of the web based more around its founder's original intent (hyperlinks and full-state "documents").
- sharedb Realtime database backend based on Operational Transformation (OT).
- SimpleDB A "teaching" database implementation.
- Slashbase The open-source modern database IDE.
- SML# A programming language in the ML-family; seamlessly integrates (currently a subset of) SQL.
- sones An object-orientated graph data storage for a large amount of highly connected semi-structured data in a distributed environment.
- SPARQL A query language designed for semantic web data retrieval and update.
- SQL.js (SQLite in JavaScript) A JavaScript library to run SQLite on the web.
- SQL (Structured Query Language) Standard language for querying (and updating) a relational database, originally intended for ad-hoc usage from an interactive prompt.
- SQLite A C-language library that implements a small, fast, self-contained, high-reliability, full-featured, SQL database engine.
- Starcounter "Fused" ACID memory centric database engine and C# VM.
- Storj Decentralized cloud storage.
- Structured and Unstructured Query Language (SUQL) Conversational Search over Structured and Unstructured Data with LLMs
- Supabase Backend server with REST APIs to manage core backend needs.
- SurrealDB A single easy-to-use API for relational, document and graph databases.
- Swarm JavaScript replicated model (M of MVC) library.
- SymmetricDS Open-source database replication software that focuses on features and cross-platform compatibility.
- Tables A microlang for data science.
- TaffyDB JavaScript database.
- Taxi A language for describing APIs and their models.
- TDengine Database designed to process time series data.
- Teach Yourself Computer Science A collection of links for learning CS for those who didn't study it at school.
- TerminusDB Knowledge graph and document store.
- TextBundle A file format for bundling text (Markdown) and binary (images, etc) in one file.
- TimescaleDB Database designed to process time series data.
- TingoDB Embedded JavaScript MongoDB-compatible database for Node.js and node-webkit
- TinyBase A JavaScript library for structured state.
- Titan A scalable graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster.
- Trino Fast distributed SQL query engine for big data analytics that helps you explore your data universe.
- TypeDB A database with a rich and logical type system. TypeDB empowers you to solve complex problems, using TypeQL as its query language.
- UnQLite Embedded C-bindings document and key/value store.
- Vedis An embeddable datastore C library built with over 70 commands similar in concept to Redis but without the networking layer, since Vedis runs in the same process as the host application.
- VelocityDB C# .NET NoSQL object database; its graph database extension is VelocityGraph.
- VeloxDB A fast, object-oriented, open source in-memory database for C# with an emphasis on correctness.
- Vitess A scalable open-source database developed by Google to accommodate billions of YouTube users.
- VoltDB A data platform built to make your entire tech stack leaner, faster, and less expensive, so that your applications (and your company) can scale seamlessly to meet the ultra-low latency SLAs of 5G, IoT, edge computing, and whatever comes next.
- Vyne A single API, which automatically connects your services, databases, lambdas and queues.
- Weaver A scalable, fast, consistent graph store.
- Web Storage Mechanisms by which browsers can store key/value pairs, in a much more intuitive fashion than using cookies.
- WhiteDB A lightweight NoSQL database library written in C, operating fully in main memory.
- Xata Serverless database platform powered by PostgreSQL.
- YugaByteDB An open-source Postgres database loaded with all Postgres native features in a cloud-native environment.
Last modified 02 October 2024