Distributed systems are the means by which programs on one machine communicate with programs on another. They differ from architectures in that an architecture may have a distributed system at its center but doesn't require one, and a distributed system may incorporate an architecture as part of its implementation without that architecture being imposed on the system's users. This page also covers related topics such as networking.

Concepts

"An introduction to distributed systems": "A distributed system is a collection of autonomous computing elements that appears to its users as a single coherent system."

Distributed programming book (Website): "Source repo for the book that my students and I are writing in my Northeastern University course, CS7680 Special Topics in Computing Systems: Programming Models for Distributed Computing, on the topic of programming models for distributed systems."

This is a book about the programming constructs we use to build distributed systems. These range from the small (RPC, futures, actors) to the large: systems built from those components, such as MapReduce and Spark. We explore issues and concerns central to distributed systems, like consistency, availability, and fault tolerance, through the lens of the programming models and frameworks that programmers use to build these systems.
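For a flavour of the "small" end, here is a minimal sketch (mine, not from the book) of an RPC-style stub that hands back a future instead of blocking; the `remote_lookup` function and its latency are stand-ins for a real network call.

```python
# A sketch of RPC + futures: issue remote calls without blocking, collect
# results later. remote_lookup is simulated, not a real service.
import time
from concurrent.futures import ThreadPoolExecutor, Future

def remote_lookup(key: str) -> str:
    """Pretend network round trip to a remote key-value service."""
    time.sleep(0.1)  # simulated network latency
    return f"value-for-{key}"

_pool = ThreadPoolExecutor(max_workers=8)

def async_rpc(key: str) -> Future:
    """Issue the 'RPC' and return immediately; the result arrives via the future."""
    return _pool.submit(remote_lookup, key)

if __name__ == "__main__":
    futures = [async_rpc(k) for k in ("a", "b", "c")]  # three requests in flight at once
    print([f.result(timeout=1.0) for f in futures])    # block only at the point of use
```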

Notes on Distributed Systems for Young Bloods: Distributed systems are different because they fail often; Implement backpressure throughout your system; Find ways to be partially available; Use percentiles, not averages; Learn to estimate your capacity; Feature flags are how infrastructure is rolled out; Choose id spaces wisely; Writing cached data back to persistent storage is bad; Extract services.
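On "use percentiles, not averages", a small made-up illustration of why: one slow outlier barely informs the mean, yet it is exactly what p99 surfaces. The latency numbers below are invented.

```python
# Nearest-rank percentiles over a latency sample; the numbers are fabricated
# to show how the mean misrepresents both typical and worst-case behaviour.
def percentile(samples, p):
    """Nearest-rank percentile (p in 0..100) of a list of samples."""
    ordered = sorted(samples)
    rank = max(0, round(p / 100 * len(ordered)) - 1)
    return ordered[rank]

latencies_ms = [12, 13, 11, 14, 12, 13, 15, 12, 11, 950]  # one request hit a slow replica

mean = sum(latencies_ms) / len(latencies_ms)
print(f"mean = {mean:.0f} ms")                       # ~106 ms: describes no actual request
print(f"p50  = {percentile(latencies_ms, 50)} ms")   # 12 ms: the typical request
print(f"p99  = {percentile(latencies_ms, 99)} ms")   # 950 ms: what your slowest users see
```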

Pat Helland's works

CAP Theorem

Vogels' works

Transactions and commits

Implementation abstraction concepts

Technology stacks/stack-related links:

Interesting links:

Interesting tools based on network APIs

Auth-n-Auth and SSO

Proprietary/Cloud

Open Source

Ideas and theory

Languages and Tools

Infrastructure

Storage

Paxos Consensus and other consensus papers

Gossip Protocols (Epidemic Behaviours)

P2P

Amazon, Google

Reading--some tech, some culture

Distributed Locking

For starters, don't do it. Even if you think you need to, don't.

"How to do distributed locking"

Some links on this page were copied from A Distributed Systems Reading List.

People

People on Fundamentals of Distributed Computing Theory

> I described myself as someone with a Ph.D. in mathematics, who calls himself a computer scientist, and is giving a talk to economists about a subject mainly studied by philosophers.
* Marcos K. Aguilera
> Theory and Practice of Distributed Systems
* Idit Keidar
* Gadi Taubenfeld
> My primary research interests are in concurrent and distributed computing.
* Jennifer Welch
* Michel Raynal
* Wojciech M. Golab
* Carole Delporte-Gallet
* DBLP
* Eli Gafni
* Sebastian Burckhardt

> My research interests are programming models to program distributed, parallel, or concurrent systems conveniently, efficiently, and correctly.

People on Fundamentals of Multiprocessor Programming Theory

Victor Luchangco works in the Scalable Synchronization Group of Oracle Labs.
His research focuses on developing algorithms and mechanisms to support concurrent programming on large-scale distributed systems.

Moir's main research interests concern practical and theoretical aspects of concurrent, distributed, and real-time systems,
particularly hardware and software support for programming constructs
that facilitate scalable synchronization in shared memory multiprocessors.

On memory models.

People on Programming Languages (including Consistency Models and (Weak) Memory Models)

My research interests are in methods and tools for developing correct concurrent and distributed software.
* Sebastian Burckhardt
My research interests are programming models to program distributed, parallel, or concurrent systems conveniently, efficiently, and correctly.
* Publications
* Marko Vukolić @ IBM Research
My research interests lie in the broad area of distributed systems.
Some more specific topics that I am (currently) interested in include fault-tolerance,
blockchain and distributed ledgers, cloud computing security and distributed storage.
* Jade Alglave
* Michael Emmi
Michael’s research enables the construction of reliable software by developing the foundations
for effective programming abstractions and informative program analysis tools.

People on the Theory of Distributed Systems

I am a software engineer at Google. Before that, I was a principal research scientist at Yahoo! Research. Before that I was an assistant professor at Georgia Tech, and before that I was a PhD student at Stanford.

Research Groups

Conferences, Journals, Workshops, and Magazines (By Topics)

SIGs:

General Theory of Computer Science

Conferences

The annual ALGO congress is the leading international gathering of researchers on Algorithms in Europe.

Journals

Distributed Computing Theory

Programming Languages and Concurrency Theory

Distributed Systems (and More General)

Formal Methods (Logic)

Database Systems

Databases

Journals

Workshops

January 20 – 25, 2013, Dagstuhl Seminar 13042
- The report from Dagstuhl Seminar 13042
- The slides from Dagstuhl Seminar 13042
- Lower Bounds for Distributed Computing (09w5114)
This workshop will bring together experts in the field (and some exceptional graduate students and postdocs) to discuss fundamental distributed computing problems whose computational complexities have not been resolved and the limitations of current techniques for obtaining lower bounds for these problems.
- FuDiCo: Future Directions in Distributed Computing 2004; FuDiCo @ DBLP'2003

Prizes and Awards

The European Association for Programming Languages and Systems has established a Best Dissertation Award
in the international research area of programming languages and systems.

Reports

Courses & Paper Reading Lists

This course introduces the principles of distributed computing, emphasizing the fundamental issues underlying the design of distributed systems and networks: communication, coordination, fault-tolerance, locality, parallelism, self-organization, symmetry breaking, synchronization, uncertainty. We explore essential algorithmic ideas and lower bound techniques, basically the "pearls" of distributed computing.

It will present abstractions and implementation techniques for engineering distributed systems. Topics include multithreading, remote procedure call, client/server designs, peer-to-peer designs, consistency, fault tolerance, and security, as well as several case studies of distributed systems.

The lecture notes (ppt) are elegant.
Topics will include the majority (we are going to shoot for all and see what happens) of the following: Global states and event ordering; Logical clocks; Vector clocks; Consistent cuts and global property detection; Rollback-recovery and message-logging protocols; State machine approach; Agreement protocols; Failure detectors; Replication and consistency; Byzantine fault tolerance; Atomic Commit

This course studies the organization of cloud computing systems and surveys research problems in this area.

Big Ideas. Big Money. Big Data.

The primary emphasis is on operating systems and distributed systems. A secondary emphasis is on protocol implementation and next-generation network protocols. The focus when covering these topics is the extent to which they impact end-system design and implementation.

Lecture notes: Robust Concurrent Computing
It also provides a list of papers to read.

CPS 212 is a graduate-level course dealing with techniques for storing and sharing information in computer networks, large and small. We will cover a range of core distributed systems topics, with an emphasis on the issues faced by networked utility services, scalable Internet services, and enterprise storage systems.

This class will examine file system implementation, low-level database storage techniques, and distributed programming. Lectures will cover basic file system structures, journaling and logging, I/O system performance, RAID, the RPC abstraction, and numerous systems illustrating these concepts.

This course broadly examines distributed storage systems in their many manifestations. It explores how to harness and maintain the collective storage capabilities in storage systems from global-scale enterprises and cloud computing to peer-to-peer, ad hoc, and home networks.

Principles, techniques, and examples related to the design, implementation, and analysis of distributed and parallel computer systems.

Computer Science Ph.D. Thesis

Tools

Abstraction-based parameterized TLA+ checker: Bringing state-of-the-art model checking to TLA+

Blogs

English

Other Articles

Videos

From Leslie Lamport

Books

Synthesis Lectures on Distributed Computing Theory is edited by Jennifer Welch of Texas A&M University and Nancy Lynch of the Massachusetts Institute of Technology. The series publishes 50- to 150-page publications on topics pertaining to distributed computing theory. The scope largely follows the purview of premier information and computer science conferences, such as ACM PODC, DISC, SPAA, OPODIS, CONCUR, DialM-POMC, ICDCS, SODA, Sirocco, SSS, and related conferences. Potential topics include, but are not limited to: distributed algorithms and lower bounds, algorithm design methods, formal modeling and verification of distributed algorithms, and concurrent data structures.


Detail Pages:

Last modified 05 June 2025