Distributed systems are the means by which programs on one machine communicate with programs on another. They differ from architectures in that an architecture may have a distributed system at its center but doesn't require one, and a distributed system may incorporate an architecture as part of its implementation without that architecture being imposed on the system's users. This page also covers related topics such as networking.

Concepts

"An introduction to distributed systems": "A distributed system is a collection of autonomous computing elements that appears to its users as a single coherent system."

Distributed programming book (Website): "Source repo for the book that my students and I are writing in my Northeastern University course, CS7680 Special Topics in Computing Systems: Programming Models for Distributed Computing, on the topic of programming models for distributed systems."

This is a book about the programming constructs we use to build distributed systems. These range from the small (RPC, futures, actors) to the large: systems built from those components, such as MapReduce and Spark. We explore issues and concerns central to distributed systems, like consistency, availability, and fault tolerance, through the lens of the programming models and frameworks that programmers use to build these systems.
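For a flavour of the "small" end, here is a minimal sketch (mine, not from the book) of an RPC-style stub that hands back a future instead of blocking; the `remote_lookup` function and its latency are stand-ins for a real network call.

```python
# A sketch of RPC + futures: issue remote calls without blocking, collect
# results later. remote_lookup is simulated, not a real service.
import time
from concurrent.futures import ThreadPoolExecutor, Future

def remote_lookup(key: str) -> str:
    """Pretend network round trip to a remote key-value service."""
    time.sleep(0.1)  # simulated network latency
    return f"value-for-{key}"

_pool = ThreadPoolExecutor(max_workers=8)

def async_rpc(key: str) -> Future:
    """Issue the 'RPC' and return immediately; the result arrives via the future."""
    return _pool.submit(remote_lookup, key)

if __name__ == "__main__":
    futures = [async_rpc(k) for k in ("a", "b", "c")]  # three requests in flight at once
    print([f.result(timeout=1.0) for f in futures])    # block only at the point of use
```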

Notes on Distributed Systems for Young Bloods: Distributed systems are different because they fail often; Implement backpressure throughout your system; Find ways to be partially available; Use percentiles, not averages; Learn to estimate your capacity; Feature flags are how infrastructure is rolled out; Choose id spaces wisely; Writing cached data back to persistent storage is bad; Extract services.
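On "use percentiles, not averages", a small made-up illustration of why: one slow outlier barely informs the mean, yet it is exactly what p99 surfaces. The latency numbers below are invented.

```python
# Nearest-rank percentiles over a latency sample; the numbers are fabricated
# to show how the mean misrepresents both typical and worst-case behaviour.
def percentile(samples, p):
    """Nearest-rank percentile (p in 0..100) of a list of samples."""
    ordered = sorted(samples)
    rank = max(0, round(p / 100 * len(ordered)) - 1)
    return ordered[rank]

latencies_ms = [12, 13, 11, 14, 12, 13, 15, 12, 11, 950]  # one request hit a slow replica

mean = sum(latencies_ms) / len(latencies_ms)
print(f"mean = {mean:.0f} ms")                       # ~106 ms: describes no actual request
print(f"p50  = {percentile(latencies_ms, 50)} ms")   # 12 ms: the typical request
print(f"p99  = {percentile(latencies_ms, 99)} ms")   # 950 ms: what your slowest users see
```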

Pat Helland's works

CAP Theorem

Vogels' works

Transactions and commits

Implementation abstraction concepts

Technology stacks/stack-related links:

Interesting links:

Interesting tools based on network APIs

Auth-n-Auth and SSO

Proprietary/Cloud

Open Source

Ideas and theory

Languages and Tools

Infrastructure

Storage

Paxos Consensus and other consensus papers

Gossip Protocols (Epidemic Behaviours)

P2P

Amazon, Google

Reading--some tech, some culture

Distributed Locking

For starters, don't do it. Even if you think you need to, don't.

"How to do distributed locking"

Some links on this page were copied from A Distributed Systems Reading List.

People

People on Fundamentals of Distributed Computing Theory

> I described myself as someone with a Ph.D. in mathematics, who calls himself a computer scientist, and is giving a talk to economists about a subject mainly studied by philosophers.
* Marcos K. Aguilera
> Theory and Practice of Distributed Systems
* Idit Keidar
* Gadi Taubenfeld
> My primary research interests are in concurrent and distributed computing.
* Jennifer Welch
* Michel Raynal
* Wojciech M. Golab
* Carole Delporte-Gallet
* DBLP
* Eli Gafni
* Sebastian Burckhardt

> My research interests are programming models to program distributed, parallel, or concurrent systems conveniently, efficiently, and correctly.

People on Fundamentals of Multiprocessor Programming Theory

Victor Luchangco works in the Scalable Synchronization Group of Oracle Labs.
His research focuses on developing algorithms and mechanisms to support concurrent programming on large-scale distributed systems.

Moir's main research interests concern practical and theoretical aspects of concurrent, distributed, and real-time systems,
particularly hardware and software support for programming constructs
that facilitate scalable synchronization in shared memory multiprocessors.

On memory models.

People on Programming Languages (including Consistency Models and (Weak) Memory Models)

My research interests are in methods and tools for developing correct concurrent and distributed software.
* Sebastian Burckhardt
My research interests are programming models to program distributed, parallel, or concurrent systems conveniently, efficiently, and correctly.
* Publications
* Marko Vukolić @ IBM Research
My research interests lie in the broad area of distributed systems.
Some more specific topics that I am (currently) interested in include fault-tolerance,
blockchain and distributed ledgers, cloud computing security and distributed storage.
* Jade Alglave
* Michael Emmi
Michael’s research enables the construction of reliable software by developing the foundations
for effective programming abstractions and informative program analysis tools.

People on the Theory of Distributed Systems

I am a software engineer at Google. Before that, I was a principal research scientist at Yahoo! Research. Before that I was an assistant professor at Georgia Tech, and before that I was a PhD student at Stanford.

Research Groups

Conferences, Journals, Workshops, and Magazines (By Topics)

SIGs:

General Theory of Computer Science

Conferences

The annual ALGO congress is the leading international gathering of researchers on Algorithms in Europe.

Journals

Distributed Computing Theory

Programming Languages and Concurrency Theory

Distributed Systems (and More General)

Formal Methods (Logic)

Database Systems

Databases

Journals

Workshops

January 20 – 25, 2013, Dagstuhl Seminar 13042
- The report from Dagstuhl Seminar 13042
- The slides from Dagstuhl Seminar 13042
- Lower Bounds for Distributed Computing (09w5114)
This workshop will bring together experts in the field (and some exceptional graduate students and postdocs) to discuss fundamental distributed computing problems whose computational complexities have not been resolved and the limitations of current techniques for obtaining lower bounds for these problems.
- FuDiCo: Future Directions in Distributed Computing 2004; FuDiCo @ DBLP'2003

Prizes and Awards

The European Association for Programming Languages and Systems has established a Best Dissertation Award
in the international research area of programming languages and systems.

Reports

Courses & Paper Reading Lists

This course introduces the principles of distributed computing, emphasizing the fundamental issues underlying the design of distributed systems and networks: communication, coordination, fault-tolerance, locality, parallelism, self-organization, symmetry breaking, synchronization, uncertainty. We explore essential algorithmic ideas and lower bound techniques, basically the "pearls" of distributed computing.

It will present abstractions and implementation techniques for engineering distributed systems. Topics include multithreading, remote procedure call, client/server designs, peer-to-peer designs, consistency, fault tolerance, and security, as well as several case studies of distributed systems.

The lecture notes (ppt) are elegant.
Topics will include the majority (we are going to shoot for all and see what happens) of the following: Global states and event ordering; Logical clocks; Vector clocks; Consistent cuts and global property detection; Rollback-recovery and message-logging protocols; State machine approach; Agreement protocols; Failure detectors; Replication and consistency; Byzantine fault tolerance; Atomic Commit

This course studies the organization of cloud computing systems and surveys research problems in this area.

Big Ideas. Big Money. Big Data.

The primary emphasis is on operating systems and distributed systems. A secondary emphasis is on protocol implementation and next-generation network protocols. The focus when covering these topics is the extent to which they impact end-system design and implementation.

Lecture notes: Robust Concurrent Computing
It also provides a list of papers to read.

CPS 212 is a graduate-level course dealing with techniques for storing and sharing information in computer networks, large and small. We will cover a range of core distributed systems topics, with an emphasis on the issues faced by networked utility services, scalable Internet services, and enterprise storage systems.

This class will examine file system implementation, low-level database storage techniques, and distributed programming. Lectures will cover basic file system structures, journaling and logging, I/O system performance, RAID, the RPC abstraction, and numerous systems illustrating these concepts.

This course broadly examines distributed storage systems in their many manifestations. It explores how to harness and maintain the collective storage capabilities in storage systems from global-scale enterprises and cloud computing to peer-to-peer, ad hoc, and home networks.

Principles, techniques, and examples related to the design, implementation, and analysis of distributed and parallel computer systems.

Computer Science Ph.D. Thesis

Tools

Abstraction-based parameterized TLA+ checker: Bringing state-of-the-art model checking to TLA+

Blogs

English

Other Articles

Videos

From Leslie Lamport

Books

Synthesis Lectures on Distributed Computing Theory is edited by Jennifer Welch of Texas A&M University and Nancy Lynch of the Massachusetts Institute of Technology. The series publishes 50- to 150-page publications on topics pertaining to distributed computing theory. The scope largely follows the purview of premier information and computer science conferences, such as ACM PODC, DISC, SPAA, OPODIS, CONCUR, DialM-POMC, ICDCS, SODA, Sirocco, SSS, and related conferences. Potential topics include, but are not limited to: distributed algorithms and lower bounds, algorithm design methods, formal modeling and verification of distributed algorithms, and concurrent data structures.


Detail Pages:

Last modified 05 June 2025