Publish/subscribe (Pub-sub)

The publish/subscribe interaction paradigm provides subscribers with the ability to express their interest in an event or a pattern of events, in order to be notified subsequently of any event, generated by a publisher, that matches their registered interest. In other terms, producers publish information on a software bus (an event manager) and consumers subscribe to the information they want to receive from that bus. This information is typically denoted by the term event and the act of delivering it by the term notification.

The basic system model for publish/subscribe interaction relies on an event notification service providing storage and management for subscriptions and efficient delivery of events. Such an event service represents a neutral mediator between publishers, acting as producers of events, and subscribers, acting as consumers of events. Subscribers register their interest in events by typically calling a subscribe() operation on the event service, without knowing the effective sources of these events. This subscription information remains stored in the event service and is not forwarded to publishers. The symmetric operation unsubscribe() terminates a subscription.

To generate an event, a publisher typically calls a publish() operation. The event service propagates the event to all relevant subscribers; it can thus be viewed as a proxy for the subscribers. Note that every subscriber will be notified of every event conforming to its interest (obviously, failures might prevent subscribers from receiving some events). Publishers also often have the ability to advertise the nature of their future events through an advertise() operation. The provided information can be useful for (1) the event service to adjust itself to the expected flows of events, and (2) the subscribers to learn when a new type of information becomes available. The decoupling that the event service provides between publishers and subscribers can be decomposed along the following three dimensions:

Space decoupling: The interacting parties do not need to know each other. The publishers publish events through an event service and the subscribers get these events indirectly through the event service. The publishers do not usually hold references to the subscribers, nor do they know how many of these subscribers are participating in the interaction. Similarly, subscribers do not usually hold references to the publishers, nor do they know how many of these publishers are participating in the interaction.
Time decoupling: The interacting parties do not need to be actively participating in the interaction at the same time. In particular, the publisher might publish some events while the subscriber is disconnected, and conversely, the subscriber might get notified about the occurrence of some event while the original publisher of the event is disconnected.
Synchronization decoupling: Publishers are not blocked while producing events, and subscribers can get asynchronously notified (through a callback) of the occurrence of an event while performing some concurrent activity. The production and consumption of events do not happen in the main flow of control of the publishers and subscribers, and do not therefore happen in a synchronous manner.
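
The basic model can be sketched as a small in-process event service. This is only an illustration under stated assumptions, not the interface of any particular system: the class name EventService, the integer subscription handle, the interest predicate, and the flush() helper are all hypothetical; only the operation names subscribe(), unsubscribe(), and publish() come from the text. The sketch shows space decoupling (publishers never see subscribers) and synchronization decoupling (publish() returns before delivery); it does not buffer events for disconnected subscribers, so time decoupling is not covered.

```python
import queue
import threading


class EventService:
    """Hypothetical in-process event service (all names are illustrative)."""

    def __init__(self):
        self._subscriptions = {}          # handle -> (interest, callback)
        self._next_handle = 0
        self._pending = queue.Queue()     # events awaiting delivery
        self._lock = threading.Lock()
        threading.Thread(target=self._deliver, daemon=True).start()

    def subscribe(self, interest, callback):
        """Register a predicate and a callback; the publisher never sees them."""
        with self._lock:
            self._next_handle += 1
            self._subscriptions[self._next_handle] = (interest, callback)
            return self._next_handle

    def unsubscribe(self, handle):
        with self._lock:
            self._subscriptions.pop(handle, None)

    def publish(self, event):
        """Enqueue the event and return at once; delivery happens elsewhere."""
        self._pending.put(event)

    def _deliver(self):
        # Runs in a background thread, outside the publisher's flow of control.
        while True:
            event = self._pending.get()
            with self._lock:
                targets = list(self._subscriptions.values())
            for interest, callback in targets:
                if interest(event):
                    callback(event)
            self._pending.task_done()

    def flush(self):
        """Block until every published event has been handed to callbacks."""
        self._pending.join()
```

A subscriber would call subscribe() with a predicate such as `lambda e: e["kind"] == "quote"` and a callback; flush() exists only so that a demonstration can wait for the background thread.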

Topic-based publish/subscribe. The earliest publish/subscribe scheme was based on the notion of topics or subjects, and has been implemented by many industrial strength solutions (e.g., Altherr et al. [1999]; Talarian Corporation [1999]; Skeen [1998]; TIBCO [1999]). It extends the notion of channels, used to bundle communicating peers, with methods to characterize and classify event content. Participants can publish events and subscribe to individual topics, which are identified by keywords. Topics are strongly similar to the notion of groups, as defined in the context of group communication [Powell 1996] and often used for replication [Birman 1993]. This similarity is not surprising, since some of the first systems to offer publish/subscribe interaction were based on the Isis [Birman et al. 1990] group communication toolkit and the subscription scheme was thus inherently based on groups. Consequently, subscribing to a topic T can be viewed as becoming a member of a group T, and publishing an event on topic T translates accordingly into broadcasting that event among the members of T. Although groups and topics are similar abstractions, they are generally associated with different application domains: groups are used for maintaining strong consistency between the replicas of a critical component in a local area network (LAN), whereas topics are used to model large-scale distributed interactions. In practice, topic-based publish/subscribe systems introduce a programming abstraction which maps individual topics to distinct communication channels. They present interfaces similar to those of the event service discussed in Section 2, and the topic name is usually specified as an initialization argument. Every topic is viewed as an event service of its own, identified by a unique name, with an interface offering publish() and subscribe() operations.

The topic abstraction is easy to understand, and enforces platform interoperability by relying only on strings as keys to divide the event space. Additions to the topic-based scheme have been proposed by various systems. The most useful improvement is the use of hierarchies to organize topics. While group-based systems offer flat addressing, where groups represent disconnected event spaces, nearly all modern topic-based engines offer a form of hierarchical addressing, which permits programmers to organize topics according to containment relationships. A subscription made to some node in the hierarchy implicitly involves subscriptions to all the subtopics of that node. Topic names are generally represented with a URL-like notation and introduce a hierarchy very similar to that of USENET news. Most systems allow topic names to contain wildcards, first introduced in TIBCO Rendezvous [TIBCO 1999], which offer the possibility to subscribe and publish to several topics whose names match a given set of keywords, like an entire subtree or a specific level in the hierarchy.
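
Hierarchical addressing with wildcards can be illustrated with a small matching function. The syntax below is an assumption made for this sketch and is not taken from any of the cited systems: levels are separated by "/", "*" stands for exactly one level, and a pattern also covers every subtopic beneath the node it names (containment).

```python
def topic_matches(pattern: str, topic: str) -> bool:
    """Return True if `topic` lies at or below the node named by `pattern`.

    Hypothetical syntax: "/"-separated levels, "*" matches one whole level.
    """
    pattern_levels = pattern.strip("/").split("/")
    topic_levels = topic.strip("/").split("/")
    # A topic shallower than the pattern cannot be contained in it.
    if len(topic_levels) < len(pattern_levels):
        return False
    # zip() stops at the end of the pattern, so deeper subtopics still match.
    return all(p == "*" or p == t for p, t in zip(pattern_levels, topic_levels))
```

With this function, a subscription to "/stocks/tech" would also receive events published on "/stocks/tech/telco", while "/stocks/*/quotes" selects one specific level across all sectors.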

Consider the example of stock quotes disseminated to a large number of interested brokers. In a first step, we are interested in buying stocks, advertised by stock quote events. Such events consist of five attributes: a global identifier, the name of the company, the price, the amount of stocks, and the identifier of the selling trader. Figure 11 shows how to subscribe to all stock quotes, and Figure 12 gives an overview of the resulting distributed interaction.

Content-based publish/subscribe. Despite improvements like hierarchical addressing facilities and wildcards, the topic-based publish/subscribe variant represents a static scheme which offers only limited expressiveness. The content-based (or property-based [Rosenblum and Wolf 1997]) publish/subscribe variant improves on topics by introducing a subscription scheme based on the actual content of the considered events. In other terms, events are not classified according to some predefined external criterion (e.g., topic name), but according to the properties of the events themselves. Such properties can be internal attributes of data structures carrying events, as in Gryphon [Banavar et al. 1999a], Siena [Carzaniga et al. 2000], Elvin [Segall et al. 2000], and Jedi [Cugola et al. 2001], or meta-data associated to events, as in the Java Message Service [Hapner et al. 2002].

Consumers subscribe to selective events by specifying filters using a subscription language. The filters define constraints, usually in the form of name-value pairs of properties and basic comparison operators (=, <, ≤, >, ≥), which identify valid events. Constraints can be logically combined (and, or, etc.) to form complex subscription patterns. Some systems, like the Cambridge Event Architecture (CEA) [Bacon et al. 2000], also provide for event correlation: participants can subscribe to logical combinations of elementary events and are only notified upon occurrence of the composite events. Subscription patterns are used to identify the events of interest for a given subscriber and propagate events accordingly. For subscribing, a variant of the subscribe() operation is provided by the event service, with an additional argument representing a subscription pattern. There are several means of representing such patterns:

— String: Subscription patterns are most frequently expressed using strings. Filters must conform to a subscription grammar, such as SQL [Hapner et al. 2002; Oracle 2002; Lewis 1999], OMG’s Default Filter Constraint Language [OMG 2002b], XPath [Altinel and Franklin 2000; Chan et al. 2002a; Diao et al. 2002], or some proprietary language [Banavar et al. 1999a; Carzaniga et al. 2001; Segall and Arnold 1997]. Strings are then parsed by the engine.

— Template object: Inspired by tuple-based matching, JavaSpaces [Freeman et al. 1999] adopts an approach based on template objects. When subscribing, a participant provides an object t, which indicates that the participant is interested in every event that conforms to the type of t and whose attributes all match the corresponding attributes of t, except for the ones carrying a wildcard (null).

— Executable code: Subscribers provide a predicate object able to filter events at runtime. The implementation of that object is usually left to the application developer. An alternative approach, based on a library of filter objects implemented using reflection, was described in Eugster and Guerraoui [2001]. Executable code is not widely used in practice because the resulting filters are extremely hard to optimize.
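
The template-object and executable-code representations can be contrasted in a few lines. This is a sketch in the spirit of the descriptions above, not the JavaSpaces API: the StockQuote class, matches_template(), and cheap_telco() are hypothetical names, and None plays the role of the wildcard.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class StockQuote:
    """Hypothetical event type; a None attribute acts as a wildcard in templates."""
    company: Optional[str] = None
    price: Optional[float] = None
    amount: Optional[int] = None


def matches_template(template, event) -> bool:
    """Template matching: same type, and every non-None attribute must be equal."""
    if not isinstance(event, type(template)):
        return False
    for attribute in fields(template):
        wanted = getattr(template, attribute.name)
        if wanted is not None and wanted != getattr(event, attribute.name):
            return False
    return True


def cheap_telco(event: StockQuote) -> bool:
    """Executable-code filter: an arbitrary predicate written by the application."""
    return (
        event.company == "Telco"
        and event.price is not None
        and event.price < 100
    )
```

The template can only express equality on attributes, whereas the predicate can express any condition but is opaque to the event service, which is consistent with the remark that such filters are hard to optimize.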

An example of string-based filters outlines how a content-based scheme enforces a finer granularity than a static scheme based on topics. To achieve the same functionality with topics, the subscriber would either have to filter out irrelevant events, or topics would need to be split into several subtopics—one for each company (and recursively several subtopics for different price “categories”). The first approach leads to an inefficient use of bandwidth, while the second approach results in a high number of topics and an increased risk of redundant events.
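
A string-based filter on the stock quote example might look as follows. The grammar is a deliberately tiny assumption made for illustration (conjunctions of `name operator literal` joined by " and "); it is not SQL, XPath, or any of the proprietary languages cited above, and the function names are hypothetical.

```python
import operator
import re

# Comparison operators supported by this hypothetical filter grammar.
OPERATORS = {
    "=": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}
TERM = re.compile(r"\s*(\w+)\s*(<=|>=|=|<|>)\s*(.+?)\s*$")


def parse_filter(text):
    """Parse "name op literal and name op literal ..." into constraints."""
    constraints = []
    for term in text.split(" and "):
        match = TERM.match(term)
        if match is None:
            raise ValueError(f"cannot parse filter term: {term!r}")
        name, symbol, raw = match.groups()
        # Quoted literals are strings; everything else is read as a number.
        value = raw.strip("'\"") if raw[0] in "'\"" else float(raw)
        constraints.append((name, OPERATORS[symbol], value))
    return constraints


def matches(constraints, event) -> bool:
    """An event (a dict of attributes) matches if every constraint holds."""
    return all(
        name in event and compare(event[name], value)
        for name, compare, value in constraints
    )
```

A subscriber interested in cheap Telco stock would register the string "company = 'Telco' and price < 100"; with topics alone, the same selection would require either client-side filtering or one subtopic per company and price category.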

Type-based publish/subscribe. Topics usually regroup events that present commonalities not only in content, but also in structure. This observation has led to the idea of replacing the name-based topic classification model by a scheme that filters events according to their type [Eugster et al. 2001]. In other terms, the notion of event kind is directly matched with that of event type. This enables a closer integration of the language and the middleware. Moreover, type safety can be ensured at compile-time by parameterizing the resulting abstraction interface by the type of the corresponding events (without any type cast in the resulting code). In contrast, the aforementioned template-based approach of JavaSpaces [Freeman et al. 1999] considers the type of events as a dynamic property, and the resulting JavaSpace API forces the application to perform explicit type casts. Similarly, the TAO CORBA Event Service [Harrison et al. 1997] does not view the type of an event object as an implicit attribute. The example in Figure 15 illustrates type-based subscription. Stock events can be split into two distinct types: stock quotes (for sale) and stock requests, as shown in Figure 16. Brokers use stock requests to express their interest in buying stock. In contrast to quotes, requests have a range of possible prices. Subtyping can be used to subscribe to both stock quotes and requests.

It is important to notice that type-based publish/subscribe can lead to a natural description of content-based filtering through public members of the considered event type, while ensuring the encapsulation of these events. This is achieved in our example of Figure 15 by declaring only private data members and enforcing their access through public methods.
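
Since the figures are not reproduced here, the following sketch gives a rough idea of type-based subscription with subtyping and of content filtering through public accessor methods. It is not the code of Figure 15: all class and method names are assumptions, and because Python checks types at run time, the sketch shows dispatch by event type but not the compile-time type safety discussed above.

```python
class StockEvent:
    """Hypothetical base type; state is private and exposed through methods."""

    def __init__(self, company, amount):
        self._company = company
        self._amount = amount

    def company(self):
        return self._company

    def amount(self):
        return self._amount


class StockQuote(StockEvent):
    """Stock offered for sale at a single price."""

    def __init__(self, company, amount, price):
        super().__init__(company, amount)
        self._price = price

    def price(self):
        return self._price


class StockRequest(StockEvent):
    """Interest in buying stock within a price range."""

    def __init__(self, company, amount, min_price, max_price):
        super().__init__(company, amount)
        self._min_price = min_price
        self._max_price = max_price

    def price_range(self):
        return self._min_price, self._max_price


class TypedEventService:
    """Delivers an event to every subscription whose type it is an instance of."""

    def __init__(self):
        self._subscriptions = []

    def subscribe(self, event_type, callback, predicate=lambda event: True):
        self._subscriptions.append((event_type, predicate, callback))

    def publish(self, event):
        for event_type, predicate, callback in self._subscriptions:
            # Subtyping: a StockEvent subscription also receives quotes and requests.
            if isinstance(event, event_type) and predicate(event):
                callback(event)
```

Subscribing to StockEvent yields both quotes and requests, while subscribing to StockQuote with a predicate such as `lambda q: q.price() < 100` combines type-based selection with content-based filtering that only goes through public methods.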

The Incarnations: Implementation issues

Events. Events are found in two forms: messages or invocations. In the first case, events are delivered to a subscriber through a single generic operation (e.g., notify()), while in the second case events trigger the execution of specific operations of the subscriber.

Messages. At the lowest level, any data that goes on the network is a message. In most systems, event notifications take the form of messages, which are explicitly created by the application. Messages are generally made of a header that contains message-specific information in a generic format, and payload data that contains user-specific information. Typical header fields include message identifier, issuer, priority, or expiration time, which can be interpreted by the system or purely serve as information for the consumers. Some systems (e.g., IBM MQSeries [Lewis 1999] and Oracle Advanced Queuing [Oracle 2002]) do not make any assumption on the type of the payload data and treat it as an opaque array of bytes. Some other systems (e.g., JMS [Hapner et al. 2002], CORBA Notification Service [OMG 2002b]) provide a set of message types, such as text or XML messages. Finally, some systems provide self-describing messages. TIBCO Rendezvous [TIBCO 1999], for instance, defines a message format that does not have header information, but allows the programmer to create his or her own message structure based on a set of basic types that can be structured hierarchically. The type of messages can be queried later at runtime. Distributed Asynchronous Collections (DAC) [Eugster et al. 2000] and Java Message Service (JMS) [Hapner et al. 2002] even support object messages, where the event can be any serializable Java object. In most cases, messages are viewed as records with several fields.
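
The header/payload split can be sketched as a pair of small record types. The field names follow the typical header fields listed above, but the classes themselves, and the choice of an opaque byte payload, are assumptions made for illustration rather than the message format of any cited system.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Header:
    """Message-specific information in a generic format (hypothetical layout)."""
    issuer: str
    priority: int = 0
    expires_at: Optional[float] = None   # seconds since the epoch, None = never
    message_id: str = field(default_factory=lambda: uuid.uuid4().hex)


@dataclass(frozen=True)
class Message:
    """A header plus a payload that the system treats as opaque bytes."""
    header: Header
    payload: bytes

    def is_expired(self, now: Optional[float] = None) -> bool:
        """The system can interpret the header without understanding the payload."""
        if self.header.expires_at is None:
            return False
        current = time.time() if now is None else now
        return current >= self.header.expires_at
```

Here the system can act on priority or expiration time while leaving the interpretation of the payload entirely to the consumers.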

Invocations. At a higher level, we generally differentiate between invocations and messages. An invocation is directed to a specific type of object, and has well-defined semantics. The system ensures that all consumers have a matching interface for processing the invocation. The interface acts as a binding contract between the invoker and the invokees. Systems which offer invocation-style interaction along with different semantics and various addressing schemes are usually termed messaging systems. They incorporate additional logic on top of a publish/subscribe or message queuing system to transform low-level messages into invocations to methods of the subscribers, which must all be of the same type. While certain systems take into account return values of invocations, the typed publish/subscribe models of COM+ [Sessions 1997] or the CORBA Event Service [OMG 2001] typically only consider one-way invocations. Producers invoke operations on some intermediary object (e.g., event channel) that exhibits the same interface as the actual consumers and forwards events to all registered consumers. COM+ furthermore provides a form of content-based filtering, by offering the possibility to specify values for invocation arguments in order to restrict the potential invocations.
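
The intermediary object that exhibits the consumers' interface can be approximated with a forwarding proxy. This is a sketch of the idea only; it does not reproduce the COM+ or CORBA Event Service APIs, and the names EventChannel, QuoteListener, and Recorder are hypothetical.

```python
class QuoteListener:
    """Hypothetical consumer interface: the contract between invoker and invokees."""

    def on_quote(self, company, price):
        raise NotImplementedError


class EventChannel:
    """Looks like a consumer to producers and forwards one-way invocations."""

    def __init__(self, interface):
        self._interface = interface
        self._consumers = []

    def register(self, consumer):
        # All consumers must implement the same interface.
        if not isinstance(consumer, self._interface):
            raise TypeError("consumer does not implement the channel interface")
        self._consumers.append(consumer)

    def __getattr__(self, name):
        # Called only for names not found on the channel itself.
        if name.startswith("_"):
            raise AttributeError(name)
        if not callable(getattr(self._interface, name, None)):
            raise AttributeError(name)

        def forward(*args, **kwargs):
            for consumer in self._consumers:
                getattr(consumer, name)(*args, **kwargs)
            # One-way invocation: results from consumers are discarded.

        return forward


class Recorder(QuoteListener):
    """Example consumer that records the invocations it receives."""

    def __init__(self):
        self.seen = []

    def on_quote(self, company, price):
        self.seen.append((company, price))
```

A producer simply calls `channel.on_quote("Telco", 80.0)` as if the channel were a single consumer; the call returns nothing, which matches the one-way semantics described above.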

The Media. The transmission of data between producers and consumers is the task of the middleware medium. Media can be classified according to characteristics like their architecture or the guarantees they provide for the data, such as persistence or reliability.

Architectures. The role of publish/subscribe systems is to permit the exchange of events between producers and consumers in an asynchronous manner. Asynchrony can be implemented by having producers send messages to a specific entity that stores them, and forwards them to consumers on demand. We call this approach a centralized architecture because of the central entity that stores and forwards messages. This approach is adopted by queuing systems like the IBM MQSeries [Lewis 1999] and Oracle Advanced Queuing [Oracle 2002], each of which is built on top of a centralized database. Applications based on such systems have strong requirements in terms of reliability, data consistency, or transactional support, but do not need a high data throughput. Examples of such applications are electronic commerce or banking applications.
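
The store-and-forward behaviour of a centralized architecture can be reduced to the following sketch. It keeps messages in memory purely for brevity, whereas the systems named above rely on a database; the class and method names are assumptions and do not correspond to the MQSeries or Oracle Advanced Queuing interfaces.

```python
from collections import deque


class CentralBroker:
    """Hypothetical central entity that stores messages and forwards on demand."""

    def __init__(self):
        self._queues = {}                 # consumer name -> pending messages

    def register(self, consumer):
        self._queues.setdefault(consumer, deque())

    def send(self, message):
        """Producers hand the message to the broker and continue immediately."""
        for pending in self._queues.values():
            pending.append(message)

    def receive(self, consumer):
        """Consumers fetch the oldest pending message, or None if there is none."""
        pending = self._queues[consumer]
        return pending.popleft() if pending else None
```

Because the broker holds each consumer's messages until receive() is called, a consumer may collect them long after the producer has finished, at the cost of routing all traffic through one entity.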

Asynchrony can also be implemented by using smart communication primitives that implement store and forward mechanisms both in the producer’s and consumer’s processes, so that communication appears asynchronous and anonymous to the application without the need for an intermediary entity. We call this approach a distributed architecture because there is no central entity in the system. TIBCO Rendezvous [TIBCO 1999] uses a decentralized approach in which no process acts as a bottleneck or a single point of failure. Such architectures are well suited for fast and efficient delivery of transient data, which is required for applications like stock exchange or multimedia broadcasting.

An intermediate approach, adopted for instance by Gryphon [Banavar et al. 1999a], Siena [Carzaniga et al. 2000], and Jedi [Cugola et al. 2001], consists in implementing the event notification service as a distributed network of servers. In contrast to completely decentralized systems, this approach discharges the participating processes by using dedicated servers to execute the complex protocols required for persistence, reliability, or high availability, as well as for content-based filtering and routing. There are different topologies for these servers. Jedi’s event dispatchers are organized in a hierarchical structure, where clients can connect to any node. Subscriptions are propagated upward through the tree of servers. Such hierarchical topologies tend, however, to heavily load the root servers, and the failure of a server might disconnect the entire subtree. In Gryphon, a graph summarizing the common interests of subscribers is superimposed with the message broker graph, to avoid redundant matches. Siena uses subscription and advertisement forwarding to set the paths for notifications. Event servers keep track of useful information to efficiently match events with subscriptions. Several server topologies have been considered, each with respective advantages and shortcomings.
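
The hierarchical topology and its load on the root can be illustrated with a simplified tree of dispatchers. This is a generic sketch of upward subscription propagation, not Jedi's actual protocol: the Dispatcher class, its per-topic routing tables, and the `handled` counter are assumptions introduced for the example.

```python
class Dispatcher:
    """Hypothetical event server in a tree; clients may connect to any node."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.local = {}            # topic -> callbacks of clients at this node
        self.child_interest = {}   # topic -> children with subscribers below
        self.handled = 0           # number of events this server processed
        # Children are tracked only through child_interest, filled on subscribe.

    def subscribe(self, topic, callback):
        self.local.setdefault(topic, []).append(callback)
        # Propagate the subscription upward so ancestors know where to route.
        node = self
        while node.parent is not None:
            node.parent.child_interest.setdefault(topic, set()).add(node)
            node = node.parent

    def publish(self, topic, event):
        self._route(topic, event, came_from=None)

    def _route(self, topic, event, came_from):
        self.handled += 1
        for callback in self.local.get(topic, []):
            callback(event)
        # Forward downward only into subtrees that registered interest.
        for child in self.child_interest.get(topic, ()):
            if child is not came_from:
                child._route(topic, event, came_from=self)
        # Always forward upward, which is why every event reaches the root.
        if self.parent is not None and self.parent is not came_from:
            self.parent._route(topic, event, came_from=self)
```

In this sketch every published event climbs to the root regardless of interest, which makes the heavy load on root servers mentioned above directly visible in the `handled` counters.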

Dissemination. The actual transmission of data can happen in various ways. In particular, data can be sent using point-to-point communication primitives, or using hardware multicast facilities like IP multicast [Deering n.d.]. The choice of the communication mechanism depends on factors such as the target environment and the architecture of the system.

Centralized approaches like certain message queuing systems are likely to use point-to-point communication primitives between producers/consumers and the centralized broker. As already mentioned, these systems focus more on strong guarantees than on high throughput and scalability. Topic-based publish/subscribe systems can straightforwardly benefit from the vast amount of studies on group communication [Powell 1996] and the resulting protocols to disseminate events to subscribers. To ensure high throughput, Internet protocol (IP) multicast or a wide range of reliable multicast protocols [Floyd et al. 1997; Holbrook et al. 1995; Lin and Paul 1996; Castro et al. 2002; Banerjee et al. 2002; Ratnasamy et al. 2001; Zhuang et al. 2001] are commonly employed.

Efficient multicast of events in content-based publish/subscribe systems remains an issue. Gryphon and Siena both use algorithms [Aguilera et al. 1999; Carzaniga et al. 2001] that deliver events to a logical network of servers in such a way that an event is propagated only to the servers that manage subscribers interested in that event. The performance of such dissemination-based systems is strongly affected by the cost of event filtering on each of the servers, which directly depends on the number of subscriptions in the system. Highly efficient and scalable algorithms have been recently proposed for filtering data in publish/subscribe systems [Altinel and Franklin 2000; Pereira et al. 2000; Fabret et al. 2001; Campailla et al. 2001; Chan et al. 2002a; Diao et al. 2002]. The problem of aggregating subscriptions to increase the filtering speed at each server, at the price of a small loss in precision, has been studied in Chan et al. [2002a]. Irrespective of the filtering techniques, the selective event routing inherent to content-based publish/subscribe makes the exploitation of network-level multicast primitives difficult.

Qualities of Service. The guarantees provided by the medium for every message vary strongly between the different systems. Among the most common qualities of service considered in publish/subscribe, we have persistence, transactional guarantees, and priorities.

Persistence. In RPC-like systems, a method invocation is by definition a transient event. The lifetime of a remote invocation is short and, if the invoker does not get a reply after a given period of time, it may reissue the request. The situation is different in publish/subscribe or queuing systems. Messages may be sent without generating replies, and they may be processed hours after having been sent. The communicating parties do not control how messages are transmitted and when they are processed. Thus, the messaging system must provide guarantees not only in terms of reliability, but also in terms of durability of the information. It is not sufficient to know that a message has reached the messaging system that sits between the producers and consumers; we must get the guarantee that the message will not be lost upon failure of that messaging system.

Persistence is generally present in publish/subscribe systems that have a centralized architecture and store messages until consumers are able to process them. Queuing systems like the IBM MQSeries [Lewis 1999] and Oracle Advanced Queuing [Oracle 2002] offer persistence using an underlying database. Distributed publish/subscribe systems do not generally offer persistence since messages are directly sent by the producer to all subscribers. Unless the producer keeps a copy of each message, a faulty subscriber may not be able to get missed messages when recovering. TIBCO Rendezvous [TIBCO 1999] offers a mixed approach, in which a process may listen to specific subjects, store messages on persistent storage, and resend missed messages to recovering subscribers. The Cambridge Event Architecture [Bacon et al. 2000] provides a potentially distributed event repository for event storage and efficient retrieval (with searching facilities for simple and composite events) that enables the replaying of stored sequences of events.
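
The idea of a process that stores messages on persistent storage and resends missed ones to recovering subscribers can be sketched with an append-only log. This is a generic illustration, not the mechanism of TIBCO Rendezvous or of the Cambridge Event Architecture; the EventStore class, the JSON-lines file format, and the `skip` parameter are assumptions.

```python
import json
import os
import tempfile


class EventStore:
    """Hypothetical listener that logs events to disk and replays them later."""

    def __init__(self, path):
        self._path = path

    def append(self, subject, event):
        """Write one event per line and force it to stable storage."""
        with open(self._path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"subject": subject, "event": event}) + "\n")
            log.flush()
            os.fsync(log.fileno())

    def replay(self, subject, skip=0):
        """Return stored events for a subject, skipping those already seen."""
        if not os.path.exists(self._path):
            return []
        with open(self._path, encoding="utf-8") as log:
            records = [json.loads(line) for line in log if line.strip()]
        matching = [r["event"] for r in records if r["subject"] == subject]
        return matching[skip:]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        store = EventStore(os.path.join(directory, "events.log"))
        store.append("quotes", {"company": "Telco", "price": 80})
        # A subscriber that missed everything asks for a full replay.
        print(store.replay("quotes"))
```

A recovering subscriber that had already processed n events on a subject would call replay(subject, skip=n); since the log lives on disk, it survives a restart of the storing process itself.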

Priorities. Like persistence, message prioritization is a quality of service offered by some messaging systems. Indeed, it may be desirable to sort the messages waiting to be processed by a consumer in order of priority. For instance, a real-time event may require immediate reaction (e.g., failure notification) and should be processed before other messages.

Priorities affect messages that are in transit, that is, not being processed. Runtime execution priorities are handled by the application scheduler and are not managed by the messaging system. In particular, this implies that two subscribers listening to the same topics may process messages in different orders because they process messages at different speeds, even though communication channels are first in, first out (FIFO). Priorities should be considered as a best-effort quality of service (unlike persistence).
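
Ordering of waiting messages by priority can be sketched with a heap-based inbox. The PriorityInbox class and the convention that larger numbers mean higher priority are assumptions; the cited systems differ in how many priority levels they offer and how they apply them.

```python
import heapq
import itertools


class PriorityInbox:
    """Hypothetical queue of messages waiting to be processed by one consumer."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()   # tie-breaker preserving FIFO order

    def put(self, message, priority=0):
        # heapq is a min-heap, so the priority is negated: higher pops first.
        heapq.heappush(self._heap, (-priority, next(self._arrival), message))

    def get(self):
        """Remove and return the most urgent waiting message."""
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

Only messages still waiting in the inbox are reordered; once a message has been handed to the consumer, how quickly it is processed is up to the application, which is why priorities remain a best-effort quality of service.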

Most publish/subscribe messaging systems (centralized or distributed) provide priorities, although the number of priorities and the way they are applied differ. IBM MQSeries [Lewis 1999], Oracle Advanced Queuing [Oracle 2002], TIBCO Rendezvous [TIBCO 1999], and the JMS specification [Hapner et al. 2002] all support priorities.

Transactions. Transactions are generally used to group multiple operations in atomic blocks that are either completely executed or not executed at all. In messaging systems, transactions are used to group messages into atomic units: either a complete sequence of messages is sent (received), or none of them is. For instance, a producer that publishes several semantically related messages may not want consumers to see a partial (inconsistent) sequence of messages if it fails during emission. Similarly, a mission-critical application may want to consume one or several messages, process them, and only then commit the transaction. If the consumer fails before committing, all messages are still available for reprocessing after recovery.
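
The producer side of such a transaction can be sketched as a session that buffers messages until commit. This only illustrates the visibility rule; real systems must also make the commit itself atomic and durable, which this in-memory sketch does not attempt. The TransactedSession class and the use of a plain list as the destination are assumptions.

```python
class TransactedSession:
    """Hypothetical producer session: messages become visible only on commit."""

    def __init__(self, destination):
        self._destination = destination   # list standing in for a queue or topic
        self._buffer = []

    def publish(self, message):
        # Held back: consumers cannot see a partial sequence.
        self._buffer.append(message)

    def commit(self):
        """Release the whole sequence to consumers in one step."""
        pending, self._buffer = self._buffer, []
        self._destination.extend(pending)

    def rollback(self):
        """Discard everything published since the last commit."""
        self._buffer.clear()
```

If the producer fails before calling commit(), the buffered messages are simply lost with the session and consumers never observe an incomplete sequence.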

Due to their tight integration with databases, IBM MQSeries [Lewis 1999] and Oracle Advanced Queuing [Oracle 2002] provide a wide range of transactional mechanisms. JMS [Hapner et al. 2002] and TIBCO Rendezvous [TIBCO 1999] also provide transaction support for grouping messages in the context of a single session. JavaSpaces [Freeman et al. 1999] provides lightweight transactional mechanisms to guarantee atomicity of event production and consumption. An event published in a JavaSpace in the context of a transaction is not visible outside the transaction until it is committed. Similarly, a consumed event is not removed from a JavaSpace until the enclosing transaction commits. Several events can be produced and consumed in the context of the same transaction.

Reliability. Reliability is an important feature of distributed information systems. It is often necessary to have strong guarantees about the reliable delivery of information to one or several distributed entities. Because of the loose synchronization between producers and consumers of information, implementing reliable event propagation (“guaranteed delivery”) is challenging.

Centralized publish/subscribe systems generally use reliable point-to-point channels to communicate with publishers and subscribers, and keep copies of events on stable storage. Events are therefore reliably delivered to all subscribers, although a failure of the centralized event broker may delay delivery.

Systems based on an overlay network of distributed event brokers often use reliable protocols to propagate events to all or a subset of the brokers. Protocols based on group communication [Powell 1996] and reliable application-layer multicast [Floyd et al. 1997; Holbrook et al. 1995; Lin and Paul 1996; Castro et al. 2002; Banerjee et al. 2002; Ratnasamy et al. 2001; Zhuang et al. 2001] are good candidates as they are resilient to the failure of some of the brokers. Individual publishers and subscribers generally communicate with the nearest broker using point-to-point communication channels.

Finally, systems that let publishers and subscribers communicate directly with each other, such as TIBCO Rendezvous [TIBCO 1999], also use lightweight reliable multicast protocols. As events are generally not kept in the system for failed or disconnected (time-decoupled) subscribers, guaranteed delivery must be implemented by deploying dedicated processes that store events and replay them to requesting subscribers.
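
One common building block of lightweight reliable delivery is sequence numbering with retransmission requests from the receiver. The sketch below shows that generic technique under simplifying assumptions; it is not the protocol used by TIBCO Rendezvous or by any other cited system, the Sender and Receiver classes are hypothetical, and a direct method call stands in for the network request.

```python
class Sender:
    """Hypothetical publisher that numbers events and keeps them for resending."""

    def __init__(self):
        self._sent = {}        # sequence number -> event kept for retransmission
        self._next_seq = 0

    def send(self, event):
        seq = self._next_seq
        self._next_seq += 1
        self._sent[seq] = event
        return seq, event      # the "packet" that would go on the network

    def retransmit(self, seq):
        return seq, self._sent[seq]


class Receiver:
    """Detects gaps in sequence numbers and asks the sender to fill them."""

    def __init__(self, sender):
        self._sender = sender
        self._expected = 0
        self._held = {}        # events received ahead of a gap
        self.delivered = []    # events handed to the application, in order

    def on_packet(self, seq, event):
        if seq < self._expected:
            return             # duplicate of an event already delivered
        self._held[seq] = event
        # Request every missing event between the last delivered one and this one.
        for missing in range(self._expected, seq):
            if missing not in self._held:
                missing_seq, missing_event = self._sender.retransmit(missing)
                self._held[missing_seq] = missing_event
        # Deliver in order as far as the sequence is now complete.
        while self._expected in self._held:
            self.delivered.append(self._held.pop(self._expected))
            self._expected += 1
```

This recovers from packets lost in transit only while the sender still retains them; as noted above, serving subscribers that were disconnected for a long time requires dedicated processes that store and replay events.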

Tags: distribution concept publish-subscribe

Last modified 30 July 2025
