Reactive DDD: Modeling Uncertainty

Domain-Driven Design supports reactive architecture and programming. Still, reactive introduces uncertainty. Back in 2003, the way that Domain-Driven Design was used and implemented was quite different from the way we use it today. If you’ve read my book, “Implementing Domain-Driven Design,” you’re probably familiar with the fact that the Bounded Contexts I model in the book are separate processes, deployed separately as microservices. In Eric’s “Blue Book,” by contrast, it seems that Bounded Contexts were at times separated logically but deployed in the same web server or application server.

I’ll discuss a more contemporary use of Domain-Driven Design, and how it has changed to fit our modern needs.

This featured post was generated using Contenda, which helped Kalele create new content from a conference video recording on YouTube that we already had. The Contenda artificial intelligence thoroughly assisted in this effort, requiring minimal additional text clean up.

Vaughn Vernon interviewed Cassidy Williams (@cassidoo), Contenda's CTO, on a recent episode of the Add Dot podcast.

The essence of Domain-Driven Design is modeling a Ubiquitous Language in a Bounded Context. But what is a Bounded Context? A Bounded Context is a clear delineation between one model and another model. It’s a boundary that makes it possible for the model inside to be defined very explicitly and with clear meaning. The team, including domain experts, discovers and uses the Ubiquitous Language by working together. In one Bounded Context, a Product has a specific meaning and behavior. In a different Bounded Context, a Product could have similarities to the first Product. The two may even share identities across Bounded Contexts, for whatever reason. Yet, generally speaking, the Product in another context has at least a slightly different meaning and possibly even a vastly different meaning. Domain-Driven Design is focused on the Bounded Context containing a Ubiquitous Language.

There is one very definite definition of Product in one team situation with one Ubiquitous Language. This means that there are other models, with other teams developing them, although the same team that’s responsible for this model could also be responsible for others. This creates a situation where there are now multiple Bounded Contexts, because when using DDD, practitioners avoid defining a single model concept with a broad meaning across a single enterprise or a single system.

Context Mapping is used to collaborate between teams and integrate any number of Bounded Contexts. A Context Map uses a line between any two Bounded Contexts to represent a relationship between two teams, and likely a translation between two Ubiquitous Languages. Consider the following Bounded Contexts in the Context Map. Assume that the context at the top “speaks a language” and the one to the lower-right “speaks a different language.” What is needed between these two languages so that one side can understand the other? A translation is needed.

Tactical modeling with Domain-Driven Design is where we model a particular Ubiquitous Language in a very careful, even fine-grained, way. If you think of strategic modeling or strategic design as being sort of the broad brushstrokes with a large brush, then think of tactical design as using the fine brush to get the real details painted in.

When it comes to Domain-Driven Design, it’s important to remember that you need help from domain experts. This is true when you’re modeling strategically and tactically. Domain experts can help you avoid creating an anemic domain model and obtain a better understanding of the data.

The Benefits of Using Strong Types

If you’re working with business identifiers in Java, you may be tempted to use raw UUIDs. However, doing so limits compatibility with programming languages that don’t support UUID at all, or that don’t support the same UUID types. Instead, consider using a strongly typed ID. This will ensure that your IDs can be easily translated across languages. A UUID can still be generated for the local ID, but consider stringifying it either at creation time or when it is placed inside an event that will be published outside the context.
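To make the idea concrete, here is a minimal sketch of a strongly typed ID that is stringified at creation time. The name ProposalId and its methods are hypothetical, chosen only for illustration, and the sketch assumes Java 16 or later for records.

```java
import java.util.UUID;

// Hypothetical strongly typed identity; the type name is an assumption.
// The value is held as a String so that other languages can read it as-is.
record ProposalId(String value) {
    ProposalId {
        if (value == null || value.isBlank()) {
            throw new IllegalArgumentException("id must not be blank");
        }
    }

    // Generate the local id from a UUID, stringified at creation time.
    static ProposalId unique() {
        return new ProposalId(UUID.randomUUID().toString());
    }
}
```

An event published outside the context would then carry `id.value()`, a plain string, rather than a Java-specific UUID type.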

Another tip is to model money as a Money Value Object instead of using a raw BigDecimal. This will help to standardize rounding and scaling rules across your application. It will also centralize Money behaviors in one place, on the Money type.
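A Money type along these lines might look like the following sketch. The HALF_EVEN rounding rule and the use of each currency's default fraction digits as the scale are my assumptions for illustration. Your domain experts would decide the real rules.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Currency;

// Hypothetical Money Value Object; rounding and scaling rules are assumptions.
record Money(BigDecimal amount, Currency currency) {
    Money {
        // Normalize the scale once, so every Money of a currency rounds the same way.
        amount = amount.setScale(currency.getDefaultFractionDigits(), RoundingMode.HALF_EVEN);
    }

    static Money of(String amount, String currencyCode) {
        return new Money(new BigDecimal(amount), Currency.getInstance(currencyCode));
    }

    Money plus(Money other) {
        requireSameCurrency(other);
        return new Money(amount.add(other.amount), currency);
    }

    Money times(BigDecimal factor) {
        return new Money(amount.multiply(factor), currency);
    }

    private void requireSameCurrency(Money other) {
        if (!currency.equals(other.currency)) {
            throw new IllegalArgumentException("currency mismatch");
        }
    }
}
```

Because the scale is normalized in the constructor, two Money values that represent the same amount compare as equal, which is not guaranteed for two raw BigDecimal values with different scales.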

When considering the development of microservices, it’s important to keep in mind the level of uncertainty that can occur at all stages. Be ready to deal with uncertainty by modeling it explicitly. This will help in designing more robust microservices that are much simpler to reason about than those that try to create certainty in a highly indeterministic environment.

Donald Knuth is often quoted as saying “premature optimization is the root of all evil.” What he actually said was “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.”

Donald Knuth also said that “people who are more than casually interested in computers should have at least some idea of what the underlying hardware is like. Otherwise the programs they write will be pretty weird.”

I agree with both of these statements. We should not worry about small inefficiencies, and we should try to understand the underlying hardware when we write programs.

In 1973, processors were very slow, with few transistors and clock speeds below 1 MHz. Moore’s law predicted that the number of transistors would keep doubling every couple of years, and for a good number of years clock speeds climbed along with them. Around 2003, however, the growth in clock speeds fell off sharply, even though the number of transistors continued to increase. Today, multiple cores are the norm rather than exponentially faster processors. This trend began in response to the slowdown in processor speeds.

Applying reactive is a great example of how to make full use of all cores. In the past, we would have had one thread handling all the events, but now with multiple cores, we can have multiple threads handling different events. This makes our components more responsive and efficient.

The Actor Model

I’m really interested in the actor model and the reactive side of things. The goal discussed herein is to produce a microservice that is reactive. This microservice would be part of a larger system solution that is also reactive. Events would be published by the microservice and other microservices in the system would consume these events. This would all happen asynchronously.

The following figure illustrates ideal messaging operations, but ideal is seldom attainable. In distributed computing, it is impossible to guarantee that messages will be delivered to every consumer exactly once and in exactly their original order. Sooner or later there will be an occurrence where this doesn’t hold true. Therefore, it is important to model uncertainty in your system in order to avoid the pitfalls of letting things happen as they may.

Uncertainty is a state that we can deal with. We have ways of reasoning about the state of a system, whether the state of a system is consistent or inconsistent at any point in time. And ultimately, we want to get to a point where we consider that state to be certain and finished. Uncertainty is uncomfortable, but it’s a situation that we can handle if we have the right tools.

The Challenges of Distributed Computing

In distributed computing, it’s often difficult to see the full picture. We have to learn to squint to see all the tiny little details that make up the whole. This can be difficult because of our experience and affinity to certainty, blocking, and synchronization. When you study enough architectures, it’s clear there’s a penchant for trying to create certainty in uncertain environments.

First consider a message Deduplicator. It’s not too difficult to implement this, but we need to think about what is actually involved. There’s a data source in which received events are persisted. Caching some delivered events in memory is possible, but likely not a week’s worth, for example. This may not be a system at Google’s scale, but it’s probably still not practical to cache many events for the longer term.

Of course a database table must be created for the arriving events to be persisted, and one or more of an event’s properties must be available to uniquely identify it. When each event arrives, the database is queried to ask, “has this event been handled before?” If the answer is “no,” the event can be applied locally. Has it been seen before? Ok, ignore it. But can it simply be ignored? Why was it seen before? Is it because a delivery acknowledgement failed or happened too slowly? Can it be accepted again? Should it now be acknowledged? How long must event ids remain in the database to indicate that the event has been seen? Is a week enough? How about a month? Shall we incur the cost of deleting it after the event is handled? What if the delete fails? Or again, what if the delete succeeds but the event is redelivered due to an acknowledgement timeout or some other issue on the message broker/bus? Talk about uncertainty. Right? Trying to solve these kinds of problems by throwing technology at them can be very difficult and error prone.

Next consider a message Resequencer. Consider the scenario where Event 3, Event 1, and Event 4 arrive, and in that order. For whatever reason—possibly similar to the reason the other three events arrived out of order—Event 2 doesn’t arrive for a long time.

We can find a way to reorder them, but we must assume that there is something inherent in the event that indicates its sequence. (Causality is possible, but it’s hard to determine outside the service that caused it.) Event 1 becomes the first event in the sequence and it’s determined that Event 3 and Event 4 follow Event 1, and in that order. Yet, a decision is made that without Event 2, Event 3 and Event 4 must not be processed. Perhaps it’s even inadvisable to allow Event 1 to be processed until Event 2 (possibly) arrives. Since Event 2 is latent, what happens to the system in the meantime? The Resequencer has effectively shut down some system processing, and it appears that more than one fact is latent: at least three, and possibly four. The processing is delayed for some indeterminate time because Event 2 hasn’t shown up. Finally, Event 2 arrives, and now all the delayed event stimuli can be processed by the application. That’s quite a weak design.

All of these approaches are hard because there’s uncertainty. And the biggest problem is that in trying to create certainty in an uncertain environment, no business problems have been solved. To boot, the technology solutions will likely prove to be brittle.

Objects vs the Space Between

By acknowledging that distributed computing is now part of the business model, these problems can be much more readily solved. Distributed systems are here to stay and modeling uncertainty matters because the cloud, microservices, and latency are all factors to consider. Domain-Driven Design is a good approach to microservices. Effective design is designing software well.

Designing a great product requires meeting the needs of the business. This is where DDD helps. DDD is about codifying business needs in your model. It helps to focus on the messages between objects, rather than only the objects’ internals.

Alan Kay, the person who coined the term “object-oriented programming,” recommends paying attention to the space between objects, that is, the messages sent from one object to another. Although contemporary programming languages that claim to be object-oriented don’t naturally support sending messages between objects, a method call or invocation represents roughly the same thing. Thus, the names of objects and the names of messages are an important focus of object-oriented programming, and this applies well to DDD’s Ubiquitous Language in a Bounded Context.

Reactive systems can be designed to deal with uncertainty, and this works well when the uncertainty is modeled at the heart of the software. By solving problems related to uncertainty in the domain model, the software product can achieve business solutions while gracefully tackling uncertainty’s complex nature.

Modeling Uncertainty

Pat Helland, who used to work at Amazon, published a paper called “Life Beyond Distributed Transactions: An Apostate’s Opinion.” In it, he argues that global transactions cannot scale to the level that Amazon needs. He suggests that, in a system without distributed transactions, uncertainty must be managed in the business logic.

Track Partner Activity

Helland introduces the idea of partner entities, which collaborate to solve problems in separate services (Bounded Contexts could be applied here). Partner entities across services that are each managed by different transactions always introduce uncertainty. Helland describes tracking partner activity as a means to manage uncertainty “in the business logic.” Doing so requires each partner to track the activities that it has seen from its partner. This can be represented as a hash set or something similar.
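What follows is a minimal sketch of that kind of tracking, not Helland’s code. The names PartnerActivity and TrackedEntity are hypothetical, and identifying an activity by a partner id plus an activity id is my assumption.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical identity of one activity seen from one partner.
record PartnerActivity(String partnerId, String activityId) {}

// Hypothetical long-lived entity that remembers what its partners have done.
final class TrackedEntity {
    private final Set<PartnerActivity> seen = new HashSet<>();

    // Records the activity; returns true only the first time it is seen.
    boolean track(PartnerActivity activity) {
        return seen.add(activity);
    }

    // Looks back to check whether this partner activity already occurred.
    boolean hasSeen(PartnerActivity activity) {
        return seen.contains(activity);
    }
}
```

The set lives inside the entity, so the decision about what a repeated activity means stays in the domain model.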

In a long-lived entity, it is important to consistently track all activity of every partner. As seen in the above code, this can be done by recording the activity and then looking back to see if the partner activity occurred. It’s similar to the Deduplicator, but it’s part of the domain model rather than an infrastructural concern.

Explicitly Modeled Uncertainty

Consider a SaaS product business, where clients are matched with skilled workers to get a household job done. It’s a peer-to-peer, e-commerce business model.

Here’s how it works. A client submits a job proposal with task expectations and an offered price. The proposal is then processed for fair pricing, recommended skilled workers, and worker availability per their scheduling calendar. If all of the processing identifies one or more candidate workers for notification, and at least one such worker accepts the proposal, the job will eventually be fulfilled. Later, when the job is completed to satisfaction, the client pays the worker through the service.

If earlier in the vetting process the pricing was considered unfair and rejected, for example, the client is provided a suggested price. The client may then resubmit the proposal, possibly with adjusted expectations, or pricing, or both. Upon proposal resubmission, the vetting process starts again, and ideally leads to a match and job fulfillment.

In the Matching Bounded Context, a proposal model tracks its progress through the vetting process. Per Helland’s paper, this is where the partner activities are tracked. There is no specific and explicit “activity” model here. Rather, the proposal’s progress and other state data delivered by partner events comprise each “activity.” If the proposal itself uses Event Sourcing, the proposal’s event stream has a discrete, detailed factual record of each incoming partner activity: PricingAccepted (or PricingRejected), WorkersRecommended, and AvailabilityLocated.

Another Bounded Context translates the ProposalSubmitted event into something that the Pricing and Scheduling Bounded Contexts can understand. That’s the sample model. Another important aspect of DDD is identifying our Core Domain.

You might think that Matching is a Core Domain because this is where we make all our money. Or is it? Matching is where a client and a worker get matched up to have a job completed. Even so, Matching is a fairly simple web app. The industry knows a lot about implementing web apps, and the apps in themselves are not that challenging.

What’s really interesting is that the business behind the P2P economy is going to actually make its money through pricing. A large part of the startup’s success or failure depends on accurately validating pricing. It’s win or lose. For example, is the proposed job to be done (a) during normal weekday business hours, (b) weekdays but outside business hours, (c) on the weekend, or (d) over a holiday such as party catering or an emergency plumbing situation? If it’s anything outside regular business hours, the Pricing Context must bump the cost to reflect the odd working hours. And the bump will increase from b to d, respectively.
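The four cases can be sketched as a small type in the Pricing model. The multipliers below are invented placeholders, chosen only so that the bump increases from (b) to (d). Real values would come from the business.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// The four periods named above; the multipliers are hypothetical placeholders.
enum WorkPeriod {
    WEEKDAY_BUSINESS_HOURS("1.00"), // (a) no bump
    WEEKDAY_AFTER_HOURS("1.25"),    // (b)
    WEEKEND("1.50"),                // (c)
    HOLIDAY("2.00");                // (d) largest bump

    private final BigDecimal multiplier;

    WorkPeriod(String multiplier) {
        this.multiplier = new BigDecimal(multiplier);
    }

    // Applies the period's bump to a base price, rounded to two decimals.
    BigDecimal adjust(BigDecimal basePrice) {
        return basePrice.multiply(multiplier).setScale(2, RoundingMode.HALF_EVEN);
    }
}
```

With these placeholder values, a base price of 80.00 for a weekend job would be adjusted to 120.00.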

In the overall processing, the system will evaluate the clients and the workers for likely preferences and limits, such as client demographics and worker willingness to work outside normal hours. So there are a lot of important things going on simultaneously. And Pricing is at least one of the Core Domains in this P2P economy.

As events are delivered to any Bounded Context, such as Matching, Pricing, and Scheduling, they will be translated to commands. This happens at the edge, in the infrastructure. In the case of Pricing, the ProposalSubmitted event is received from Matching and it is translated to the VerifyPricing command. If pricing is verified or rejected, the PricingVerified or PricingRejected event is emitted, respectively, from Pricing.
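A rough sketch of that translation at the edge of Pricing follows. Only the names ProposalSubmitted and VerifyPricing come from the description above. The fields, the PricingCommands interface, and the adapter class are assumptions made for illustration.

```java
// Hypothetical message shapes; the fields are assumptions.
record ProposalSubmitted(String proposalId, String taskDescription, String offeredPrice) {}
record VerifyPricing(String proposalId, String taskDescription, String offeredPrice) {}

// Stand-in for the entry point of the Pricing domain model.
interface PricingCommands {
    void handle(VerifyPricing command);
}

// Infrastructure adapter at the edge of the Pricing context.
final class ProposalSubmittedAdapter {
    private final PricingCommands pricing;

    ProposalSubmittedAdapter(PricingCommands pricing) {
        this.pricing = pricing;
    }

    void receive(ProposalSubmitted event) {
        // Translate the foreign event into Pricing's own language and pass it on.
        // No buffering, deduplicating, or resequencing happens here.
        pricing.handle(new VerifyPricing(
                event.proposalId(), event.taskDescription(), event.offeredPrice()));
    }
}
```

The adapter only translates. Any decision about what the command means is left to the domain model behind PricingCommands.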

Many events are received, some of which may be out of sequence. This can happen, as stated previously, if Event 1 and Event 3 are received, but Event 2 doesn’t arrive for a while. With a model handling this uncertainty, events are never stalled for the sake of sequencing because the sequence in which they arrive doesn’t matter. All received events are immediately translated to commands and passed through to the domain model.

As the model receives each command, it can cause the activity tracking to take note of it. (In the case of the Matching proposal, its progress model will be advanced by one step for each command.) It could be that the model (e.g. proposal) can take only certain actions until all commands caused by required events are received, such as after Event 2 arrives. Even so, there are no elaborate schemes for dealing with duplicate and out-of-sequence events. (Easy peasy, and explicit!)
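One way to sketch such a progress model is with a set of completed steps. The type names below are hypothetical, and the step names simply mirror the partner events mentioned earlier.

```java
import java.util.EnumSet;
import java.util.Set;

// Steps the proposal needs before it can be matched (names are assumptions).
enum VettingStep { PRICING_ACCEPTED, WORKERS_RECOMMENDED, AVAILABILITY_LOCATED }

// Hypothetical progress model held by the proposal.
final class ProposalProgress {
    private final Set<VettingStep> completed = EnumSet.noneOf(VettingStep.class);

    // Advancing is idempotent, so a redelivered event changes nothing.
    // A set has no order, so the arrival sequence is irrelevant.
    boolean advance(VettingStep step) {
        return completed.add(step);
    }

    // Certain actions become available only once every required step is in.
    boolean readyToMatch() {
        return completed.containsAll(EnumSet.allOf(VettingStep.class));
    }
}
```

Nothing waits on a late step. The proposal simply reports that it is not yet ready to match until the last required step arrives.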

Sometimes a process can become more complex and the previously described choreography lacks the robustness to manage it. There are ways of dealing with this using orchestration through what’s called a Process Manager or Saga.

Multiple partner entities on the left-hand side are emitting events, which are making their way to the Process Manager. The Process Manager is a state machine that says in response, “if I haven’t seen this event yet, I’m going to issue a command to some other partner entity to say ‘do this.’” The Process Manager state machine is a reactive object. Some processes are never intended to complete, but continue processing. In such a case, events and commands just keep cycling through it.

For the example Matching Context, its process is considered completed when a client and worker are finally matched. At that time the Matching Saga emits a command that instructs the Fulfillment Context and process to start. At that point the Matching Saga is done with that specific client-worker process.
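A Matching Saga along these lines could be sketched as follows. The WORKER_ACCEPTED event, all of the command names, and the Consumer used as a command bus are assumptions for illustration. A real process would also carry state data and be persisted.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.function.Consumer;

// Events the saga reacts to; WORKER_ACCEPTED is a hypothetical addition.
enum MatchingEvent { PRICING_ACCEPTED, WORKERS_RECOMMENDED, AVAILABILITY_LOCATED, WORKER_ACCEPTED }

// Hypothetical command shape sent to partner entities.
record Command(String name, String proposalId) {}

final class MatchingSaga {
    private final String proposalId;
    private final Consumer<Command> commandBus; // stand-in for real messaging
    private final Set<MatchingEvent> seen = EnumSet.noneOf(MatchingEvent.class);
    private boolean completed;

    MatchingSaga(String proposalId, Consumer<Command> commandBus) {
        this.proposalId = proposalId;
        this.commandBus = commandBus;
    }

    void when(MatchingEvent event) {
        // "If I haven't seen this event yet": duplicates and late events are ignored.
        if (completed || !seen.add(event)) return;
        switch (event) {
            case PRICING_ACCEPTED -> send("RecommendWorkers");
            case WORKERS_RECOMMENDED -> send("LocateAvailability");
            case AVAILABILITY_LOCATED -> send("NotifyCandidateWorkers");
            case WORKER_ACCEPTED -> { } // no follow-up; completion is checked below
        }
        // Once every event has been seen, client and worker are matched.
        if (seen.size() == MatchingEvent.values().length) {
            completed = true;
            send("StartFulfillment");
        }
    }

    boolean isCompleted() { return completed; }

    private void send(String name) {
        commandBus.accept(new Command(name, proposalId));
    }
}
```

The set of seen events is the process state, so inspecting it shows how far a given client-worker process has progressed.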

Using orchestration through a Process Manager / Saga makes reasoning about a business process more intuitive. The process’ current state, whether sourced or serialized, can indicate if something has gone wrong and where the failure occurred. If some event never seems to show up, a timeout can be used to indicate a maximum threshold for completion, and the business process will suspend with an uncompleted status.

It’s not the software developers on the team who will say, “let’s define the timeout to be two minutes.” You want the business to make decisions like that. Timeouts are part of your modeling heuristics, and including the business domain experts in the decision is the DDD way.
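Here is a minimal sketch of a process state with a timeout threshold that is supplied from outside rather than hard-coded. The names TimedProcess and ProcessStatus are hypothetical, and passing the current time in from a scheduler is my assumption.

```java
import java.time.Duration;
import java.time.Instant;

enum ProcessStatus { RUNNING, COMPLETED, UNCOMPLETED }

// Hypothetical process state with a business-defined completion threshold.
final class TimedProcess {
    private final Instant startedAt;
    private final Duration maxDuration; // chosen with the domain experts
    private ProcessStatus status = ProcessStatus.RUNNING;

    TimedProcess(Instant startedAt, Duration maxDuration) {
        this.startedAt = startedAt;
        this.maxDuration = maxDuration;
    }

    // Marks the process as done, unless it has already been suspended.
    void complete() {
        if (status == ProcessStatus.RUNNING) status = ProcessStatus.COMPLETED;
    }

    // Called periodically; the caller supplies the current time.
    void checkTimeout(Instant now) {
        boolean expired = Duration.between(startedAt, now).compareTo(maxDuration) > 0;
        if (status == ProcessStatus.RUNNING && expired) {
            status = ProcessStatus.UNCOMPLETED; // suspended, visible to the business
        }
    }

    ProcessStatus status() { return status; }
}
```

Whether the threshold is two minutes or two days is then a constructor argument that the business decides, not a constant buried in infrastructure code.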

At the end of the day, modeling uncertainty is focused on pushing as many business rules and constraints as possible into the domain model. If software is making pseudo business decisions anywhere outside the domain model, such as deduplicating and resequencing events, the team has already lost its way. Reactive DDD can be made simple by reconsidering past technical solutions and migrating to business-motivated solutions.
