Bite-Sized Serverless

Design Principles of Event-Driven Architectures

General - Intermediate (200)
Event-driven architectures are reliable, performant, easy to maintain and extensible. But to unlock these benefits an application needs to be designed with events in mind. In this Bite we will cover the basic design principles for event-driven architectures.

Minimize responsibilities

As we discussed in the Bite Introduction to Event-Driven Architectures, components of an event-driven architecture can be producers, consumers or both. To achieve the highest reliability and performance these components should have as few responsibilities as possible. Let's look at an example.
In this architecture users can call three operations on an API Gateway. Each of these operations is backed by its own Lambda function. The PlaceOrder and CreateAccount functions emit a message when a user has placed a new order or has created an account. This message is put on an EventBridge message bus. EventBridge forwards the OrderPlaced event to the PostProcess OrderPlaced Lambda, and the AccountCreated event to the PostProcess AccountCreated Lambda. In this diagram every component has its own responsibilities: the functions behind API Gateway only process one type of request, and the functions behind EventBridge only process one type of event. Minimizing the responsibilities for each function is beneficial for a number of reasons, which we will cover below.
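As a rough sketch, the event emission inside the PlaceOrder function could look like the Python below. The bus name `orders-bus`, the source `shop.orders` and the payload fields are hypothetical and not taken from the architecture above. The EventBridge client is passed in as an argument so the function can be exercised without AWS access; in a real Lambda function it would typically be `boto3.client("events")`.

```python
import json


def build_order_placed_entry(order_id, customer_id, bus_name="orders-bus"):
    """Build one PutEvents entry for an OrderPlaced event.

    The bus name, source and detail fields are illustrative assumptions.
    """
    return {
        "EventBusName": bus_name,
        "Source": "shop.orders",
        "DetailType": "OrderPlaced",
        # EventBridge expects Detail to be a JSON string, not a dict.
        "Detail": json.dumps({"orderId": order_id, "customerId": customer_id}),
    }


def emit_order_placed(events_client, order_id, customer_id):
    """Publish the event and fail loudly if EventBridge rejects it."""
    response = events_client.put_events(
        Entries=[build_order_placed_entry(order_id, customer_id)]
    )
    # put_events reports per-entry failures in the response body.
    if response.get("FailedEntryCount", 0) > 0:
        raise RuntimeError(f"EventBridge rejected the event: {response['Entries']}")
    return response
```

The function does one thing: it announces that an order was placed. It has no knowledge of which consumers exist or what they do with the event.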

Isolated dependencies

Each of the Lambda functions only requires libraries and packages for its specific tasks. The PlaceOrder function might require a YAML library, but the CreateAccount function might not. By separating these functions, the CreateAccount function doesn't need to import the YAML library, which keeps it more lightweight. In the case of AWS Lambda, lighter functions mean smaller deployment packages, which in turn mean shorter cold starts and a better user experience.
Isolated dependencies also allow different functions to use different versions of the same library. This is a great benefit compared to monolithic applications, where one component might require version 1.1.x of a library and another component version 1.2.x. Because these components run in the same environment, one of them will need to be patched to match the requirements of the other. When the responsibilities of each function are reduced as much as possible, this becomes a problem of the past.

Minimized fault domains

When functions are isolated, a crash or bug has less impact on the overall system and is easier to localize. Take the Lambda functions behind the API Gateway: if they were a single function, a bug might bring the entire API to a grinding halt. But in our current architecture, a bug in the PlaceOrder function will only affect that operation, while the rest of the application chugs happily along.

Isolated testing

Isolated functions are easier to test. Because the responsibilities of each component are limited, the number of possible inputs, outputs and errors is kept to a minimum as well.
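To illustrate how small the test surface of a single-purpose function can be, here is a minimal sketch of a PostProcess OrderPlaced handler together with its tests. The `detail` fields `orderId` and `customerEmail` are hypothetical; only the `detail-type` and `detail` keys follow the shape in which EventBridge delivers events to Lambda.

```python
def handle_order_placed(event, context=None):
    """Turn one OrderPlaced event into one confirmation message.

    The detail fields (orderId, customerEmail) are illustrative assumptions.
    """
    if event.get("detail-type") != "OrderPlaced":
        raise ValueError(f"unexpected event type: {event.get('detail-type')}")
    detail = event["detail"]
    return {
        "to": detail["customerEmail"],
        "subject": f"Order {detail['orderId']} confirmed",
    }


def test_confirmation_is_built_from_event():
    event = {
        "detail-type": "OrderPlaced",
        "detail": {"orderId": "o-1", "customerEmail": "a@example.com"},
    }
    assert handle_order_placed(event) == {
        "to": "a@example.com",
        "subject": "Order o-1 confirmed",
    }


def test_other_event_types_are_rejected():
    try:
        handle_order_placed({"detail-type": "AccountCreated", "detail": {}})
    except ValueError:
        return
    raise AssertionError("expected a ValueError")
```

With one input type and one output, two tests cover the behavior that matters. A handler that processed every event type in the system would need a test matrix many times this size.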

Individual scaling and performance

When a single component in a monolith decides to eat all your CPU or memory, every other component is affected too. You might add more resources by scaling out, but that is quite inefficient: every component, including the ones not heavily loaded, will be duplicated on the new instances. There, the components will once again compete for the same CPUs and memory. By using microservices and minimizing their responsibilities, each individual service and function can scale individually, without affecting other components.

Use services, not code

The best code is no code at all. It reduces the chance of errors, it doesn't have to be maintained, and you don't need to write unit tests for it. So if a cloud provider offers functionality as a managed service, it's generally a good idea to use it. Obvious examples are using SQS for queues, EventBridge for message buses, and API Gateway for REST APIs. But there are other, less obvious ways to minimize the amount of code in your infrastructure. For example, SQS offers native dead-letter queues (DLQs), which allow you to store and monitor undeliverable messages. Step Functions contains powerful integrated tools for error handling and retry logic. And API Gateway offers many first-class integrations which allow it to call other AWS services directly, without passing through a Lambda function. By using these AWS-managed features and services your application becomes more reliable, more performant and likely more cost-effective.
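The dead-letter queue case shows how little code remains when the service does the work. The sketch below only builds the queue configuration; the function names, the default of five receive attempts and the idea of passing the client in are assumptions made for illustration. `RedrivePolicy`, `deadLetterTargetArn` and `maxReceiveCount` are the attribute names SQS uses.

```python
import json


def queue_attributes_with_dlq(dlq_arn, max_receive_count=5):
    """Attributes for an SQS queue that hands failed messages to a DLQ.

    After max_receive_count unsuccessful receives, SQS itself moves the
    message to the queue identified by dlq_arn; no retry code is needed.
    """
    return {
        # RedrivePolicy is passed as a JSON string.
        "RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": max_receive_count}
        )
    }


def create_queue_with_dlq(sqs_client, name, dlq_arn):
    """Create the source queue; sqs_client would be boto3.client("sqs")."""
    return sqs_client.create_queue(
        QueueName=name, Attributes=queue_attributes_with_dlq(dlq_arn)
    )
```

Counting failed attempts, moving the message and storing it for inspection all happen inside SQS, so none of that logic needs to be written, tested or maintained in the application.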

Respond to changes, don't process state

An event-driven architecture emits messages whenever a significant change occurs. Downstream systems should pick up these messages in near real time, for example by sending out an order confirmation as soon as the OrderPlaced event occurs. This approach differs from batch or state processing systems, which read from a data source periodically. The batch processing approach to order confirmations would be to read all orders of the day at midnight, and send out confirmations for all of them. An event-driven approach leads to a better user experience and lower peak loads. It can also reduce costs, because emitting and responding to an event is generally free or nearly free, while reading a large amount of data from a database like DynamoDB is not.

Use a standardized message format

To future-proof your event-driven architectures you should choose a versatile message format early, and be prepared to stick with it. This allows the infrastructure to grow in directions unknown at the time of launch. An event can be considered a promise, especially in environments where the consumers are unknown to the producers: once you start sending out events, other systems start reading them and then expect the format to never change. By choosing a format that is standardized, versatile and extensible, like CloudEvents, you avoid big, painful refactoring in the future.
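To make the idea of a standardized envelope concrete, here is a minimal sketch that wraps a payload in a CloudEvents 1.0 JSON structure. The helper name `to_cloudevent` and the example type and source values are hypothetical; the attribute names `specversion`, `id`, `source`, `type`, `time`, `datacontenttype` and `data` come from the CloudEvents specification.

```python
import json
import uuid
from datetime import datetime, timezone


def to_cloudevent(event_type, source, data):
    """Wrap a payload in a CloudEvents 1.0 envelope (JSON format)."""
    return {
        # Required context attributes in CloudEvents 1.0.
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": source,
        "type": event_type,
        # Optional attributes.
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        "data": data,
    }


if __name__ == "__main__":
    # The type and source below are made-up example values.
    event = to_cloudevent("com.example.order.placed", "/shop/orders", {"orderId": "o-1"})
    print(json.dumps(event, indent=2))
```

Because the envelope is the same for every event, consumers can route on `type` and `source` without knowing the payload, and producers can add fields inside `data` without breaking that routing.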

Elasticity and buffers everywhere

An event-driven environment might start small, but can grow suddenly and in unexpected ways. For example, if your architecture emits events for NewUser and OrderPlaced, a large number of events might suddenly occur as a result of a successful marketing campaign. The new user sign-ups might be handled by a Lambda function, which scales just fine. The events might be put onto an EventBridge message bus, which will be able to handle the higher load as well. But if EventBridge sends its sign-up messages to a single downstream EC2 instance, that instance might well be overwhelmed by the sudden onrush of events. Likewise, a relational database might not be able to handle the increased number of writes. To avoid unexpected downtime in downstream components, ensure that all components are able to scale with increased loads. If scaling is difficult to implement, make sure you add a buffer between the scaling and non-scaling systems, such as an SQS queue.
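The buffering idea can be sketched from the consumer's side: instead of having events pushed at it, the non-scaling component pulls from the queue at a pace it can sustain. In the sketch below the function name `drain_once` and the `handle` callback are hypothetical, and the client is passed in so the code can run without AWS access; `receive_message` and `delete_message` are the SQS operations as exposed by boto3.

```python
def drain_once(sqs_client, queue_url, handle, max_messages=10):
    """Pull at most one batch from the buffer queue and process it.

    The consumer decides how often to call this, so a burst of events
    makes the queue longer instead of overloading the consumer.
    """
    response = sqs_client.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=max_messages,  # SQS allows 1 to 10 per call
        WaitTimeSeconds=20,  # long polling
    )
    processed = 0
    for message in response.get("Messages", []):
        handle(message["Body"])
        # Delete only after successful handling; if handle() raises, the
        # message becomes visible again after the visibility timeout.
        sqs_client.delete_message(
            QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
        )
        processed += 1
    return processed
```

A worker on the EC2 instance would call `drain_once` in a loop. During a spike the backlog grows in SQS, and the instance keeps working through it at its own speed.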

Conclusion

In this Bite we have discussed the design principles for event-driven architectures. By following these principles you can optimize the performance, maintainability and cost-efficiency of your infrastructure. It will also set up your environment for future success and reduce the risk of large-impact refactors down the road.