Bite-Sized Serverless

Serverless Messaging: Latency Compared

General - Advanced (300)
Modern applications often use event messaging to decouple systems and services. AWS offers several serverless messaging systems, each with its own options and attributes. One attribute we do not talk about often is latency: the time the messaging service adds between a message producer and consumer. In this Bite we will compare the latency introduced by common messaging services: SQS, SNS, Step Functions, EventBridge, Kinesis, and DynamoDB Streams.

Latency test results

Without further ado, let's take a look at the test results.
| Service | P10 Latency | P50 Latency | P90 Latency | P99 Latency | Integration Type |
| --- | --- | --- | --- | --- | --- |
| SQS Standard | 13.7 ms | 16.2 ms | 33.8 ms | 105 ms | Pull - Lambda Event Source Mapping |
| SQS FIFO | 21.9 ms | 28.1 ms | 61.2 ms | 645 ms | Pull - Lambda Event Source Mapping |
| SNS Standard | 62 ms | 73 ms | 125 ms | 225 ms | Push - Async Lambda invocation |
| SNS FIFO | 32.8 ms | 40.1 ms | 67.8 ms | 497 ms | Pull - Additional SQS Queue and ESM |
| Step Functions Standard (async) | 115 ms | 142 ms | 204 ms | 313 ms | Push - Async Lambda invocation |
| Step Functions Standard (sync) | 73 ms | 99.1 ms | 133 ms | 250 ms | Push - Synchronous Lambda invocation |
| Step Functions Express (async) | 96.5 ms | 118 ms | 178 ms | 298 ms | Push - Async Lambda invocation |
| Step Functions Express (sync) | 57.5 ms | 77.3 ms | 129 ms | 241 ms | Push - Synchronous Lambda invocation |
| EventBridge | 221 ms | 395 ms | 579 ms | 794 ms | Push - Async Lambda invocation |
| Kinesis Data Streams Standard | 267 ms | 627 ms | 965 ms | 1580 ms | Pull - Lambda Event Source Mapping |
| Kinesis Data Streams with Enhanced Fan-Out | 28.8 ms | 49.4 ms | 67.5 ms | 109 ms | Push - Lambda Event Source Mapping |
| DynamoDB Streams | 118 ms | 213 ms | 341 ms | 552 ms | Pull - Lambda Event Source Mapping |
The tests were run on September 4th and 5th 2022, in the eu-west-1 (Ireland) region.

Testing methodology

Each of the services was tested by invoking a Producer Lambda Function 1,024 times. The Producer publishes a message containing a timestamp in nanoseconds. On the other side of the messaging service, the messages are read by a Consumer Lambda Function. The Consumer records the time the data was received, subtracts the sent timestamp, and reports the resulting duration. Care was taken to measure the sending timestamp as late as possible and the consuming timestamp as early as possible. If the Producer or Consumer experienced a cold start, the measurement was dropped from the results.
The source code for the tests can be found on GitHub or the zip file at the bottom of this page.
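The core of that measurement can be sketched in a few lines of Python. This is an illustration of the approach, not the actual test code from the repository; the function names are ours.

```python
import json
import time


def producer_body() -> str:
    """Build the message body, taking the timestamp as late as possible."""
    return json.dumps({"sent_ns": time.time_ns()})


def latency_ms(body: str, received_ns: int) -> float:
    """End-to-end latency between producer and consumer, in milliseconds."""
    sent_ns = json.loads(body)["sent_ns"]
    return (received_ns - sent_ns) / 1_000_000  # nanoseconds -> milliseconds
```

In the real Consumer, the received timestamp is captured on the first line of the handler, before any parsing, so the measurement includes as little of our own code as possible.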

Percentiles

The table above covers the 10th, 50th, 90th, and 99th percentiles. In other words, the first column covers the fastest 10% of requests, the second column the fastest 50%, the third the fastest 90%, and the last column the fastest 99% of requests. P100 (the absolute slowest request) was not included to filter out irregularities in the network and Lambda service.
P10 (the fastest 10%) tells us how fast the service can be. For example, SQS Standard consumers can receive a message 13.7 milliseconds after it was sent. EventBridge consumers, on the other hand, rarely receive a message earlier than 221 ms after it was sent. In other words: for 90% of messages, the service is at least as slow as its P10 value.
P50 (the fastest 50%) tells us how fast the service is for half of our users. For example, half of SNS Standard consumers receive their message within 73 ms.
P90 (the fastest 90%) tells us how fast the service is for most of our users. For example, 9 out of 10 SNS FIFO consumers receive their message within 67.8 ms.
P99 (the fastest 99%) tells us how fast the service is for almost all of our users. For example, 99% of our Step Functions Express consumers receive their message within 298 ms. Almost no one will have to wait longer than the P99 value to receive their message.
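These percentiles are straightforward to compute from the raw measurements. A minimal sketch using only the Python standard library (the sample latencies are made up for illustration):

```python
import statistics


def percentile(samples: list[float], p: int) -> float:
    """Return the p-th percentile (1-99) of a list of latency samples."""
    # quantiles(n=100) returns the 99 cut points P1 through P99
    return statistics.quantiles(samples, n=100)[p - 1]


latencies = [13.7, 16.2, 18.4, 21.0, 33.8, 47.2, 61.5, 105.0]
print(percentile(latencies, 50))  # the median of the sample
```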

The importance of low latency

In this article we're focusing on the latency aspect of AWS messaging services. Latency is generally not the primary driver for an integration pattern decision. Instead, attributes like history, replay, fan-out, and other requirements inform whether to use SNS, SQS, EventBridge, or another system. But in physical environments, like factories and assembly plants, latency does matter. And latency is especially important in direct user interaction: the experience of a 200 ms wait time is very different from a 30 ms wait.

Push vs. pull-based messaging

In our Lambda-to-Lambda setup a push-based messaging system will invoke our Consumer Lambda Function for us. EventBridge, for example, offers a push-based integration that will asynchronously invoke the Consumer Lambda Function. Step Functions are also push-based. They can be configured to invoke Lambda either synchronously or asynchronously, more on that below.
In a pull-based system like SQS or Kinesis, a message is placed onto a queue or stream, from where it can be retrieved by a consuming system. The pull-based tests above have been configured for performance. Specifically, the Lambda Event Source Mappings have been configured with a batch size of 1, which means they invoke the Consumer Lambda Function as soon as an item is found on the queue or stream. This is not necessarily the configuration you would use in a production system, but it is useful to measure how fast these queues and streams can be.
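In boto3 terms, this test configuration corresponds to an Event Source Mapping created with a batch size of 1. A sketch under those assumptions; the helper function is ours and the ARNs are placeholders:

```python
def low_latency_esm(source_arn: str, function_name: str) -> dict:
    """Parameters for Lambda's CreateEventSourceMapping API, tuned for latency."""
    return {
        "EventSourceArn": source_arn,
        "FunctionName": function_name,
        "BatchSize": 1,  # invoke the consumer as soon as one message is available
    }


# In a deployed environment:
# boto3.client("lambda").create_event_source_mapping(**low_latency_esm(arn, name))
```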

Synchronous vs. asynchronous Lambda invocations

Lambda Functions have three main invocation modes: Event Source Mappings for streams and queues, synchronous calls, and asynchronous calls.
Synchronous calls invoke the Lambda Function directly and wait for the function to return its response. This mode is also called request-response. Asynchronous calls, on the other hand, add an invisible, service-managed SQS Queue in front of the Lambda Function. When a function is invoked asynchronously, the event is placed on the queue and a success response is immediately returned. The actual Lambda Function is invoked at a later time, generally within a few dozen milliseconds. EventBridge and SNS Standard use asynchronous Lambda invocations. Step Functions can be configured to either call Lambda synchronously or asynchronously.
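The difference between the two modes shows up in the Lambda Invoke API, where the InvocationType parameter selects the behavior. A sketch (the helper function is ours, not part of boto3):

```python
import json


def invoke_args(function_name: str, payload: dict, asynchronous: bool) -> dict:
    """Arguments for Lambda's Invoke API.

    'Event' places the call on the service-managed queue and returns
    immediately; 'RequestResponse' waits for the function's result.
    """
    return {
        "FunctionName": function_name,
        "InvocationType": "Event" if asynchronous else "RequestResponse",
        "Payload": json.dumps(payload).encode(),
    }


# boto3.client("lambda").invoke(**invoke_args("Consumer", {"sent_ns": 0}, asynchronous=True))
```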

Service-specific observations

Simple Queue Service (SQS)

SQS is the fastest serverless integration system in the AWS ecosystem, if configured with small batches and long polling. SQS FIFO queues are slightly slower on all percentiles but can be significantly slower at P99 or above.

Simple Notification Service (SNS)

SNS Standard is the simplest push messaging system. It uses async Lambda Function invocations, which add a small additional latency to the call.
SNS FIFO is a different beast entirely. As stated in the documentation, an SNS FIFO Topic cannot invoke a Lambda Function directly. Instead you need to subscribe an SQS FIFO Queue to the SNS FIFO Topic and have the Lambda Function pull from the SQS Queue using an Event Source Mapping.
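The wiring described above amounts to an SNS subscription with protocol sqs. A sketch of the Subscribe parameters, assuming placeholder ARNs; the helper function is ours:

```python
def fifo_subscription(topic_arn: str, queue_arn: str) -> dict:
    """Parameters for SNS's Subscribe API, connecting a FIFO topic to a FIFO queue."""
    return {
        "TopicArn": topic_arn,  # SNS FIFO topic ARN, must end in .fifo
        "Protocol": "sqs",
        "Endpoint": queue_arn,  # SQS FIFO queue ARN, also ending in .fifo
        "Attributes": {"RawMessageDelivery": "true"},  # skip the SNS JSON envelope
    }


# boto3.client("sns").subscribe(**fifo_subscription(topic_arn, queue_arn))
```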
Fun fact: even though the FIFO mode has more moving parts, it can be faster than the Standard mode. This is likely because the internal queue behind asynchronous Lambda invocations is polled less aggressively than Event Source Mappings poll their queues.

Step Functions

Step Functions State Machines have two execution modes: Standard and Express. Standard workflows offer more guarantees than Express workflows, such as exactly-once processing. These relaxed guarantees allow Express workflows to execute faster, as clearly shown in the latency results.
State Machines can be configured to invoke Lambda Functions either synchronously or asynchronously. Because asynchronous Lambda invocations are placed on an intermediate queue, synchronous executions perform slightly better in our tests.
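In the Amazon States Language, this choice appears as the InvocationType parameter on the optimized Lambda integration. A sketch of such a Task state, expressed here as a Python dict; the helper function and ARNs are illustrative:

```python
def lambda_invoke_state(function_arn: str, asynchronous: bool) -> dict:
    """A Task state using Step Functions' optimized Lambda integration."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke",
        "Parameters": {
            "FunctionName": function_arn,
            # 'Event' is fire-and-forget; 'RequestResponse' waits for the result
            "InvocationType": "Event" if asynchronous else "RequestResponse",
            "Payload.$": "$",
        },
        "End": True,
    }
```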

EventBridge

EventBridge always invokes Lambda Functions asynchronously. The service itself is one of the slower messaging systems, but AWS has publicly stated that latency improvements can be expected very soon.

Kinesis Data Streams

Kinesis Data Streams yield the most interesting results by far. In their standard configuration they are the slowest messaging system in our testing environment. But when we enable Enhanced Fan-Out, they suddenly move to the top of the list. From a 2018 AWS Blog Post we learn that Enhanced Fan-Out (EFO) uses HTTP/2 and the SubscribeToShard API to set up a bi-directional channel between the Kinesis Data Stream and the consumer, which pushes new messages directly to our Lambda Function. With EFO enabled latency improves over 10x, with the fastest messages arriving within 30 ms and the slowest messages arriving within 110 ms.
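For a Lambda consumer, enabling EFO means registering a stream consumer first (via the RegisterStreamConsumer API) and pointing the Event Source Mapping at the consumer's ARN rather than the stream's. A hedged sketch; the helper function is ours and the ARN a placeholder:

```python
def efo_esm(consumer_arn: str, function_name: str) -> dict:
    """Event Source Mapping parameters for an Enhanced Fan-Out consumer.

    EventSourceArn is the ARN returned by RegisterStreamConsumer, not the
    stream ARN; Lambda then receives records pushed over SubscribeToShard.
    """
    return {
        "EventSourceArn": consumer_arn,
        "FunctionName": function_name,
        "StartingPosition": "LATEST",  # required for stream event sources
        "BatchSize": 1,
    }
```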

DynamoDB Streams

DynamoDB Streams are an interesting and very specific integration pattern. Like SQS and Kinesis, DynamoDB Streams are based on Event Source Mappings. If the ESM is configured with a very low batch size, the Consumer Lambda Function is generally invoked within a few hundred milliseconds.

Conclusion

When latency matters, there are a few obvious winners. SQS Standard can deliver a message to a consumer in as little as 14 ms and is seldom slower than 100 ms, assuming low batch sizes. Kinesis with Enhanced Fan-Out is only slightly slower and allows for multiple consumers and a long history of events. SNS falls in the low latency category too, although the SNS FIFO option includes more moving parts and thus a larger latency spread, up to half a second.
Step Functions and DynamoDB Streams take up the middle section, with P50 latencies up to about 200 ms.
The highest latency is introduced by EventBridge and Kinesis Data Streams without Enhanced Fan-Out. These services add at least a few hundred milliseconds to your integrations, but can easily run up to a second or more.

CDK Project

The services and code described in this Bite are available as a Python AWS Cloud Development Kit (CDK) Project. Within the project, execute a cdk synth to generate CloudFormation templates. Then deploy these templates to your AWS account with a cdk deploy. For your convenience, ready-to-use CloudFormation templates are also available in the cdk.out folder. For further instructions on how to use the CDK, see Getting started with the AWS CDK.

Click the Download button below for a Zip file containing the project.