Bite-Sized Serverless


The 9 Ways an SQS Message can be Deleted

SQS - Intermediate (200)
Amazon Simple Queue Service (SQS) is the oldest AWS service. SQS was released in beta in 2004, and although the core functionality has remained the same, many features and integrations have been added since. They include Dead Letter Queues, long polling, larger payloads, and Lambda Event Source Mappings. Some of these features introduce new, and sometimes surprising, ways for a message to disappear.
In this Bite we will look at nine ways a message can be removed from a queue:
  1. Successful processing
  2. Maximum Receive Count exceeded
  3. Maximum Receive Count set too low
  4. Message Retention Period exceeded
  5. Delivery Delay exceeds Message Retention Period
  6. Batch processing with Event Source Mappings
  7. Event Source Mappings with filters
  8. Multiple Event Source Mappings
  9. Event Source Mappings with mismatching filters

1. Successful processing

The first and most obvious way to remove a message is to delete it deliberately. This is usually done by custom consumers (such as an application running on EC2 or in a container) after the message has been successfully processed. Take, for example, a queue filled with S3 object names, each referring to an image stored in an S3 Bucket. An SQS consumer fetches a message from the queue using short or long polling. Receiving the message starts its visibility timeout, during which the message stays on the queue but is invisible to other consumers. The consumer downloads the referenced object from S3, resizes it, and stores the result in another S3 location. When this process has succeeded, the consumer permanently deletes the message from the queue.
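The receive, process, delete cycle can be sketched as a single polling pass. This is a minimal sketch, assuming `sqs_client` is any object with boto3-style `receive_message` and `delete_message` methods (for example `boto3.client("sqs")`); the `process` callback stands in for the hypothetical download, resize, and upload work. Failed messages are simply not deleted, so they reappear after their visibility timeout.

```python
from typing import Any, Callable


def consume_batch(sqs_client: Any, queue_url: str, process: Callable[[str], None]) -> int:
    """Receive up to 10 messages, process each one, and delete only the successes.

    `sqs_client` is assumed to expose boto3-style `receive_message` and
    `delete_message` methods. Returns the number of messages deleted.
    """
    response = sqs_client.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # SQS returns at most 10 messages per call
        WaitTimeSeconds=20,      # long polling; 0 would mean short polling
    )
    deleted = 0
    for message in response.get("Messages", []):
        try:
            # Hypothetical work, e.g. download the S3 object named in the body,
            # resize it and store the result elsewhere.
            process(message["Body"])
        except Exception:
            # Do not delete: the message becomes visible again once its
            # visibility timeout expires, so it can be retried.
            continue
        # Successful processing: permanently remove the message from the queue.
        sqs_client.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=message["ReceiptHandle"],
        )
        deleted += 1
    return deleted
```

Passing the client in, rather than creating it inside the function, keeps the sketch testable without AWS credentials.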
Lambda consumers use Event Source Mappings (ESMs) to fetch messages from an SQS queue. The ESM takes care of receiving and deleting messages, so Lambda Functions behind an ESM don't have to implement this functionality themselves. For more information on ESMs, see sections 6 to 9.

2. Maximum Receive Count exceeded

An SQS Queue can be configured with a Redrive Policy. This configuration specifies a Dead Letter Queue (DLQ), which is just another SQS Queue. It also specifies a maxReceiveCount value.
```
{
  "deadLetterTargetArn" : <String>,
  "maxReceiveCount" : <Integer>
}
```
If a consumer is unable to process a message, for example because the data is corrupted or a timeout occurs, it does not delete the message, and the message becomes visible again once its visibility timeout expires. Every time a consumer receives the message, SQS increments its receive count. When the number of times a message has been received from the source queue exceeds the maxReceiveCount value, SQS removes it from the source queue and puts it on the DLQ.
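The Redrive Policy is stored as a queue attribute whose value is itself a JSON-encoded string. A minimal sketch, assuming a boto3-style client with `set_queue_attributes`; the queue URL and DLQ ARN you pass in are your own, and the validation here only checks the lower bound.

```python
import json
from typing import Any, Dict


def redrive_policy_attributes(dlq_arn: str, max_receive_count: int) -> Dict[str, str]:
    """Build the queue attribute map for a Redrive Policy.

    The RedrivePolicy attribute value is a JSON-encoded string, not a nested dict.
    """
    if max_receive_count < 1:
        raise ValueError("maxReceiveCount must be at least 1")
    policy = {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": max_receive_count}
    return {"RedrivePolicy": json.dumps(policy)}


def apply_redrive_policy(sqs_client: Any, queue_url: str, dlq_arn: str, max_receive_count: int) -> None:
    """Attach the policy to the source queue via a boto3-style call."""
    sqs_client.set_queue_attributes(
        QueueUrl=queue_url,
        Attributes=redrive_policy_attributes(dlq_arn, max_receive_count),
    )
```

For example, `apply_redrive_policy(boto3.client("sqs"), queue_url, dlq_arn, 5)` would let a message be received five times before it is moved to the DLQ.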

3. Maximum Receive Count set too low

The visibility timeout from section 1 protects against unexpected failures, such as network issues or crashing consumer applications. When a message is read from a queue but never successfully processed, it becomes visible to other consumers again when the visibility timeout expires (hence the name). Another consumer, or the same one, can then retry processing the message. If the queue is configured with a redrive policy and the maxReceiveCount is set too low (e.g. to one), any network or application fault will move the message to the DLQ immediately, and no retry can occur.
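The effect of a low maxReceiveCount can be illustrated with a small in-memory model of a single message. This is a simplified simulation, not SQS itself: it assumes each failed attempt ends with the visibility timeout expiring, and that SQS moves the message to the DLQ as soon as one more receive would exceed maxReceiveCount.

```python
from typing import Sequence, Tuple


def simulate_redrive(max_receive_count: int, attempts: Sequence[bool]) -> Tuple[str, int]:
    """Model one message on a queue with a redrive policy.

    attempts[i] is True if the (i+1)-th receive is processed and deleted,
    False if the consumer fails and the visibility timeout expires.
    Attempts beyond the end of the list count as failures.
    Returns (final_location, receive_count).
    """
    if max_receive_count < 1:
        raise ValueError("maxReceiveCount must be at least 1")
    receive_count = 0
    while True:
        # Another receive would exceed maxReceiveCount: SQS moves it to the DLQ.
        if receive_count >= max_receive_count:
            return ("dlq", receive_count)
        receive_count += 1
        succeeded = attempts[receive_count - 1] if receive_count <= len(attempts) else False
        if succeeded:
            return ("deleted", receive_count)
        # Failure: the message becomes visible again and the loop retries.
```

With `simulate_redrive(1, [False, True])` the transient first failure sends the message straight to the DLQ, while `simulate_redrive(3, [False, True])` lets the second attempt succeed.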

4. Message Retention Period exceeded

An SQS Queue can also be configured with a Message Retention Period in seconds. This value specifies how long a message can stay on a queue before it is automatically deleted, regardless of its processing status. The retention period can be set between 60 seconds and 14 days, with the default at 4 days.
Please be aware of a DLQ caveat: a message's retention period is always calculated from the moment a message is put onto the source queue. It is not reset when a message is moved to a DLQ. From the Dead Letter Queue documentation:
The expiration of a message is always based on its original enqueue timestamp. When a message is moved to a dead-letter queue, the enqueue timestamp is unchanged. The ApproximateAgeOfOldestMessage metric indicates when the message moved to the dead-letter queue, not when the message was originally sent. For example, assume that a message spends 1 day in the original queue before it's moved to a dead-letter queue. If the dead-letter queue's retention period is 4 days, the message is deleted from the dead-letter queue after 3 days and the ApproximateAgeOfOldestMessage is 3 days. Thus, it is a best practice to always set the retention period of a dead-letter queue to be longer than the retention period of the original queue.
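The arithmetic from the documentation example, plus the best-practice check, can be written down directly. A minimal sketch: the bounds of 60 seconds and 14 days come from the text above, and the helper names are illustrative, not part of any AWS SDK.

```python
from datetime import timedelta
from typing import List

MIN_RETENTION = timedelta(seconds=60)
MAX_RETENTION = timedelta(days=14)


def time_left_in_dlq(time_in_source: timedelta, dlq_retention: timedelta) -> timedelta:
    """Remaining lifetime of a message once it lands on the DLQ.

    The clock starts at the original enqueue time and is not reset on the move.
    """
    return max(dlq_retention - time_in_source, timedelta(0))


def retention_warnings(source_retention: timedelta, dlq_retention: timedelta) -> List[str]:
    """Flag out-of-range values and a DLQ retention that is not longer than the source's."""
    warnings = []
    for name, value in (("source", source_retention), ("DLQ", dlq_retention)):
        if not MIN_RETENTION <= value <= MAX_RETENTION:
            warnings.append(f"{name} retention {value} is outside 60 seconds to 14 days")
    if dlq_retention <= source_retention:
        warnings.append("DLQ retention should be longer than the source queue retention")
    return warnings
```

Reproducing the quoted example, `time_left_in_dlq(timedelta(days=1), timedelta(days=4))` gives three days, and a 4-day source queue paired with a 14-day DLQ produces no warnings.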

5. Delivery Delay exceeds Message Retention Period

The Message Retention Period discussed in the previous section is a common and fairly simple mechanism to prevent an SQS Queue from overflowing. However, when it is combined with the SQS Delivery Delay feature, it can lead to unexpected behavior. Delivery Delay keeps messages invisible for a set amount of time before they become visible and can be consumed. It can be set between 0 seconds and 15 minutes. This is useful when a source action needs some time to be fully settled before further processing. An example could be a new row in a database, which needs to be copied to an off-site replica. By configuring a one-minute delay on the queue, the replication has time to complete.
If the Delivery Delay is set to a higher value than the Message Retention Period, the message is deleted before it ever becomes visible. It is not moved to the DLQ either. The Delivery Delay can be configured for the entire queue or on a per-message basis.
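A producer can guard against this by comparing the delay with the retention period before sending. A minimal sketch, assuming a boto3-style client with `send_message` and that the caller already knows the queue's retention in seconds; the helper names are hypothetical. It conservatively also rejects a delay equal to the retention period, since that leaves no time to consume the message.

```python
from typing import Any

MAX_DELAY_SECONDS = 900  # 15 minutes


def consumable_window(delay_seconds: int, retention_seconds: int) -> int:
    """Seconds between a message becoming visible and expiring.

    Zero or less means the message expires no later than it becomes visible,
    so no consumer can receive it, and it will not reach a DLQ either.
    """
    if not 0 <= delay_seconds <= MAX_DELAY_SECONDS:
        raise ValueError("delay must be between 0 and 900 seconds")
    return retention_seconds - delay_seconds


def send_delayed(sqs_client: Any, queue_url: str, body: str,
                 delay_seconds: int, retention_seconds: int) -> Any:
    """Send with a per-message delay, refusing delays the retention period would swallow."""
    if consumable_window(delay_seconds, retention_seconds) <= 0:
        raise ValueError("message would expire before it becomes visible")
    return sqs_client.send_message(
        QueueUrl=queue_url, MessageBody=body, DelaySeconds=delay_seconds
    )
```

The retention value could be read with `get_queue_attributes(QueueUrl=..., AttributeNames=["MessageRetentionPeriod"])` and converted with `int()`, since attribute values come back as strings.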

6. Batch processing with Event Source Mappings

Event Source Mappings are an AWS Lambda feature that polls SQS queues, collects a number of messages, and invokes a Lambda Function when certain thresholds are met. The Event Source Mapping is an SQS consumer like any other, and like any other it makes the SQS messages invisible during processing. By default, if the Lambda Function fails to process any message in its batch, the entire batch, including successfully processed messages, is made visible again and can thus be reprocessed. If a Redrive Policy has been configured, messages are moved to the DLQ when the maxReceiveCount threshold is crossed. This way, successfully processed messages can end up on the DLQ too.
This problem can partially (pun intended) be avoided by using the partial batch response feature, which allows a Lambda Function to tell the Event Source Mapping which messages in a batch failed. When the Event Source Mapping receives a partial batch response, it deletes the successful messages from the queue and makes the failed messages visible again. However, if a Lambda Function runs into an unhandled exception and is not able to return a partial batch response, the entire batch is retried.
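A handler returning a partial batch response reports failed messages by their `messageId` under `batchItemFailures`. A minimal sketch, assuming the Event Source Mapping has `ReportBatchItemFailures` in its `FunctionResponseTypes` (without it, the returned object is ignored); `process` is a hypothetical stand-in that only checks the body is JSON.

```python
import json
from typing import Any, Dict, List


def process(body: str) -> None:
    """Hypothetical business logic: here it only requires a JSON body."""
    json.loads(body)  # raises ValueError for a corrupted payload


def handler(event: Dict[str, Any], context: Any) -> Dict[str, List[Dict[str, str]]]:
    failures = []
    for record in event["Records"]:
        try:
            process(record["body"])
        except Exception:
            # Report only this message; the ESM deletes the rest of the batch.
            failures.append({"itemIdentifier": record["messageId"]})
    # An empty list tells the ESM that every message succeeded.
    return {"batchItemFailures": failures}
```

Catching exceptions per record, instead of letting one escape, is what keeps a single bad message from sending the whole batch back to the queue.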

7. Event Source Mappings with filters

Event Source Mappings can also be configured with filters. You can read much more about this feature in the Bite "Filter DynamoDB Event Streams Sent to Lambda". When filters are applied, the Event Source Mapping reads every message on the queue, applies the filters, and passes the messages that match them to the Lambda Function. Messages that do not match the filter criteria are deleted from the queue immediately. They are not returned to the source queue and they are not put on a DLQ; they are simply deleted as if they were successfully processed. And in a way, they were. The key takeaway is that SQS has been designed for single consumers. A consumer can be a distributed application consisting of many nodes, but these nodes should be homogeneous: any node processing any message should lead to the same result.
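The sketch below builds a `FilterCriteria` value in the shape accepted by boto3's `create_event_source_mapping`, and then models what the mapping does with a polled batch. The matcher is deliberately simplified (exact values on top-level body keys only) and is not Lambda's real filtering engine; the `type` field and `"image"` value are hypothetical.

```python
import json
from typing import Any, Dict, List, Tuple

Body = Dict[str, Any]


def filter_criteria(pattern: Dict[str, Any]) -> Dict[str, List[Dict[str, str]]]:
    """FilterCriteria shape: a list of filters, each with a JSON-encoded Pattern."""
    return {"Filters": [{"Pattern": json.dumps(pattern)}]}


def matches(body: Body, body_pattern: Dict[str, List[Any]]) -> bool:
    """Greatly simplified matcher: each pattern key must hold one of the listed values."""
    return all(body.get(key) in allowed for key, allowed in body_pattern.items())


def run_filtered_mapping(bodies: List[Body], body_pattern: Dict[str, List[Any]]) -> Tuple[List[Body], List[Body]]:
    """Split one polled batch the way a filtered ESM does."""
    invoked, deleted_unprocessed = [], []
    for body in bodies:
        if matches(body, body_pattern):
            invoked.append(body)              # passed to the Lambda Function
        else:
            deleted_unprocessed.append(body)  # deleted from the queue, never sent to a DLQ
    return invoked, deleted_unprocessed
```

In a real deployment the first helper's output would be passed as `FilterCriteria=filter_criteria({"body": {"type": ["image"]}})`; the second list returned by `run_filtered_mapping` represents messages that simply vanish.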

8. Multiple Event Source Mappings

An SQS Queue can have multiple Event Source Mappings. For example, you could configure two Lambda Functions to read different messages from the same queue. You might expect that, if the filters are mutually exclusive, one function would receive one part of the messages and the other function the rest. As described in the previous section, this is not the case. Instead, either of the two Event Source Mappings might receive any message and, depending on its filters, pass the message to its Lambda Function or drop it. The other Event Source Mapping would never even see the message, and would never get the chance to match its filters against it.
This problem also occurs when using multiple Event Source Mappings without filters. Any mapping might receive any message and pass it on to its Lambda Function. Which function receives which message is completely arbitrary, and a single message is not received by both functions. The only solution is to use one consumer (one Lambda Function) per SQS queue. If a message needs to be processed by multiple consumers, use SNS -> SQS or EventBridge -> SQS to fan the message out over multiple queues.
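The message loss with two competing, mutually exclusive mappings can be shown with a toy simulation. In reality the mapping that gets a message is arbitrary; this sketch substitutes a deterministic round-robin assignment so the outcome is reproducible, and represents each mapping's filter as a hypothetical predicate.

```python
from typing import Callable, Dict, List, Sequence


def simulate_competing_mappings(
    messages: Sequence[str],
    mappings: Sequence[Callable[[str], bool]],
) -> Dict[str, List[str]]:
    """Hand each message to exactly one mapping and apply only that mapping's filter.

    Round-robin is a stand-in for SQS delivering to whichever mapping polls first.
    """
    result: Dict[str, List[str]] = {"processed": [], "dropped": []}
    for i, message in enumerate(messages):
        accepts = mappings[i % len(mappings)]  # only this mapping ever sees the message
        result["processed" if accepts(message) else "dropped"].append(message)
    return result


# Two mappings with mutually exclusive filters on the same queue.
only_a = lambda m: m.startswith("a")
only_b = lambda m: m.startswith("b")
outcome = simulate_competing_mappings(["a1", "a2", "b1", "b2"], [only_a, only_b])
```

Here `outcome` is `{"processed": ["a1", "b2"], "dropped": ["a2", "b1"]}`: half the messages are deleted even though each one matches one of the two filters.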

9. Event Source Mappings with mismatching filters

The final way a message can be inadvertently dropped also involves Event Source Mapping filters. A filter pattern for the message body can be written to match either a plain string or valid JSON. If the format of the inbound message body does not match the format of the filter pattern (a plain-text body against a JSON pattern, or a JSON body against a plain-text pattern), the message is dropped. From the documentation:
Before filtering your messages, Lambda automatically evaluates the format of the incoming message body and of your filter pattern for body. If there is a mismatch, Lambda drops the message. The following table summarizes this evaluation:
| Incoming message body format | Filter pattern body format | Resulting action |
|---|---|---|
| Plain string | Plain string | Lambda filters based on your filter criteria. |
| Plain string | No filter pattern for data properties | Lambda filters (on the other metadata properties only) based on your filter criteria. |
| Plain string | Valid JSON | Lambda drops the message. |
| Valid JSON | Plain string | Lambda drops the message. |
| Valid JSON | No filter pattern for data properties | Lambda filters (on the other metadata properties only) based on your filter criteria. |
| Valid JSON | Valid JSON | Lambda filters based on your filter criteria. |
When Lambda drops the message, it is deleted from the source queue and will not be put on a DLQ, even if a Redrive Policy has been configured.
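The documented table can be encoded as a small decision function, which is handy for checking producers and filters against each other before deploying. A minimal sketch: it assumes "valid JSON" means the body parses with `json.loads`, which only approximates Lambda's own check, and it takes the pattern's body format as an explicit `"json"`, `"plain"`, or `None` rather than parsing a real filter pattern.

```python
import json
from typing import Optional


def body_format(body: str) -> str:
    """Approximate Lambda's format check: 'json' if the body parses as JSON, else 'plain'."""
    try:
        json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return "plain"
    return "json"


def resulting_action(message_body: str, pattern_body_format: Optional[str]) -> str:
    """Mirror the documented table.

    pattern_body_format is 'json', 'plain', or None when the filter has no body pattern.
    """
    if pattern_body_format is None:
        return "filter on metadata only"
    if body_format(message_body) == pattern_body_format:
        return "filter"
    # Format mismatch: the message is deleted and never reaches a DLQ.
    return "drop"
```

For example, a producer sending `hello` to a mapping whose body pattern is JSON would see every message dropped.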

Conclusion

Over the years, the Simple Queue Service has gradually become less simple. In this Bite we've seen that Redrive Policies, Message Retention Periods and Event Source Mappings can significantly complicate the way messages behave. It is important to be aware of queue behavior in these edge cases. Event Source Mappings introduce an extra layer of complexity and deserve additional attention in your solution designs.