Bite-Sized Serverless


Serverless Integration Testing with Step Functions

StepFunctions - Advanced (300)
As your applications become more serverless, you will find that your classic testing strategies no longer suffice. Unit tests don't cover services like IAM, and end-to-end tests aren't granular enough. Integration tests are a natural fit to cover the gap, as a significant chunk of your application now consists of integrations between native services. In this Bite we will cover how Step Functions can be used to run comprehensive integration tests on your serverless environments.
Every component, integration test and line of code discussed in this Bite is available as a CDK project. Download this project at the bottom of this page and deploy it to your own AWS account to see and play around with real-life examples.
We will discuss two examples of serverless implementations that are perfect fits for serverless integration testing. The first is a foundational serverless solution: a PNG image is uploaded to an S3 Bucket, which triggers a Lambda Function. The function calculates the image's dimensions and writes them to the object's metadata in S3.
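For context, the processing function could look roughly like the sketch below. This is not the project's actual code: it assumes Pillow is packaged with the function (for example through a Lambda layer), and because S3 object metadata is immutable, it copies the object onto itself to replace the metadata.

"""Illustrative sketch of the image-processing Lambda; the project's code may differ."""

# Standard library imports
import io

# Third party imports
import boto3
from PIL import Image  # assumption: Pillow is available in the function's environment

s3_client = boto3.client("s3")


def event_handler(event, _context):
    """Measure each uploaded PNG and store its dimensions as object metadata."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        object_key = record["s3"]["object"]["key"]

        # Read the uploaded image and measure it
        body = s3_client.get_object(Bucket=bucket, Key=object_key)["Body"].read()
        width, height = Image.open(io.BytesIO(body)).size

        # S3 metadata can't be updated in place, so copy the object onto itself.
        # Note: the event notification should be scoped to s3:ObjectCreated:Put,
        # or this copy would trigger the function again.
        s3_client.copy_object(
            Bucket=bucket,
            Key=object_key,
            CopySource={"Bucket": bucket, "Key": object_key},
            Metadata={"image_width": str(width), "image_height": str(height)},
            MetadataDirective="REPLACE",
        )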
Side note: there is a better way to process S3 notifications. See Monitor Events from Multiple S3 Buckets with EventBridge for details.
When the event notification has been processed successfully, the resulting metadata can be viewed in the console as well as through the AWS SDKs and APIs.
The second example involves DynamoDB Streams and CloudWatch Logs. A DynamoDB Table is configured to stream its changes to a Lambda Function. It uses Event Source Filtering to process only newly inserted USER entities. When the Lambda Function receives a batch of events, it writes an audit log entry for each event to CloudWatch Logs. For a deep dive on Event Source Filtering, see the Bite Filter DynamoDB Event Streams Sent to Lambda.
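As a rough illustration, the filter on that event source mapping might look like the following sketch; the exact pattern used in the project may differ.

# Illustrative event source filter: only process INSERT events for USER entities.
import json

user_insert_pattern = {
    "eventName": ["INSERT"],
    "dynamodb": {"Keys": {"PK": {"S": [{"prefix": "USER#"}]}}},
}

# Lambda expects each filter as a JSON-encoded string in the FilterCriteria
filter_criteria = {"Filters": [{"Pattern": json.dumps(user_insert_pattern)}]}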
Both examples are impossible to test with unit tests. Unit tests could cover the internals of the Lambda Functions, but that wouldn't tell us if the S3 Event Notifications, IAM permissions, or DynamoDB Streams are configured correctly.

Introducing serverless integration testing

According to Martin Fowler, integration tests determine if independently developed units of software work correctly when they are connected to each other. But in serverless applications software doesn't exist as a standalone entity. Instead, an application consists of services (infrastructure), such as API Gateway, and software, such as Lambda Functions. In some cases, there is no software in the classic sense at all: an application might entirely consist of interconnected AWS services. For serverless, then, integration tests determine if independently configured services work correctly when connected together. A few examples:
  • Does the Custom Authorizer connected to API Gateway behave as expected?
  • Does the Lambda Function write a file to S3 when triggered?
  • Does AppSync return the correct values from DynamoDB?
And of course:
  • Does uploading an object to S3 generate the correct metadata?
  • Does creating a new user in DynamoDB generate an audit log?
But there is another responsibility for integration tests in serverless environments: they are the best testing methodology to verify that individual services are configured correctly. For example, API Gateway can be configured with request validators and access logs. Integration tests can be used to verify that invalid requests are dropped by API Gateway, and that access logs are written for both valid and invalid requests. In these examples integration tests verify that your configuration of AWS services yields the desired results.
In a nutshell, integration tests are used to validate that serverless implementations work as designed. When executing integration tests, the testing framework needs access (generally IAM authorization, sometimes network access) to the systems being tested. In our examples these are the S3 Bucket, the DynamoDB Table and the CloudWatch Log Group. By running the tests in Lambda Functions, we can assign exactly the permissions required, and keep access to our resources within our own AWS Account's boundaries.
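In CDK, scoping those permissions down can be as simple as the sketch below. The construct names are illustrative, not the project's actual identifiers.

# Illustrative CDK grants: each test function receives only the access it needs.
bucket.grant_put(arrange_act_function)  # Act: upload the test object
bucket.grant_read(assert_function)      # Assert: read the object's metadata
bucket.grant_delete(assert_function)    # Clean Up: remove the test object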
We define our integration tests using the common Arrange, Act, Assert (AAA) framework. In the Arrange step we prepare the test. In the Act step we perform the action we want to test. In the Assert step we verify that the action has completed successfully. These steps are orchestrated in a Step Functions State Machine.
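In CDK, a single test branch could be wired up roughly like the sketch below. It assumes CDK v2, runs inside a Stack (hence self), and the two Lambda constructs are assumed to be defined elsewhere.

# Illustrative sketch of one Arrange/Act -> Wait -> Assert/Clean Up branch.
from aws_cdk import Duration
from aws_cdk import aws_stepfunctions as sfn
from aws_cdk import aws_stepfunctions_tasks as tasks

arrange_act = tasks.LambdaInvoke(
    self,
    "S3ArrangeAndAct",
    lambda_function=s3_arrange_act_function,  # assumed Lambda construct
    payload_response_only=True,
    result_path="$.arrange_act_payload",  # where the assert step reads its input
)

# Give S3 a moment to process the event notification
wait = sfn.Wait(
    self, "S3Wait", time=sfn.WaitTime.duration(Duration.seconds(10))
)

assert_clean_up = tasks.LambdaInvoke(
    self,
    "S3AssertAndCleanUp",
    lambda_function=s3_assert_function,  # assumed Lambda construct
    payload_response_only=True,
)

s3_test_branch = arrange_act.next(wait).next(assert_clean_up)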
The actual tests are implemented in Python using the boto3 SDK. The arrange and act steps are very lightweight:
1"""Lambda Function for the Arrange and Act steps of the S3 test.""" 2 3# Standard library imports 4import os 5import time 6 7# Third party imports 8import boto3 9 10 11s3 = boto3.resource("s3") 12s3_bucket_name = os.environ.get("S3_BUCKET") 13 14 15def event_handler(_event, _context): 16 """Arrange and Act: put the example file in the S3 Bucket.""" 17 # 1. Arrange 18 now = time.time() 19 object_key = f"test_file_{now}.png" 20 21 # 2. Act 22 try: 23 s3.Bucket(s3_bucket_name).upload_file("example.png", object_key) 24 return {"act_success": True, "test_object_key": object_key} 25 except Exception: # pylint: disable=broad-except 26 return {"act_success": False, "error_message": "failed to put object"} 27
The assert and cleanup steps cover a few more lines, but those are mostly taken up by the validations we want to perform.
1"""Lambda Function for the Assert and Clean Up steps of the DDB test.""" 2 3# Standard library imports 4import os 5 6# Third party imports 7import boto3 8 9 10s3_client = boto3.client("s3") 11s3_bucket_name = os.environ.get("S3_BUCKET") 12 13 14def event_handler(event, _context): 15 """Assert and Clean Up: verify the metadata and delete the object.""" 16 # If the arrange / act step returned an error, bail early 17 if not event["arrange_act_payload"]["act_success"]: 18 return error_response(event["arrange_act_payload"]["error_message"]) 19 20 test_object_key = event["arrange_act_payload"]["test_object_key"] 21 22 # 3. Assert 23 image_object = s3_client.get_object(Bucket=s3_bucket_name, Key=test_object_key) 24 25 # Assert metadata is present 26 if "Metadata" not in image_object: 27 return clean_up_with_error_response(test_object_key, "metadata not found") 28 # Assert image_height is present 29 if "image_height" not in image_object["Metadata"]: 30 return clean_up_with_error_response( 31 test_object_key, "'image_height' metadata not found" 32 ) 33 # Assert image_width is present 34 if "image_width" not in image_object["Metadata"]: 35 return clean_up_with_error_response( 36 test_object_key, "'image_width' metadata not found" 37 ) 38 # Assert image_height matches expected value 39 if image_object["Metadata"]["image_height"] != "178": 40 return clean_up_with_error_response(test_object_key, "'image_height' incorrect") 41 # Assert image_width matches expected value 42 if image_object["Metadata"]["image_width"] != "172": 43 return clean_up_with_error_response(test_object_key, "'image_width' incorrect") 44 45 # Return success 46 return clean_up_with_success_response(test_object_key) 47 48 49def error_response(error_message): 50 """Return a well-formed error message.""" 51 return { 52 "success": False, 53 "test_name": "s3_png_metadata", 54 "error_message": error_message, 55 } 56 57 58def clean_up_with_error_response(test_object_key, error_message): 59 """Remove the file from S3 and return an error message.""" 60 s3_client.delete_object(Bucket=s3_bucket_name, Key=test_object_key) 61 return error_response(error_message) 62 63 64def clean_up_with_success_response(test_object_key): 65 """Remove the file from S3 and return a success message.""" 66 s3_client.delete_object(Bucket=s3_bucket_name, Key=test_object_key) 67 return {"success": True, "test_name": "s3_png_metadata"}

Execute the tests with every deployment

By executing the integration tests with every CloudFormation deployment of our infrastructure, we can verify that the system performs as expected after every change. The mechanism to integrate Step Functions into our deployments is described in CFN Custom Resources backed by Step Functions. In that Bite we cover how we can use the Step Functions parallel state to capture errors, and how to perform a callback to CloudFormation. With these components in place, our State Machine looks like this.
The left execution performed as expected. The right execution encountered an error in one of the steps, which was caught by the parallel state and communicated back to CloudFormation.

Add the DynamoDB integration test

The State Machine above contains a single branch which tests the S3 event notification component. In the last section of this Bite, we will add the test for the DynamoDB Stream in a second parallel branch. Like in the S3 test, we have Arrange, Act, Assert and Clean Up steps, deployed as two Lambda Functions with a wait in between. Because DynamoDB Streams and CloudWatch might take a little while to process a new entry, the wait time is set to ten seconds.
This integration test will create a user in DynamoDB, wait a while, then verify that the expected audit log is present in the CloudWatch Logs group. When the test is finished, the user is removed from the DynamoDB Table again.
The code to create the user (arrange and act) is again very simple:
1"""Lambda Function for the Arrange and Act steps of the DDB test.""" 2 3# Standard library imports 4import os 5import time 6 7# Third party imports 8import boto3 9 10 11ddb_table_name = os.environ.get("DDB_TABLE") 12ddb_table = boto3.resource("dynamodb").Table(name=ddb_table_name) 13 14 15def event_handler(_event, _context): 16 """Arrange and Act: create a user in DDB.""" 17 # 1. Arrange 18 now = time.time() 19 user_object = {"PK": f"USER#{now}", "SK": f"USER#{now}"} 20 21 # 2. Act 22 try: 23 ddb_table.put_item(Item=user_object) 24 return {"act_success": True, "test_user_key": user_object} 25 except Exception: # pylint: disable=broad-except 26 return {"act_success": False, "error_message": "failed to write to DDB"}
The assert and cleanup code looks a lot like the S3 test's, but the get_object operation has been replaced with a filter_log_events call, and the asserts are specific to the DynamoDB audit log use case.
1"""Lambda Function for the Assert and Clean Up steps of the DDB test.""" 2 3# Standard library imports 4import json 5import os 6from datetime import datetime, timedelta 7 8# Third party imports 9import boto3 10 11 12logs_client = boto3.client("logs") 13log_group_name = os.environ.get("LOG_STREAM_NAME") 14 15ddb_table_name = os.environ.get("DDB_TABLE") 16ddb_table = boto3.resource("dynamodb").Table(name=ddb_table_name) 17 18 19def event_handler(event, _context): 20 """Assert and Clean Up: verify the metadata and delete the object.""" 21 # If the arrange / act step returned an error, bail early 22 if not event["arrange_act_payload"]["act_success"]: 23 return error_response(event["arrange_act_payload"]["error_message"]) 24 25 test_user_key = event["arrange_act_payload"]["test_user_key"] 26 test_user_pk = test_user_key["PK"] 27 test_user_sk = test_user_key["SK"] 28 29 expected_json = { 30 "EventType": "UserCreated", 31 "PK": {"S": test_user_pk}, 32 "SK": {"S": test_user_sk}, 33 } 34 35 # 3. Assert 36 37 # Filter the sought event from the CloudWatch Log Group 38 filter_pattern = ( 39 f'{{ ($.EventType = "UserCreated") && ($.SK.S = "{test_user_pk}") ' 40 f'&& ($.PK.S = "{test_user_sk}") }}' 41 ) 42 43 # Set the search horizon to one minute ago 44 start_time = int((datetime.today() - timedelta(minutes=1)).timestamp()) * 1000 45 46 # Execute the search 47 response = logs_client.filter_log_events( 48 logGroupName=log_group_name, startTime=start_time, filterPattern=filter_pattern 49 ) 50 51 # Assert exactly one event matching the pattern is found 52 if "events" not in response: 53 return clean_up_with_error_response( 54 test_user_pk, test_user_sk, "events not found" 55 ) 56 57 if len(response["events"]) == 0: 58 return clean_up_with_error_response( 59 test_user_pk, test_user_sk, "event not found" 60 ) 61 62 if len(response["events"]) != 1: 63 return clean_up_with_error_response( 64 test_user_pk, test_user_sk, "more than one event found" 65 ) 66 67 if json.loads(response["events"][0]["message"]) != expected_json: 68 return clean_up_with_error_response( 69 test_user_pk, test_user_sk, "log event does not match expected JSON" 70 ) 71 72 # Return success 73 return clean_up_with_success_response(test_user_pk, test_user_sk) 74 75 76def error_response(error_message): 77 """Return a well-formed error message.""" 78 return { 79 "success": False, 80 "test_name": "ddb_user_audit_log", 81 "error_message": error_message, 82 } 83 84 85def clean_up_with_error_response(test_user_pk, test_user_sk, error_message): 86 """Remove the file from DDB and return an error message.""" 87 ddb_table.delete_item( 88 Key={ 89 "PK": test_user_pk, 90 "SK": test_user_sk, 91 } 92 ) 93 return error_response(error_message) 94 95 96def clean_up_with_success_response(test_user_pk, test_user_sk): 97 """Remove the file from DDB and return a success message.""" 98 ddb_table.delete_item( 99 Key={ 100 "PK": test_user_pk, 101 "SK": test_user_sk, 102 } 103 ) 104 return {"success": True, "test_name": "ddb_user_audit_log"}
We add this test to the integration test State Machine as a branch of the parallel state. If both tests are successful, an execution looks like the one on the left. If an error occurs, it looks like the one on the right. In both cases the result is reported back to CloudFormation.
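In CDK, that parallel wiring could look roughly like this sketch. The branch and callback names are illustrative; the failure and success callbacks are the CloudFormation response tasks described in the Custom Resources Bite.

# Illustrative: run both test branches in parallel and catch any error.
run_tests = sfn.Parallel(self, "RunIntegrationTests")
run_tests.branch(s3_test_branch)   # defined as in the earlier sketch
run_tests.branch(ddb_test_branch)  # assumed to be built the same way

# A failure in either branch is caught and reported back to CloudFormation
run_tests.add_catch(report_failure_to_cfn, result_path="$.error")
definition = run_tests.next(report_success_to_cfn)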

Conclusion

In this Bite we have seen how Step Functions State Machines and Lambda Functions can be used to run integration tests on serverless applications. Serverless integration tests give us a cost-effective and highly reliable method to gain granular insights into the functioning of our applications. And by integrating the State Machine into our CloudFormation deployments, we can protect against regressions before our code ever comes near a production environment.

CDK Project

The services and code described in this Bite are available as a Python AWS Cloud Development Kit (CDK) project. Within the project, execute a cdk synth to generate CloudFormation templates. Then deploy these templates to your AWS account with a cdk deploy. For your convenience, ready-to-use CloudFormation templates are also available in the cdk.out folder. For further instructions on how to use the CDK, see Getting started with the AWS CDK.

Click the Download button below for a Zip file containing the project.