Bite-Sized Serverless

CFN Custom Resources backed by Step Functions

CloudFormation - Advanced (300)
CloudFormation Custom Resources are a powerful mechanism to run arbitrary code as part of a CloudFormation deployment. They can be used to create, update and delete resources without CloudFormation coverage, to notify or synchronize external systems when a deployment occurs, or to call any AWS or third-party API. Natively, Custom Resources can only be implemented using SNS or Lambda Functions. In this Bite we will implement a Custom Resource backed by a Step Functions State Machine.
A complete implementation of this solution is available as a CDK project. All code samples on this page are taken from that project. You can download the project at the bottom of this page.

When to use Step Functions State Machines

Custom Resources only have native support for Lambda Functions and SNS Topics. If we limited ourselves to these two resources, we would have to write all our logic either in a Lambda Function or in some external system triggered by SNS. Neither is a good option when you would like to execute complex or long-running workflows as part of your CloudFormation deployment. For example: integration tests that need to succeed before the deployment is marked healthy would be a bad fit for Lambda, but a perfect fit for Step Functions.
In these cases, it makes sense to offload the Custom Resource execution to a Step Functions State Machine. This lets you use all the powerful features of Step Functions, such as calling AWS APIs through the AWS SDK integrations, waiting for callbacks, and deep integration with services like ECS, DynamoDB, SNS and SQS.
An additional benefit is that the execution of the state machine can be triggered externally (without CloudFormation) too, making building and testing your custom resources a lot easier.
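As a rough sketch of such an external trigger, the helper below builds a synthetic event with the same fields as a real Custom Resource request and starts an execution with it. The function names, the stack ARN and the ResponseURL are placeholders made up for this illustration. Because the ResponseURL is not a real presigned URL, the final reporting step will not reach CloudFormation when the state machine is started this way.

```python
import json
import uuid


def build_test_event(request_type="Create", properties=None):
    """Build a synthetic event shaped like a Custom Resource request."""
    return {
        "RequestType": request_type,
        # Placeholder: a real event carries a presigned S3 URL here.
        "ResponseURL": "https://example.com/placeholder-response-url",
        # Placeholder stack ARN, only used for illustration.
        "StackId": "arn:aws:cloudformation:eu-west-1:123412341234:stack/TestStack/manual",
        "RequestId": str(uuid.uuid4()),
        "LogicalResourceId": "CustomResource",
        "ResourceType": "AWS::CloudFormation::CustomResource",
        "ResourceProperties": dict(properties or {}),
    }


def start_manual_execution(sfn_client, state_machine_arn, event):
    """Start the state machine directly, bypassing CloudFormation.

    sfn_client is expected to be a boto3 Step Functions client,
    for example boto3.client("stepfunctions").
    """
    return sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(event),
    )
```

Passing the client in as an argument keeps the helper easy to exercise with a stub before pointing it at a real account.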

Custom Resource inner workings

Before we dive into our Custom Resource - Step Functions integration, let's take a look at the inner workings of Custom Resources. Simply put, a CloudFormation Custom Resource only does two things:
  1. It sends out a payload when it is triggered as part of a CloudFormation deployment
  2. It waits until it receives a response on a callback URL specified in the payload
In your CloudFormation template a Custom Resource is specified like this:
    {
      "CustomResource": {
        "Type": "AWS::CloudFormation::CustomResource",
        "Properties": {
          "ServiceToken": {
            "Fn::GetAtt": ["CustomResourceHandlerE8FB56BA", "Arn"]
          },
          "ExecutionTime": "1635621578.4841878"
        }
      }
    }
In this example the ServiceToken points to the ARN of a Lambda Function, and the ExecutionTime is a custom property for our specific use case. For the exact implementation details, check out the CDK project at the bottom of this page. Custom Resources allow you to define as many custom properties as you like, which will be forwarded to the handler specified in the ServiceToken (either a Lambda Function or an SNS Topic).
With this custom resource in place, the payload sent to the Lambda Function looks like the JSON below. Note the ResourceProperties, which are the exact values specified in the template above.
    {
      "RequestType": "Create",
      "ServiceToken": "arn:aws:lambda:eu-west-1:123412341234:function:CfnCustomResourcesBackedB-CustomResourceHandlerE8F-fie4wHO4xSHq",
      "ResponseURL": "https://cloudformation-custom-resource-response-euwest1.s3-eu-west-1.amazonaws.com/arn%3Aaws%3Acloudformation%3Aeu-west-1%3A123412341234%3Astack/CfnCustomResourcesBackedByStepFunctionsStack/da707dd0-39b2-11ec-aab4-0a6f4e95f19d%7CCustomResource%7C79af75b0-5149-4d6f-9ec3-78a3ea00a0ed?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20211030T192019Z&X-Amz-SignedHeaders=host&X-Amz-Expires=7200&X-Amz-Credential=AKIAU7SEXKRMYJRGBAMF%2F20211030%2Feu-west-1%2Fs3%2Faws4_request&X-Amz-Signature=01d9e9519aa8d9a4911247ce79e4a49f0b238e4744776079603ce09471e0e1c7",
      "StackId": "arn:aws:cloudformation:eu-west-1:123412341234:stack/CfnCustomResourcesBackedByStepFunctionsStack/da707dd0-39b2-11ec-aab4-0a6f4e95f19d",
      "RequestId": "79af75b0-5149-4d6f-9ec3-78a3ea00a0ed",
      "LogicalResourceId": "CustomResource",
      "ResourceType": "AWS::CloudFormation::CustomResource",
      "ResourceProperties": {
        "ServiceToken": "arn:aws:lambda:eu-west-1:123412341234:function:CfnCustomResourcesBackedB-CustomResourceHandlerE8F-fie4wHO4xSHq",
        "ExecutionTime": "1635621578.4841878"
      }
    }
After this payload has been sent to your handler, CloudFormation waits for a response on the ResponseURL. As you can see, this is actually an S3 endpoint. The callback you provide is delivered as a JSON file uploaded to this presigned S3 URL.
Uploading the response JSON to S3 is trivial in Python:
    import requests

    # success, cfn_url, cfn_stack_id, cfn_request_id and logical_resource_id
    # come from the payload and from the outcome of your own logic.
    json_body = {
        "Status": "SUCCESS" if success else "FAILED",
        "Reason": f'Custom resource {"succeeded" if success else "failed"}',
        "PhysicalResourceId": logical_resource_id,  # LogicalResourceId in payload
        "StackId": cfn_stack_id,  # StackId in payload
        "RequestId": cfn_request_id,  # RequestId in payload
        "LogicalResourceId": logical_resource_id,  # LogicalResourceId in payload
    }

    requests.put(
        url=cfn_url,  # ResponseURL in payload
        json=json_body,
    )
The values specified in the JSON body are the literal values from the payload above. The Status key can be either "SUCCESS" or "FAILED". If the Status is "FAILED", the Reason field is mandatory.
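The rules above can be captured in a small helper. The sketch below is not taken from the CDK project: the function names are made up for this illustration, and it uses the standard library's urllib instead of requests, which can be convenient if you would rather not bundle a third-party package with your function. It builds the response body, refuses to build a FAILED response without a Reason, and prepares (but does not send) the PUT request.

```python
import json
import urllib.request


def build_cfn_response(event, success, reason=None, physical_resource_id=None):
    """Build the response body for a Custom Resource request."""
    if not success and not reason:
        # CloudFormation requires a Reason when the Status is FAILED.
        raise ValueError("Reason is required when Status is FAILED")
    return {
        "Status": "SUCCESS" if success else "FAILED",
        "Reason": reason or "Custom resource succeeded",
        # Like the sample above, fall back to the LogicalResourceId.
        "PhysicalResourceId": physical_resource_id or event["LogicalResourceId"],
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
    }


def build_put_request(event, body):
    """Prepare the PUT request that uploads the body to the ResponseURL."""
    return urllib.request.Request(
        url=event["ResponseURL"],
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        # Mirrors the JSON content type that requests.put(json=...) sends.
        headers={"Content-Type": "application/json"},
    )
```

Sending the prepared request is then a matter of calling urllib.request.urlopen(request, timeout=10), where the timeout value is an arbitrary choice for this example.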

Integrating with Step Functions

Custom Resources can only send their payload to Lambda Functions or SNS Topics. But there is no reason why these resources couldn't forward the request anywhere else. The only requirement is that the receiving service reliably performs the callback when it is done.
Integrating Step Functions into Custom Resources requires an additional two steps:
  1. The Lambda Function receiving the Custom Resource payload needs to forward the payload to a State Machine
  2. The State Machine needs to perform the callback when it is done.
Step one is, again, pretty trivial in Python:
    """Lambda function used as a custom resource."""

    import json
    import os
    import boto3

    sfn_client = boto3.client("stepfunctions")
    state_machine_arn = os.environ.get("STATE_MACHINE_ARN")


    def lambda_handler(event, _context):
        """Receive an event from CloudFormation, pass it on to a Step Functions State Machine."""
        sfn_client.start_execution(
            stateMachineArn=state_machine_arn,
            input=json.dumps(event),
        )
Step two is a bit more involved, but only a little. It requires the State Machine to forward the original execution payload (provided by the Custom Resource) to the Update CloudFormation Lambda Function. This function should also receive the results of the actual workflow. To achieve this, we combine the two inputs in a custom payload:
    {
      "ExecutionInput.$": "$$.Execution.Input",
      "LambdaResults.$": "$"
    }
This will put the payload that started the execution in ExecutionInput and the results of the Lambda Function in LambdaResults. We can then parse and use both fields in our Update CloudFormation function like below.
    """Lambda function that reports the state machine results back to CFN."""
    import json
    import requests


    def lambda_handler(event, _context):
        """Return a success or failure to the CFN Custom Resource."""
        cfn_url = event["ExecutionInput"]["ResponseURL"]
        cfn_stack_id = event["ExecutionInput"]["StackId"]
        cfn_request_id = event["ExecutionInput"]["RequestId"]
        logical_resource_id = event["ExecutionInput"]["LogicalResourceId"]

        lambda_results = event["LambdaResults"]
        success = "Error" not in lambda_results.keys()

        json_body = {
            "Status": "SUCCESS" if success else "FAILED",
            "Reason": f'State Machine {"succeeded" if success else "failed"}',
            "PhysicalResourceId": logical_resource_id,
            "StackId": cfn_stack_id,
            "RequestId": cfn_request_id,
            "LogicalResourceId": logical_resource_id,
        }

        requests.put(
            url=cfn_url,
            json=json_body,
        )
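To make the shape of the event received by the Update CloudFormation function concrete, the sketch below mimics how the two paths in the custom payload are filled in. It is only an illustration: resolve_update_payload is a made-up helper, not part of Step Functions or the CDK project, and the sample values are invented. The error example assumes the common case where a caught error arrives as an object with Error and Cause keys.

```python
def resolve_update_payload(execution_input, state_input):
    """Mimic how the two paths in the custom payload are resolved."""
    return {
        "ExecutionInput": execution_input,  # "$$.Execution.Input"
        "LambdaResults": state_input,  # "$"
    }


# The original Custom Resource request that started the execution.
execution_input = {
    "RequestType": "Create",
    "ResponseURL": "https://example.com/placeholder-response-url",
    "StackId": "example-stack-id",
    "RequestId": "example-request-id",
    "LogicalResourceId": "CustomResource",
}

# Case 1: the workflow Lambda returned a regular result.
success_event = resolve_update_payload(execution_input, {"Result": "ok"})

# Case 2: a Catch forwarded an error object instead.
failure_event = resolve_update_payload(
    execution_input,
    {"Error": "States.TaskFailed", "Cause": "Something went wrong"},
)

# The same check the Update CloudFormation function performs.
print("Error" not in success_event["LambdaResults"])
print("Error" not in failure_event["LambdaResults"])
```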

Catching errors in complex workflows

The example above might seem overly simplistic. After all, if all the workflow does is trigger a single Lambda Function, we do not need the workflow at all - we could have put this in the original resource handler. So let's take a look at a more complex use case.
Consider a state machine in which we first trigger one Lambda Function. If it succeeds, a second one is called, and when that succeeds, an ECS task is started. The problem with this workflow is that every step needs a catch that points to the Update CloudFormation step. After all, CloudFormation needs to know when a failure occurred so it can roll back the deployment. Without a catch, the State Machine would simply terminate on an error and never report back.
To avoid having to write a catch for every step, we put the entire workflow in a Parallel state with a single branch. If any step in this workflow fails, the error will be propagated to the Parallel state. The Parallel state has been configured with a catch, which ensures that any error is reported back to CloudFormation.
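A minimal sketch of this pattern in the Amazon States Language is shown below, written as a Python dictionary so it can be dumped to JSON. It is not the definition from the CDK project: the state names, the build_definition helper and the function ARN are assumptions made for this example, and the workflow itself is reduced to two placeholder Pass states.

```python
import json


def build_definition(workflow_states, start_at, report_function_arn):
    """Wrap a workflow in a single-branch Parallel state with one Catch."""
    report_state = "Update CloudFormation"
    return {
        "StartAt": "Workflow",
        "States": {
            "Workflow": {
                "Type": "Parallel",
                # A single branch holding the entire workflow.
                "Branches": [{"StartAt": start_at, "States": workflow_states}],
                # One catch for every error raised inside the branch.
                "Catch": [{"ErrorEquals": ["States.ALL"], "Next": report_state}],
                "Next": report_state,
            },
            report_state: {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {
                    "FunctionName": report_function_arn,
                    "Payload": {
                        "ExecutionInput.$": "$$.Execution.Input",
                        "LambdaResults.$": "$",
                    },
                },
                "End": True,
            },
        },
    }


# Placeholder workflow: two Pass states stand in for the real steps.
placeholder_states = {
    "First Step": {"Type": "Pass", "Next": "Second Step"},
    "Second Step": {"Type": "Pass", "End": True},
}

definition = build_definition(
    workflow_states=placeholder_states,
    start_at="First Step",
    # Hypothetical ARN, only used for illustration.
    report_function_arn="arn:aws:lambda:eu-west-1:123412341234:function:UpdateCloudFormation",
)
definition_json = json.dumps(definition, indent=2)
```

One detail to keep in mind with this sketch: when a Parallel state succeeds, its output is a list with one entry per branch, whereas a caught error arrives as an object. The reporting function may therefore need to handle both shapes, or the Parallel output may need to be reshaped before it is passed on.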

Conclusion

In this Bite we have detailed how Custom Resources can be integrated with Step Functions State Machines. This solution allows us to execute complex workflows as part of our CloudFormation deployments. It also reduces the amount of code we need to write and maintain, by leveraging the extremely powerful and extensive AWS integrations offered by Step Functions. Finally, it allows us to easily extend and debug the custom resource by testing and verifying changes to the state machine before committing the change into our infrastructure as code.

CDK Project

The services and code described in this Bite are available as a Python AWS Cloud Development Kit (CDK) Project. Within the project, run cdk synth to generate CloudFormation templates. Then deploy these templates to your AWS account with cdk deploy. For your convenience, ready-to-use CloudFormation templates are also available in the cdk.out folder. For further instructions on how to use the CDK, see Getting started with the AWS CDK.

Click the Download button below for a Zip file containing the project.