Bite-Sized Serverless

Cache Control with CloudFront Functions

CloudFront - Advanced (300)
Asset caching is an essential component of fast and efficient websites. Caching allows browsers and content delivery networks (CDNs) to locally store a copy of an asset, such as an image or a JavaScript file. The next time a user requests the same asset, it can be served from the local cache instead of the origin. This improves loading speed and user experience, while reducing load on the origin. In this Bite we will cover how to configure and instruct CDNs and browsers to cache our assets optimally. We will use CloudFront behaviors to configure the CDN and CloudFront Functions to add HTTP caching headers.
This Bite is supported by a CDK project, available at the bottom of this page. The project contains a deployment with a fully configured CloudFront Distribution with CloudFront Functions. All code and examples in this Bite are taken from the CDK project.
In the diagram above, User A requests a file demo.jpg through CloudFront. This file has not been requested by anyone before, so it is retrieved from the origin (an S3 bucket). While returning the file to the user, CloudFront copies the object to its local cache. When User B requests the same file, it is already stored in the edge location and returned with very low latency. When User C requests the file, their local point-of-presence does not have the asset available, and it has to be fetched. Invisible to the user, there is another layer of caches between the points-of-presence and the origin, called regional edge caches. You can read more about these and other CloudFront components in the Bite The Anatomy of a CloudFront Distribution.
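The scenario above can be sketched as a toy model in Python. This is only an illustration, not how CloudFront is implemented: the class names and the location names `amsterdam` and `tokyo` are made up, and the regional edge caches are left out.

```python
class Origin:
    """Stands in for the S3 bucket and counts how often it is contacted."""

    def __init__(self, files):
        self.files = files
        self.fetch_count = 0

    def fetch(self, path):
        self.fetch_count += 1
        return self.files[path]


class EdgeCache:
    """A toy edge location: a dictionary in front of a shared origin."""

    def __init__(self, name, origin):
        self.name = name
        self.origin = origin
        self.store = {}

    def get(self, path):
        if path in self.store:
            return self.store[path], "hit"
        body = self.origin.fetch(path)  # cache miss: go to the origin
        self.store[path] = body         # keep a copy for the next user
        return body, "miss"


origin = Origin({"demo.jpg": b"image-bytes"})
amsterdam = EdgeCache("amsterdam", origin)  # hypothetical edge location
tokyo = EdgeCache("tokyo", origin)          # hypothetical edge location

results = [
    amsterdam.get("demo.jpg")[1],  # User A: first request ever
    amsterdam.get("demo.jpg")[1],  # User B: same edge location as A
    tokyo.get("demo.jpg")[1],      # User C: different edge location
]
print(results, origin.fetch_count)  # ['miss', 'hit', 'miss'] 2
```

Three users are served, but the origin is only contacted twice: once per edge location.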
Although the concept of storing objects for other users sounds easy enough, caching is a remarkably complex topic. The challenge is to cache assets (e.g. an image, PDF or video) for as long as the original remains unchanged, while simultaneously making sure the asset gets updated in the browser when the original changes. In this Bite we will see how achieving both of these goals requires advanced solutions.
First, let's look at the difference between CDN and browser caching. As described above, CDNs provide a shared cache for many users of the same website. This is also called a public cache. When different users request the same file, it can be returned straight from the cache. A browser cache, also called a private cache, is not shared between different visitors of a website. Instead, it lives on a user's computer, and can only be used by that user. When a user first visits a website, its assets are downloaded and displayed in the browser. The assets are then stored in the browser itself. When the user visits the same site again, the browser does not even need to connect to the CDN or origin to retrieve the assets. Instead, it immediately returns the files from its local storage.

Configuring CDNs to store assets

Caches work very well for assets that don't change very often, like images and PDFs. They don't work well at all for dynamic content, such as user-generated content, live content, or personal content you don't want to share with others. So how can we tell CDNs and browsers what they should and shouldn't cache, and if they cache, how long they should retain assets in their local storage?
This question has two different answers: one for CDNs and one for browser caches. Let's look at the CDN, and CloudFront specifically, first.

Configuring the CloudFront cache time to live (TTL)

CloudFront is configured through 'behaviors'. A distribution can have many different behaviors for different paths and file types, which are defined in CacheBehaviors. When no CacheBehaviors match a request, or when no CacheBehaviors are defined, CloudFront falls back to the DefaultCacheBehavior. The only difference between the two is the mandatory PathPattern for the CacheBehaviors. This PathPattern can be a path prefix like images/*, an extension suffix like *.jpg or a combination of both, like images/*.jpg.
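The fallback logic can be sketched in a few lines of Python. This is an approximation for illustration only: it assumes behaviors are checked in the order they are listed and that the first match wins, and it uses `fnmatchcase` from the standard library as a stand-in for CloudFront's own pattern matching (`fnmatchcase` also gives special meaning to `[...]`, which CloudFront path patterns do not). The behavior names are hypothetical.

```python
from fnmatch import fnmatchcase


def select_behavior(path, cache_behaviors, default="DefaultCacheBehavior"):
    """Return the name of the first behavior whose PathPattern matches the path."""
    path = path.lstrip("/")
    for pattern, name in cache_behaviors:
        if fnmatchcase(path, pattern.lstrip("/")):
            return name
    return default  # no CacheBehavior matched


# Ordered list of (PathPattern, hypothetical behavior name)
behaviors = [
    ("images/*.jpg", "ImagesJpg"),
    ("images/*", "Images"),
    ("*.jpg", "AnyJpg"),
]

print(select_behavior("/images/demo.jpg", behaviors))  # ImagesJpg
print(select_behavior("/images/logo.png", behaviors))  # Images
print(select_behavior("/demo.jpg", behaviors))         # AnyJpg
print(select_behavior("/index.html", behaviors))       # DefaultCacheBehavior
```

Because the first match wins in this model, the most specific patterns are listed first.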
Each CacheBehavior has a CachePolicy which defines how the cache should behave. The CachePolicy can be configured with three values. The Minimum Time to Live (MinTTL) is the shortest time a file stays in the cache before CloudFront checks whether a new version is available on the origin. The MaxTTL is the longest time a file can be stored in an edge cache before CloudFront checks whether the origin has a newer version. The DefaultTTL defines how long a file remains cached when the origin does not specify a duration itself. The origin can override this default with caching headers such as Cache-Control, within the bounds set by MinTTL and MaxTTL, but that's out of scope for now.
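How the three values interact can be sketched as a small function. This is a simplified model, not CloudFront's full rule set: it only considers a `max-age` value sent by the origin and ignores directives such as `s-maxage`, `no-cache`, `no-store` and `private`, as well as the `Expires` header. The function name is hypothetical.

```python
def effective_ttl(origin_max_age, min_ttl, default_ttl, max_ttl):
    """Return how long (in seconds) an edge cache keeps an object.

    origin_max_age is the max-age sent by the origin, or None when the
    origin sends no caching headers at all.
    """
    if origin_max_age is None:
        return default_ttl
    # Clamp the origin's wish between the policy's minimum and maximum.
    return max(min_ttl, min(origin_max_age, max_ttl))


# A policy with a one minute minimum, one hour default and one day maximum
policy = {"min_ttl": 60, "default_ttl": 3600, "max_ttl": 86400}

print(effective_ttl(None, **policy))      # 3600  (no header: default)
print(effective_ttl(10, **policy))        # 60    (raised to the minimum)
print(effective_ttl(7200, **policy))      # 7200  (within bounds)
print(effective_ttl(31536000, **policy))  # 86400 (capped at the maximum)
```

When all three values are equal, the origin's headers no longer influence how long CloudFront keeps the object.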

Configuring the cache key

The cache key is the value based on which CloudFront will determine if a file is already present in a cache. The simplest version is just the filename, for example demo.jpg. However, CloudFront can also be configured to include other values in the cache key. These other values include headers, cookies, and the query string. A query string might look like this: demo.jpg?v=0. If we don't include the query string in the cache key, demo.jpg?v=0 and demo.jpg?v=1 will be considered the same asset by CloudFront. However, if we configure CloudFront to include the v query parameter in the cache key, the requests demo.jpg?v=0 and demo.jpg?v=1 will be considered different, and demo.jpg?v=1 will not be served from the cache if only demo.jpg?v=0 has been stored. Later in this Bite we will see how we can use this behavior to store assets for a very long time.
The code below shows a CachePolicy configured to store assets for one year. The default, minimum and maximum TTL are all set to the same value, so CloudFront will always try to cache the files for that duration. The query string has been configured to include both the v and h query parameters. To view this code in context, download the CDK project at the bottom of this page.
    # Assumes `cfr` is the CDK CloudFront module, for example:
    #   from aws_cdk import aws_cloudfront as cfr
    # and that this code runs inside a Stack (hence scope=self).

    # Create a policy that retains images for a year
    one_year_in_seconds = 60 * 60 * 24 * 365  # 31 536 000
    cache_policy_one_year = cfr.CfnCachePolicy(
        scope=self,
        id="CachePolicyOneYear",
        cache_policy_config=cfr.CfnCachePolicy.CachePolicyConfigProperty(
            default_ttl=one_year_in_seconds,
            max_ttl=one_year_in_seconds,
            min_ttl=one_year_in_seconds,
            name="CachePolicyOneYear",
            parameters_in_cache_key_and_forwarded_to_origin=cfr.CfnCachePolicy.ParametersInCacheKeyAndForwardedToOriginProperty(  # noqa: E501 pylint: disable=line-too-long
                cookies_config=cfr.CfnCachePolicy.CookiesConfigProperty(
                    cookie_behavior="none"
                ),
                enable_accept_encoding_gzip=True,
                enable_accept_encoding_brotli=True,
                headers_config=cfr.CfnCachePolicy.HeadersConfigProperty(
                    header_behavior="none"
                ),
                query_strings_config=cfr.CfnCachePolicy.QueryStringsConfigProperty(
                    query_string_behavior="whitelist",
                    query_strings=["v", "h"],  # v for version, h for hash
                ),
            ),
        ),
    )
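To make the effect of the query string whitelist concrete, here is a rough sketch of how a cache key could be derived from a URL. It is a toy model: the real cache key contains more than the path and query string, and the function name `cache_key` is made up for this example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def cache_key(url, allowed_params=("v", "h")):
    """Build a simplified cache key from the path and whitelisted parameters."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name in allowed_params
    ]
    query = urlencode(kept)
    return parts.path + ("?" + query if query else "")


print(cache_key("/demo.jpg?v=0"))                  # /demo.jpg?v=0
print(cache_key("/demo.jpg?v=1"))                  # /demo.jpg?v=1
print(cache_key("/demo.jpg?v=1&utm_source=mail"))  # /demo.jpg?v=1
print(cache_key("/demo.jpg?utm_source=mail"))      # /demo.jpg
```

Parameters outside the whitelist do not lead to separate cache entries, while a different `v` or `h` value does.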

The power of versions and hashes

Let's look at why we would add a version query parameter (?v=1) or a hash query parameter (?h=a2e5e98) to our URLs. As discussed at the top of the article, we want to use caching to achieve two opposing goals: we want assets to be stored in caches as long as possible, but we also want users to immediately fetch the latest version of an asset when it gets updated. The trick to achieve this is called cache busting, and the v and h query parameters are examples of cache busting in practice. Consider the HTML below:
    <html>
      <body>
        <img src="demo.jpg" />
      </body>
    </html>
This simple page will display the asset demo.jpg. We have configured our CloudFront cache as above, so it will store the image in its edge caches for a year. We might update demo.jpg somewhere during that year, but the cache will continue to serve the old version, which is not what we want. Now consider the following HTML:
    <html>
      <body>
        <img src="demo.jpg?v=0" />
      </body>
    </html>
CloudFront will again cache the image for a year, but now it will use demo.jpg?v=0 as the cache key. If we want to update the image, we simply update the HTML...
    <html>
      <body>
        <img src="demo.jpg?v=1" />
      </body>
    </html>
... and CloudFront will consider this a new file, not yet present in its edge locations. Cache busted!
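A hash parameter works the same way, with the advantage that it can be derived from the file itself instead of being bumped by hand. The sketch below shows one possible way to do this; the choice of SHA-256, the seven character length and the helper name `hashed_url` are assumptions made for this example and are not taken from the CDK project.

```python
import hashlib


def hashed_url(path, content, length=7):
    """Append a short content hash to the path as the ?h= parameter."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    return f"{path}?h={digest}"


old_url = hashed_url("demo.jpg", b"first version of the image")
new_url = hashed_url("demo.jpg", b"second version of the image")

print(old_url)
print(new_url)
```

As long as the file's bytes stay the same, the URL stays the same and caches keep serving it. When the content changes, the URL changes with it and the cache is busted without anyone having to remember a version number.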

Using CloudFront Functions to enforce the use of hashes

The CloudFront documentation states that CloudFront does not consider query strings or cookies when evaluating the path pattern. In other words, we can't create a long TTL CacheBehavior specifically for images with a hash (paths including ?h=*), while specifying another short TTL CacheBehavior for images without one. To prevent users from requesting assets without a hash, we will use CloudFront Functions to reject requests without a valid query string.
CloudFront Functions can run at two points. A viewer-request function runs after CloudFront receives a request from the viewer and before it looks in its cache; it can modify the request or answer it directly. A viewer-response function runs just before CloudFront returns the response to the viewer; it can modify the response. The following code is a viewer-request function that blocks invalid image requests. Again, to view this code in context, download the CDK project at the bottom of this page.
    function handler(event) {
      var request = event.request;

      // If the asset has a .png or .svg extension, but the ?h=<hash>
      // query string is not provided, return a 403 forbidden error.
      // Extend the pattern (for example with \.jpg) to cover other image types.
      if (request.uri.match(/^.*(\.png|\.svg)$/) && !("h" in request.querystring)) {
        var response = {
          statusCode: 403,
          statusDescription: "Forbidden",
          headers: { error: { value: "Cannot request image without hash" } },
        };
        return response;
      }
      // Anything else is passed on unchanged.
      return request;
    }
With this function in place we can guarantee that images are requested with a hash and can safely cache them for a very long time.
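To reason about which requests the function lets through, its decision can be mirrored in plain Python. This is not the CloudFront runtime, only a sketch of the same condition; the function name `viewer_request_status` is hypothetical, and the pattern is copied from the function above, so it matches .png and .svg files only.

```python
import re

IMAGE_PATTERN = re.compile(r"\.(png|svg)$")


def viewer_request_status(uri, querystring):
    """Return 403 when an image is requested without ?h=, else None (pass through)."""
    if IMAGE_PATTERN.search(uri) and "h" not in querystring:
        return 403
    return None


print(viewer_request_status("/logo.png", {}))                # 403
print(viewer_request_status("/logo.png", {"h": "a2e5e98"}))  # None
print(viewer_request_status("/index.html", {}))              # None
print(viewer_request_status("/demo.jpg", {}))                # None
```

The last call shows that a file type missing from the pattern, such as .jpg, is not checked at all, so the pattern has to list every extension that should be protected.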

Using CloudFront Functions to instruct browsers to cache assets

Until now we have only discussed caching assets in the CDN. Caching files in browsers works much the same way, but we need to use headers to instruct browsers if, when and how to retain files locally. A commonly used header for this purpose is Cache-Control. An example header that tells browsers they can store files for a year is Cache-Control: public, max-age=31536000, immutable. In this header value, public means any cache (including shared caches and the browser) can store the file. max-age=31536000 means the file can be reused for a year without checking the origin. immutable means the file will not change during that time, so the browser does not need to revalidate it, even when the user reloads the page.
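The structure of the header value is easy to see when it is split into its directives. The parser below is a minimal sketch for illustration: it handles the simple comma-separated form used in this Bite and does not cover quoted arguments or other edge cases of the HTTP specification.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into a dictionary of directives."""
    directives = {}
    for part in value.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, argument = part.partition("=")
        if argument.isdigit():
            directives[name] = int(argument)    # e.g. max-age=31536000
        else:
            directives[name] = argument or True  # flag without a value
    return directives


parsed = parse_cache_control("public, max-age=31536000, immutable")
print(parsed)  # {'public': True, 'max-age': 31536000, 'immutable': True}
print(parsed["max-age"] // (60 * 60 * 24))  # 365 (days)
```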
The origin (for example a web server or an S3 bucket) can set these headers on a per-file basis. But we can also standardize the solution with CloudFront Functions. The code below adds immutable Cache-Control headers to any image file with an h query parameter served by CloudFront. This is a viewer-response function, which runs just before CloudFront returns a response to the viewer. This code is also included in the CDK project available at the bottom of this page.
    function handler(event) {
      var request = event.request;
      var response = event.response;
      var headers = response.headers;

      if (request.uri.match(/^.*(\.png|\.svg)$/) && "h" in request.querystring) {
        // For an image with a hash (`?h=`) return a very long cache duration
        headers["cache-control"] = {
          value: "public, max-age=31536000, immutable",
        };
      } else {
        // Everything else may be stored, but must be revalidated before every reuse
        headers["cache-control"] = {
          value: "public, max-age=0, must-revalidate",
        };
      }

      return response;
    }
With this viewer-response function in place, every response served by CloudFront carries an explicit Cache-Control header. Images requested with a hash can be cached by browsers for up to one year, while everything else has to be revalidated before it is reused.

Conclusion

In this Bite we have covered how caching files in a CDN or browser can greatly improve your website's loading speed. By adding a query parameter to your assets, you can allow these assets to be stored in caches for a very long time. Changing the query parameter will force the asset to be reloaded. This technique is called cache busting. We can use a CloudFront viewer-request function to prevent assets from being requested without a query parameter. We can use a CloudFront viewer-response function to automatically add Cache-Control headers to any long-lived asset served by CloudFront.

CDK Project

The services and code described in this Bite are available as a Python AWS Cloud Development Kit (CDK) Project. Within the project, execute cdk synth to generate CloudFormation templates. Then deploy these templates to your AWS account with cdk deploy. For your convenience, ready-to-use CloudFormation templates are also available in the cdk.out folder. For further instructions on how to use the CDK, see Getting started with the AWS CDK.

Click the Download button below for a Zip file containing the project.