AWS Step Functions are difficult to test. I found an approach to testing that helped in one particular scenario. It relies on using the same programming language for both testing and infrastructure-as-code (IaC). I hope the idea helps others tame their Step Functions into a testable submission.

Why are Step Functions hard to test?

I’m most familiar with unit testing. I’m used to lifting some small component from my system and isolating it in a different runtime environment. There, I poke it with my special inputs, and prod at the results until I’m satisfied it behaves how I want. By isolating it, I gain confidence it serves its purpose when it is executed as part of the system.

This familiar technique of decomposing a program into smaller units to instantiate and test individually is not an option with Step Functions. The definition of a state machine is one atomic unit. I found no way to take an individual state, and poke and prod it in isolation. If I want to test a Step Function, I deploy it to my developer environment on AWS and execute the whole thing. Note that this is not the same debate as whether you should test locally or in the cloud. Either way, the execution of a Step Function cannot be decomposed.

If that explanation does not work for you, try this: Step Functions are difficult to test in the same way stored procedures on a database server are hard to test.

What about integration testing a Step Function?

I have taken this approach before. As is typical for integration tests, I found them easy to define, but difficult to make deterministic, reliable and fast. There are some nice aspects to the AWS API that help – in particular, retrieving the events in an execution’s history is useful for asserting that particular state transitions were made.
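For example, the assertion on state transitions can be pulled into a small pure helper. This is a sketch: `HistoryEvent` here is a pared-down stand-in for the much richer event type the AWS SDK returns from a getExecutionHistory call.

```typescript
// Pared-down shape of an execution history event; the real AWS SDK type
// carries many more fields and event kinds.
type HistoryEvent = { stateEnteredEventDetails?: { name: string } };

// Given the events from a getExecutionHistory call, check whether the
// execution ever entered the named state.
const stateWasEntered = (events: HistoryEvent[], stateName: string): boolean =>
    events.some((event) => event.stateEnteredEventDetails?.name === stateName);
```

In a test, the events would come from the AWS SDK, and the assertion becomes something like `expect(stateWasEntered(events, "acquirePermit")).toBe(true)`.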

However, having to transition through the entire state machine for every test made setup harder, made each test case slower and less reliable, and made every change to the Step Function definition require a deployment, which stretches out the feedback loop (even with the CDK’s --hotswap option, which made a huge difference).

When I came to introduce a DynamoDB task to a Step Function I wanted something better.

Adding a DynamoDB task to a Step Function

I had a use case for adding a DynamoDB task to a Step Function. The specifics of why are not too important; what’s important is that the single DynamoDB task covers a lot of behaviour in a small task definition. I’ll explain just enough of the use case and Dynamo interaction to highlight why testing via Step Function executions causes problems.

My use case was to limit concurrent access to a shared resource. I decided on a semaphore implemented via DynamoDB. Before accessing the shared resource, the Step Function should try to “acquire a permit”, and could not continue if none are available. A condition expression and a counter would model the metaphor of acquiring permits until they’ve all been taken.

The DynamoDB updateItem task in the Step Function task definition contained these expressions:

"acquirePermit": {
  "Type": "Task",
  "Resource": "arn:aws:states:::dynamodb:updateItem",
  "Parameters": {
    "ConditionExpression": "(attribute_not_exists(#ownerCount) or #ownerCount < :limit) and attribute_not_exists(#ownerId)",
    "UpdateExpression": "SET #ownerCount = if_not_exists(#ownerCount, :initializedOwnerCount) + :increase, #ownerId = :acquiredAt"
    // several attributes omitted for brevity
  }
}
A brief overview of the desired behaviour:

  1. if permits are available, acquirePermit should succeed, incrementing the owner count and adding the ownerId as a key
  2. if a DynamoDB Item representing the semaphore does not already exist, acquirePermit should create the item and initialize ownerCount to 1
  3. if zero permits are available, the acquirePermit task should fail and leave the Item untouched

There are even more semantics and edge cases not included here, before even getting to the subsequent DynamoDB call to release a permit.
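To make those behaviours concrete, here is a local TypeScript model of the acquirePermit semantics. This is a hypothetical helper, not the real DynamoDB call: it mirrors the condition and update expressions in plain code, assuming :initializedOwnerCount is 0 and :increase is 1.

```typescript
// A local model of the acquirePermit semantics: plain TypeScript standing in
// for the DynamoDB ConditionExpression and UpdateExpression, assuming
// :initializedOwnerCount = 0 and :increase = 1.
type SemaphoreItem = { ownerCount: number; owners: Record<string, number> };

const tryAcquire = (
    item: SemaphoreItem | undefined,
    ownerId: string,
    limit: number,
    acquiredAt: number
): SemaphoreItem => {
    // ConditionExpression: (attribute_not_exists(#ownerCount) or #ownerCount < :limit)
    //                      and attribute_not_exists(#ownerId)
    const conditionHolds =
        (item === undefined || item.ownerCount < limit) &&
        item?.owners[ownerId] === undefined;
    if (!conditionHolds) {
        // Behaviour 3: the condition fails and the item is left untouched
        throw new Error("ConditionalCheckFailedException");
    }
    // UpdateExpression: SET #ownerCount = if_not_exists(#ownerCount, 0) + 1,
    //                       #ownerId = :acquiredAt
    return {
        ownerCount: (item?.ownerCount ?? 0) + 1, // behaviour 2: first acquire yields 1
        owners: { ...(item?.owners ?? {}), [ownerId]: acquiredAt },
    };
};
```

A model like this is only a thinking aid; it drifts out of sync with the real expressions the moment they change, which is exactly why testing the actual query against DynamoDB mattered to me.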

I couldn’t bring myself to write several tests which all execute the Step Function. I want fewer slow and unreliable tests in my life. Getting the conditions correct required upfront thinking, but also some trial and error while I explored and improved the code. The feedback loop had to be tight. I needed to make the task of the next developer to touch this easier, especially if that developer is me.

How the shared programming language enabled better tests

Instead of testing the Step Function by executing it, I took advantage of using TypeScript for both CDK and integration tests in the same repository. I defined constants in their own file:

// semaphoreExpressions.ts

export const acquireUpdateExpression =
    "SET #concurrencyLimit = :limit, #ownerCount = if_not_exists(#ownerCount, :initializedOwnerCount) + :increase, #ownerId = :acquiredAt";
export const acquireConditionExpression =
    "(attribute_not_exists(#ownerCount) or #ownerCount < :limit) and attribute_not_exists(#ownerId)";

In defining the Step Function construct with CDK, those expressions are only an import away:

// cdkStack.ts

import {
    acquireConditionExpression,
    acquireUpdateExpression,
} from "./semaphoreExpressions";

const acquirePermit = new tasks.DynamoUpdateItem(this, "acquirePermit", {
    table: myTable,
    updateExpression: acquireUpdateExpression,
    conditionExpression: acquireConditionExpression,
    returnValues: DynamoReturnValues.ALL_NEW,
    resultPath: "$.acquirePermitResult",
    // several attributes omitted for brevity
});

const definition = sfn.Chain.start(acquirePermit); // subsequent states omitted

new sfn.StateMachine(this, "MyStateMachine", { definition });

Big whoop right? I’ve taken a string constant and imported it into another file. Hardly the stuff of Knuth or Dijkstra. So what?

With this simple change, I can use the DynamoDB client to execute the query in a Jest test:

// dynamoBasedSemaphore.test.ts

import {
    acquireConditionExpression,
    acquireUpdateExpression,
} from "./semaphoreExpressions";

const acquire = async (/* params omitted for brevity */) => {
    const documentClient = new DocumentClient();

    const acquirePermitUpdateItem: DocumentClient.UpdateItemInput = {
        TableName: "MyDynamoTableInDevelopmentEnvironment",
        UpdateExpression: acquireUpdateExpression,
        ConditionExpression: acquireConditionExpression,
        ReturnValues: "ALL_NEW",
        // several attributes omitted for brevity
    };

    return documentClient.update(acquirePermitUpdateItem).promise();
};

it("can acquire a permit", async () => {
    const response = await acquire();
    expect(response).toEqual(/* verify successful response */);
    // can fetch the Dynamo Item and assert on attributes here if you wish
});
I run the test from my machine. I can iterate on the query without worrying about deploying or executing the Step Function. Sure, I need a real DynamoDB table, but that is rarely modified.

Hello again, Feedback Loop, my old friend.

Where does this leave me? I now have a Step Function containing a dynamodb:updateItem task, where I have confidence in the behaviour of its query. If I need to make a change, I can get feedback by running an integration test that only depends on DynamoDB and the TypeScript code in my editor. Which is a damn sight easier to make robust, deterministic, and fast. I’ve avoided using a Lambda task and the operational burden it introduces.

I still need to have confidence that my Step Function transitions to and from this state correctly. I resorted to an integration test for the happy path in this case. Here, one test case is less bad than several. I could have added a CDK unit test to verify the Step Function construct uses the same condition string, but I felt that was overkill.
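For completeness, that overkill check could be as simple as parsing the synthesized ASL and comparing strings. A sketch, with the constant inlined to keep it self-contained (in the repo it would be imported from semaphoreExpressions.ts), and a hand-built ASL fragment standing in for cdk synth output:

```typescript
// Inlined here for a self-contained sketch; normally imported from
// semaphoreExpressions.ts.
const acquireConditionExpression =
    "(attribute_not_exists(#ownerCount) or #ownerCount < :limit) and attribute_not_exists(#ownerId)";

// Does the state machine definition use the exact shared condition string?
const definitionUsesSharedCondition = (aslDefinition: string): boolean => {
    const states = JSON.parse(aslDefinition).States ?? {};
    const parameters = states["acquirePermit"]?.Parameters ?? {};
    return parameters.ConditionExpression === acquireConditionExpression;
};

// Hand-built ASL fragment, standing in for the output of `cdk synth`:
const sampleAsl = JSON.stringify({
    States: {
        acquirePermit: {
            Type: "Task",
            Resource: "arn:aws:states:::dynamodb:updateItem",
            Parameters: { ConditionExpression: acquireConditionExpression },
        },
    },
});
```

An exact string comparison like this would catch the constant and the deployed definition drifting apart, at the cost of one more test to maintain.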


This approach would not have been as easy without a shared programming language for both tests and infrastructure-as-code. If the Step Function were defined in the YAML of a CloudFormation or SAM template, for example, sharing a constant would have been more effort than just importing across two different TypeScript files.

Step Functions are hard to test. Having a shared programming language across IaC and tests allowed a creative way to gain confidence in my system, with more maintainable tests.

As a parting thought, it would be interesting to approach this testing problem from “above” the Step Function instead of from “below” as I’ve done here. For example, maybe the same DynamoDB behaviour could be written once in a declarative model and used to generate code for local testing, as well as the ASL task definition. If Step Function ASL were generated or transpiled from a different model, there would be lots of clever things you could do.

I found this pattern to play nicely with DynamoDB tasks; what other Step Function task types could it benefit?



Graham "Grundlefleck" Allan is a Software Developer living in Scotland. His only credentials as an authority on software are that he has a beard. Most of the time.
