
When making calls to AWS via the boto3 AWS Python SDK, you may encounter situations where the SDK will retry calls to AWS. By default, the SDK doesn’t signal that this has occurred. Instead, your call might take a little longer due to the retries, before succeeding or failing.

If you need to track retry attempts (e.g. for monitoring or debugging), you can use the boto3 event system to have your own function called after each attempt.

How-To: Tracking boto3 Retries

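  1. Write a function that accepts an attempts argument plus **kwargs for everything else, e.g. handle_retry(attempts: int, **kwargs).
    1. To see only retried attempts, filter on attempts > 1 (attempts is 1 for the first try).
    2. To use other values botocore passes to the handler (such as event_name, operation or caught_exception), add them as named parameters of the function.
    3. Let the function return None. botocore's own retry logic is also driven by the return values of needs-retry handlers, and a non-None value can be taken as a retry delay.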
  2. Register your function with the boto3 client or resource instance.
    1. For a boto3 client, use myclient.meta.events.register("needs-retry.*", handle_retry).
    2. For a boto3 resource, use myresource.meta.client.meta.events.register("needs-retry.*", handle_retry).
  3. Each time botocore calls your function, run whatever custom logic you need, e.g. emit a retry metric or increment a counter.
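Putting the steps together, here is a minimal sketch that captures one extra argument (caught_exception) and registers the same handler on either a client or a resource. RetryTracker, track_client and track_resource are hypothetical names used for illustration, not part of boto3, and the handler parameters are the ones botocore passed in the versions this article was tested with, so treat them as an assumption.

```python
from collections import Counter
from typing import Any, Optional


class RetryTracker:
    """Hypothetical helper: counts retried attempts per needs-retry event name."""

    def __init__(self) -> None:
        # e.g. {"needs-retry.s3.ListBuckets": 2}
        self.retries = Counter()
        # Exception (if any) seen on the most recent retried attempt.
        self.last_error: Optional[BaseException] = None

    def handle_retry(
        self,
        event_name: str,
        attempts: int,
        caught_exception: Optional[BaseException] = None,
        **kwargs: Any,
    ) -> None:
        # attempts is 1 for the first try, so > 1 means this attempt was a retry.
        if attempts > 1:
            self.retries[event_name] += 1
            if caught_exception is not None:
                self.last_error = caught_exception
        # Implicitly returns None: botocore can read a non-None return value
        # from a needs-retry handler as a retry delay, so never return one here.


def track_client(client: Any, tracker: RetryTracker) -> None:
    """Register on a boto3 client, e.g. boto3.client("s3")."""
    client.meta.events.register("needs-retry.*", tracker.handle_retry)


def track_resource(resource: Any, tracker: RetryTracker) -> None:
    """Register on a boto3 resource, e.g. boto3.resource("s3")."""
    resource.meta.client.meta.events.register("needs-retry.*", tracker.handle_retry)
```

With a real client this would be used as tracker = RetryTracker() followed by track_client(boto3.client("s3"), tracker); afterwards tracker.retries holds one count per retried attempt, keyed by event name.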

Example

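```python
import boto3

def increment_metric(name):
    # Emit a simple log line that can be counted later.
    print(f"{name}|increment|count=1")

def handle_retry(event_name: str, attempts: int, **kwargs):
    # attempts is 1 for the first try; anything higher is a retry.
    if attempts > 1:
        increment_metric(event_name)

s3 = boto3.client("s3")
# "needs-retry.*" matches the needs-retry event of every operation on this client.
s3.meta.events.register("needs-retry.*", handle_retry)
s3.list_buckets()
```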

This example creates an S3 client, then registers the handle_retry function. When list_buckets is called, the needs-retry.s3.ListBuckets event is fired after each attempt and handle_retry receives it.

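If there are no retries, botocore calls your code once, as handle_retry(event_name="needs-retry.s3.ListBuckets", attempts=1, ...), with the remaining arguments (such as response, operation and caught_exception) collected in **kwargs. Because attempts is 1, the attempts > 1 check fails and increment_metric is not called.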

If the call is retried, handle_retry is called again after each further attempt, with attempts increasing by one each time (2, 3, ...).

In this example, increment_metric prints needs-retry.s3.ListBuckets|increment|count=1 once for every event with attempts > 1, i.e. once per retry. If this output ends up in CloudWatch Logs (for example when the code runs in AWS Lambda), it can be queried later using CloudWatch Logs Insights.
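To see that sequence without calling AWS, the sketch below calls the example's handle_retry directly with the attempts values botocore would pass for a hypothetical ListBuckets call that fails twice and succeeds on its third attempt. The only change to the example is that increment_metric also records each line in a list (emitted, a name introduced here) so the result can be checked.

```python
emitted = []  # copies of every metric line, so the result can be inspected


def increment_metric(name):
    line = f"{name}|increment|count=1"
    emitted.append(line)
    print(line)


def handle_retry(event_name: str, attempts: int, **kwargs):
    if attempts > 1:
        increment_metric(event_name)


# Hypothetical call that fails twice and succeeds on the third try:
# needs-retry fires after every attempt, with attempts = 1, 2, 3.
for attempt in (1, 2, 3):
    handle_retry(event_name="needs-retry.s3.ListBuckets", attempts=attempt)

print(f"{len(emitted)} metric lines")  # 2 metric lines: one per retry
```

Three events, two metric lines: the first attempt is never counted, so the number of lines equals the number of retries.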

Possible Issues

boto3's retry modes (legacy, standard and adaptive) may behave differently. This code has been verified with the default configuration on boto3 1.24.75 (and the corresponding botocore 1.27.75), where legacy retry mode is the default.

The arguments that event handlers receive are not fully documented, so they might change from one boto3 version to the next.
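Given that, it can be worth writing the handler defensively: read arguments with .get() and make sure a bug in your tracking code cannot fail the AWS call itself, since an exception raised inside an event handler propagates out of the boto3 call. A sketch; emit_retry_metric is a hypothetical stand-in for your own metric sink.

```python
import logging
from typing import Any

logger = logging.getLogger(__name__)


def emit_retry_metric(event_name: str) -> None:
    # Hypothetical metric sink, using the same log-line format as the example.
    print(f"{event_name}|increment|count=1")


def handle_retry(**kwargs: Any) -> None:
    try:
        # Use .get() so a renamed or missing argument in a future botocore
        # release means "no metric" rather than a TypeError inside the call.
        event_name = kwargs.get("event_name", "needs-retry.unknown")
        attempts = kwargs.get("attempts")
        if isinstance(attempts, int) and attempts > 1:
            emit_retry_metric(event_name)
    except Exception:
        # Anything raised here would propagate out of the boto3 API call,
        # so log it instead of letting monitoring break the request.
        logger.exception("Retry tracking failed")
    # Falls through to return None, leaving the retry decision to botocore.
```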

If you need more per-call context than boto3 provides (for example, which incoming request triggered an AWS call), you will need to track it yourself, e.g. with the contextvars module.
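As a sketch of the contextvars approach: set a context variable when a unit of work starts and read it inside the retry handler, which botocore calls synchronously in the same thread and context as the API call. current_request_id, retries_by_request and process_request are hypothetical names, and the direct handle_retry calls stand in for a real client call such as s3.list_buckets().

```python
import contextvars
from collections import defaultdict
from typing import Any, Dict

# Hypothetical: identifies the unit of work (e.g. an incoming HTTP request)
# currently making AWS calls.
current_request_id = contextvars.ContextVar("current_request_id", default="unknown")

# Hypothetical in-memory store: retried attempts per request ID.
retries_by_request: Dict[str, int] = defaultdict(int)


def handle_retry(event_name: str, attempts: int, **kwargs: Any) -> None:
    if attempts > 1:
        # botocore knows nothing about our request ID, but the handler runs
        # synchronously in the caller's context, so the variable is visible.
        retries_by_request[current_request_id.get()] += 1


def process_request(request_id: str) -> None:
    token = current_request_id.set(request_id)
    try:
        # Stand-in for real work on a client with handle_retry registered:
        # these are the needs-retry events of a call that succeeds on its
        # second attempt.
        handle_retry(event_name="needs-retry.s3.ListBuckets", attempts=1)
        handle_retry(event_name="needs-retry.s3.ListBuckets", attempts=2)
    finally:
        # Restore the previous value so later work isn't misattributed.
        current_request_id.reset(token)


process_request("req-1")
print(dict(retries_by_request))  # {'req-1': 1}
```

Resetting the variable with the token in a finally block keeps retries from one request from being attributed to whatever runs next on the same thread.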
