When making calls to AWS via the boto3 AWS Python SDK, you may encounter situations where the SDK will retry calls to AWS. By default, the SDK doesn’t signal that this has occurred. Instead, your call might take a little longer due to the retries, before succeeding or failing.
When you need to track the number of retry attempts (e.g. for monitoring or debugging), you can use the boto3 event system to call custom functions.
How-To: Tracking boto3 Retries
- Write a function which accepts attempts as an argument, and **kwargs as another, e.g., handle_retry(attempts: int, **kwargs).
  - If you want to only see retried calls, filter on attempts > 1.
  - If you wish to capture more arguments from kwargs, add them as normal arguments to the function.
- Register your function with the boto3 client or resource instance.
  - If you are using a boto3 client then use myclient.meta.events.register("needs-retry.*", handle_retry).
  - If you are using a boto3 resource then use myresource.meta.client.meta.events.register("needs-retry.*", handle_retry).
- Every time your function is called you can run custom logic, e.g., to emit a retry metric or increment a counter.
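The steps above can be sketched as a small helper that works for both clients and resources. This is a minimal sketch rather than part of boto3: register_retry_handler, count_retries, and retry_counts are hypothetical names introduced here, and the helper assumes that only resource objects expose meta.client.

```python
from collections import Counter

# Hypothetical module-level counter: retried attempts seen per event name.
retry_counts = Counter()


def count_retries(event_name: str, attempts: int, **kwargs):
    """Count only retried calls; the first attempt arrives with attempts == 1."""
    if attempts > 1:
        retry_counts[event_name] += 1
    # Nothing is returned: this handler only observes.


def register_retry_handler(client_or_resource, handler):
    """Register handler for needs-retry events on a boto3 client or resource.

    Assumption: a resource exposes its underlying client as meta.client,
    while a client exposes the event system directly as meta.events.
    """
    meta = client_or_resource.meta
    client = getattr(meta, "client", None)
    events = meta.events if client is None else client.meta.events
    events.register("needs-retry.*", handler)
```

With boto3 installed, register_retry_handler(boto3.client("s3"), count_retries) and register_retry_handler(boto3.resource("s3"), count_retries) would both attach the handler to the underlying client's event system.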
Example
import boto3

def increment_metric(name):
    # Emit a log line in a simple metric format.
    print(f"{name}|increment|count=1")

def handle_retry(event_name: str, attempts: int, **kwargs):
    # attempts is 1 for the first try, so only values above 1 are retries.
    if attempts > 1:
        increment_metric(event_name)

s3 = boto3.client("s3")
s3.meta.events.register("needs-retry.*", handle_retry)
s3.list_buckets()
This example creates an S3 client, then registers a handle_retry function. When list_buckets is called, the needs-retry.s3.ListBuckets event will be fired and handle_retry will receive it.
If there are no retries, your code will be called once with handle_retry(event_name="needs-retry.s3.ListBuckets", attempts=1, **kwargs). Because attempts is 1, the attempts > 1 check fails and increment_metric is not called.
If there are any retries, your handle_retry function will be called for each retry, with an ever-increasing attempts argument.
In this example, increment_metric writes out needs-retry.s3.ListBuckets|increment|count=1 for every event with attempts > 1 (i.e. on retries). Where that output is sent to CloudWatch Logs, it can be queried later using CloudWatch Logs Insights.
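To see which calls produce a metric line without contacting AWS, the handler can be driven by hand. The sketch below is an illustration only: simulate_call is a hypothetical stand-in that mimics the sequence of attempts values described above, and metric_lines is a hypothetical list added so the emitted lines can be inspected.

```python
metric_lines = []  # hypothetical record of every line written


def increment_metric(name):
    # Same log-line format as the example above, also kept in a list.
    line = f"{name}|increment|count=1"
    metric_lines.append(line)
    print(line)


def handle_retry(event_name: str, attempts: int, **kwargs):
    # attempts == 1 is the first try; anything higher is a retry.
    if attempts > 1:
        increment_metric(event_name)


def simulate_call(event_name, total_attempts):
    """Hypothetical stand-in for the SDK: fire the handler once per attempt."""
    for attempts in range(1, total_attempts + 1):
        handle_retry(event_name=event_name, attempts=attempts)


# One call that succeeds first time, then one that needs three attempts.
simulate_call("needs-retry.s3.ListBuckets", total_attempts=1)
simulate_call("needs-retry.s3.ListBuckets", total_attempts=3)
```

Under these assumptions the first call writes nothing and the second writes the metric line twice, once for each retry.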
Possible Issues
Different retry mechanisms in boto3 might behave differently. This code has been verified with default configuration on boto3 1.24.75 (and the related botocore 1.27.75). The Legacy Retry Mode is the default for this version.
The payloads and arguments that event handlers receive are not fully defined, so they might change from one boto3 version to the next.
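One way to limit exposure to such changes is to name as few arguments as possible and read everything else defensively from kwargs. The following is a minimal sketch under that assumption: safe_handle_retry and seen are hypothetical names, and the caught_exception key is an assumption about what a given botocore version passes, so it is treated as optional.

```python
seen = []  # hypothetical in-memory record of retried events


def safe_handle_retry(**kwargs):
    """Observe retries without assuming which arguments are passed."""
    event_name = kwargs.get("event_name", "unknown-event")
    attempts = kwargs.get("attempts")
    # If attempts is missing or not an int, skip quietly rather than
    # raise an exception inside the SDK's request path.
    if not isinstance(attempts, int) or attempts <= 1:
        return
    # Assumption: the exception that triggered the retry may be passed.
    error = kwargs.get("caught_exception")
    error_name = type(error).__name__ if error is not None else None
    seen.append((event_name, attempts, error_name))
```

The trade-off is that a renamed or removed argument results in missing data rather than a failed AWS call, so it is worth alerting on unexpectedly empty metrics after an SDK upgrade.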
If you need more per-call context than boto3 provides, you will need to explore other mechanisms for tracking it, e.g. the contextvars module.
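As a rough illustration of the contextvars approach, the calling code can set a context variable before each SDK call and the handler can read it back. This is a sketch under assumptions: current_job, tagged_retries, handle_retry_with_context, and run_job are hypothetical names, fake_call stands in for a real boto3 call, and the handler is assumed to be invoked synchronously in the calling thread.

```python
import contextvars

# Hypothetical per-call context: which job is making the AWS call.
current_job = contextvars.ContextVar("current_job", default="unknown")
tagged_retries = []


def handle_retry_with_context(event_name: str, attempts: int, **kwargs):
    # Assumption: the handler runs in the caller's context, so the
    # value set by run_job is visible here.
    if attempts > 1:
        tagged_retries.append((current_job.get(), event_name, attempts))


def run_job(job_name, call):
    """Set the context for the duration of one call, then restore it."""
    token = current_job.set(job_name)
    try:
        return call()
    finally:
        current_job.reset(token)


def fake_call():
    # Stand-in for an SDK call that is retried once (attempts 1 and 2).
    for attempts in (1, 2):
        handle_retry_with_context("needs-retry.s3.ListBuckets", attempts)


run_job("nightly-report", fake_call)
```

Code paths that hand work to other threads would not see the caller's context automatically, so this pattern needs checking against how each call is actually made.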
Further Reading
- Exploring boto3 Events With Mitmproxy is a talk given at the Manchester AWS Community Summit 2022 on investigating retries in boto3 using the event system and mitmproxy to debug.
- boto3’s introduction to the event system provides reference details on the event system and how to use it.
- The Deep Dive on AWS SDK for Python (Boto3) presentation slides from AWS Dev Day Tokyo 2019, session B-4, give more details on the event system, with more examples.