Polite Retry: How to Stop Your Node.js Retries from Taking Down Your Own Backend

Table of contents
  • The Problem: Retry Amplification
  • The Three Things You Actually Need
  • 1. Basic Retry — and Why Jitter Matters
  • 2. Circuit Breaker — Knowing When to Stop
  • 3. Adaptive Retry Budgeting — The Real Fix
  • 4. Backpressure: Letting Servers Tell Clients to Slow Down
      ◦ Server side (Express)
      ◦ Client side
  • How This Fits a Real Node.js Service
  • Things to Get Right (and Wrong) in Production
  • Wrapping Up

Every Node.js engineer has written this code at least once:

async function callService(url) {
  for (let i = 0; i < 3; i++) {
    try {
      return await fetch(url);
    } catch (e) {
      // try again!
    }
  }
  throw new Error('failed');
}

It looks defensive. It looks resilient. And during a normal day, it is. The problem is that on the worst day — the day a downstream service is already on fire — this exact pattern is what finishes the job. Three retries from every client, multiplied across a fleet, multiplied across tiers of services, is how a brief blip turns into an hour-long outage.

This post walks through why naive retries are so dangerous, the formal mechanism behind that danger (retry amplification), and how polite-retry — a small, zero-dependency TypeScript library — applies the academic research on the subject to give you retries that are actually safe to run in production.


The Problem: Retry Amplification

Imagine a normal request path through three tiers of services: an API gateway calls a business-logic service, which calls a data service. On a healthy day, 100 requests in equals 100 responses out. Easy.

Now suppose the data tier starts failing 50% of its requests — maybe a deploy, maybe a bad node, maybe a noisy neighbour. Each tier above it has been configured the "obvious" way: retry up to 3 times on failure.

Watch what happens to the request volume hitting the struggling service:

  • The middle tier sees 50% failures, so for every failed call it tries up to 3 more times. The data service's load roughly doubles.
  • The gateway sees the middle tier failing, so it retries too, multiplying the load again.
  • More load means more failures. More failures mean more retries.

In a 3-tier system with a 50% underlying failure rate and 3 retries per tier, the total request volume hitting the bottom service can be 6.6× normal load. The retries didn't help the system recover — they pushed it from "degraded" into "completely down." This is the cascade collapse pattern that takes whole platforms offline, and it has a name: retry amplification.
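
Where does 6.6 come from? A back-of-the-envelope calculation, assuming each attempt fails independently with probability 0.5 and three layers each retry up to 3 times:

// Expected attempts per incoming call with failure probability p and up to
// r retries: attempt k happens only if the previous k-1 attempts all failed.
function expectedAttempts(p: number, r: number): number {
  let total = 0;
  for (let k = 0; k <= r; k++) total += p ** k; // 1 + p + p² + ... + pʳ
  return total;
}

const perLayer = expectedAttempts(0.5, 3); // 1 + 0.5 + 0.25 + 0.125 = 1.875
console.log(perLayer ** 3);                // ≈ 6.59 — the 6.6× figure above

Each retrying layer multiplies the traffic below it by up to 1.875×, and the factors compound tier by tier.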

[Animated figure: retry amplification compounding across service tiers]

The research paper this library is based on walks through the math formally and analyses how different retry policies — fixed delay, exponential backoff, jittered backoff, budgeted retries — perform under these conditions. Its TL;DR is the library's design philosophy: a retry policy must account for how the system as a whole is doing, not just whether one individual call succeeded.

The Three Things You Actually Need

Most retry libraries on npm give you exponential backoff and call it a day. That's the easy 80%. The hard 20% — the part that prevents amplification — needs three additional ideas working together:

Jitter, so retries don't synchronise into periodic spikes.

Circuit breaking, so when a service is clearly down, you stop hammering it.

Retry budgeting, so the aggregate retry traffic is capped relative to baseline load.

polite-retry exposes these as three composable strategies, with progressively stronger guarantees:

  Strategy                     Use Case                                               Amplification Risk
  retry()                      Simple retries with backoff and jitter                 Medium
  retryWithCircuitBreaker()    Stop retrying when the service is clearly down         Low
  retryWithBudget()            Adaptive Retry Budgeting (recommended for prod)        Very Low
  retryWithProtection()        Combined budget + circuit breaker for critical paths   Very Low

Let's walk through each.

1. Basic Retry — and Why Jitter Matters

The basic retry() function looks like what you'd expect, but the default jitter: 'full' is doing real work:

import { retry } from 'polite-retry';

const data = await retry(
  async () => {
    const response = await fetch('https://api.example.com/data');
    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    return response.json();
  },
  {
    maxRetries: 3,
    initialDelayMs: 100,
    backoffMultiplier: 2,
    jitter: 'full',
    timeoutMs: 5000,
    retryIf: (error) => !/HTTP 4\d\d/.test(error.message), // don't retry 4xx client errors
    onRetry: (err, attempt, delay) => {
      console.log(`Retry ${attempt} in ${delay}ms: ${err.message}`);
    },
  },
);

Why jitter matters: imagine 1,000 clients all hit a flaky service at the same moment, all fail, and all back off "exponentially" — say, 100ms then 200ms then 400ms. Without jitter, all 1,000 clients retry at exactly the 100ms mark, then exactly the 200ms mark. The struggling service sees a precise periodic stampede instead of a smooth distribution of load. It never gets a quiet moment to recover.

polite-retry ships four jitter strategies straight out of the AWS Architecture Blog's classic on the subject:

  Strategy       Formula                        Use case
  none           delay                          Testing only — never production
  full           random(0, delay)               General-purpose default
  equal          delay/2 + random(0, delay/2)   When you want a guaranteed minimum
  decorrelated   random(base, prev * 3)         Long correlated retry sequences
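
To make the formulas concrete, here's an illustrative implementation of the four strategies — a sketch of the math in the table, not polite-retry's internals:

// Illustrative jitter formulas; `delay` is the current exponential-backoff
// delay, `base` the initial delay, `prev` the previous sleep (decorrelated).
type Jitter = 'none' | 'full' | 'equal' | 'decorrelated';

const rand = (min: number, max: number) => min + Math.random() * (max - min);

function jitteredDelay(strategy: Jitter, delay: number, base: number, prev: number): number {
  switch (strategy) {
    case 'none':  return delay;                          // every client retries in lockstep
    case 'full':  return rand(0, delay);                 // spread uniformly over [0, delay)
    case 'equal': return delay / 2 + rand(0, delay / 2); // at least half the nominal delay
    case 'decorrelated': return rand(base, prev * 3);    // grows from the previous sleep
  }
}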

The single biggest improvement most Node services can make to their resilience profile is changing jitter: 'none' (or no jitter at all) to jitter: 'full'. It costs nothing.

2. Circuit Breaker — Knowing When to Stop

Retries assume the failure is transient. When it isn't — when a downstream is genuinely down — you don't want every request to spend 5 seconds going through 3 retries before giving up. That's just wasted compute, wasted connections, and wasted user time.

The circuit breaker pattern fixes this. Track failures in a sliding window. If the failure rate crosses a threshold, open the circuit — fail fast for a cooldown period without even attempting the call. After the cooldown, transition to half-open and let one test request through. If it succeeds, close the circuit and resume normal traffic. If it fails, open again.
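
A minimal sketch of that state machine — for intuition only, not polite-retry's actual implementation:

// Circuit breaker states: closed (normal), open (fail fast), half-open (probe).
type CircuitState = 'closed' | 'open' | 'half-open';

class BreakerSketch {
  private state: CircuitState = 'closed';
  private outcomes: boolean[] = [];   // sliding window of success/failure
  private openedAt = 0;
  private probeInFlight = false;

  constructor(
    private failureThreshold = 0.5,   // open at this failure rate...
    private windowSize = 10,          // ...over the last N requests
    private resetTimeoutMs = 30_000,  // cooldown before the half-open probe
  ) {}

  allowRequest(): boolean {
    if (this.state === 'open') {
      if (Date.now() - this.openedAt < this.resetTimeoutMs) return false; // fail fast
      this.state = 'half-open';       // cooldown elapsed: allow a single probe
    }
    if (this.state === 'half-open') {
      if (this.probeInFlight) return false; // only one test request at a time
      this.probeInFlight = true;
    }
    return true;
  }

  record(success: boolean): void {
    if (this.state === 'half-open') {
      this.probeInFlight = false;
      this.state = success ? 'closed' : 'open'; // probe result decides the next state
      if (!success) this.openedAt = Date.now();
      this.outcomes = [];
      return;
    }
    this.outcomes.push(success);
    if (this.outcomes.length > this.windowSize) this.outcomes.shift();
    const failureRate = this.outcomes.filter((ok) => !ok).length / this.outcomes.length;
    if (this.outcomes.length >= this.windowSize && failureRate >= this.failureThreshold) {
      this.state = 'open';
      this.openedAt = Date.now();
    }
  }
}

The library's version, configured for a payment downstream: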

import { retryWithCircuitBreaker, CircuitBreaker } from 'polite-retry';

const paymentBreaker = new CircuitBreaker({
  failureThreshold: 0.5,    // open at 50% failure rate
  windowSize: 10,           // over the last 10 requests
  resetTimeoutMs: 30_000,   // try again 30s after opening
  onStateChange: (state) => log.info(`payment circuit: ${state}`),
});

const result = await retryWithCircuitBreaker(
  () => chargePayment(amount),
  paymentBreaker,
  { maxRetries: 3 },
);

The key rule: one breaker per downstream service, shared across all the call sites in your process that talk to that service. A breaker per individual request gives you nothing.
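
In practice that means the breaker lives at module scope, with every endpoint wrapper in the module reusing it — a sketch with two hypothetical call sites (`getInvoice`, `refundCharge`) and an illustrative service URL:

// payment-client.ts — one shared breaker for everything that talks to
// the payment service.
import { retryWithCircuitBreaker, CircuitBreaker } from 'polite-retry';

const paymentBreaker = new CircuitBreaker({
  failureThreshold: 0.5,
  windowSize: 10,
  resetTimeoutMs: 30_000,
});

// Both call sites feed the same sliding window, so a failure spike seen by
// one endpoint opens the circuit for the other too.
export const getInvoice = (id: string) =>
  retryWithCircuitBreaker(
    () => fetch(`https://payment-service/invoices/${id}`),
    paymentBreaker,
    { maxRetries: 3 },
  );

export const refundCharge = (id: string) =>
  retryWithCircuitBreaker(
    () => fetch(`https://payment-service/refunds/${id}`, { method: 'POST' }),
    paymentBreaker,
    { maxRetries: 2 },
  );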

3. Adaptive Retry Budgeting — The Real Fix

This is the recommended strategy for production microservices, and it's the one that directly maps to the paper's analysis.

The core idea: instead of letting every request retry up to 3 times, cap the total retry traffic as a fraction of normal traffic. If you only allow a 20% retry budget, then for every 100 original requests you can issue at most 20 retries — regardless of how many of them are failing. The math means amplification is bounded by 1 + budget (so 1.2× in this case), no matter how bad things get downstream.

The "adaptive" part is what makes this practical: a static 20% budget is fine when failure rates are low, but a smart system should shrink the budget when failures spike (because retrying a sick service is counterproductive) and restore it when things calm down.

import { retryWithBudget, AdaptiveRetryBudget } from 'polite-retry';

// One budget instance per downstream service, shared across the process
const paymentBudget = new AdaptiveRetryBudget({
  initialBudget: 0.2,           // 20% retry overhead allowed
  highFailureThreshold: 0.3,    // shrink budget when >30% failing
  lowFailureThreshold: 0.05,    // restore budget when <5% failing
  budgetDecreaseRate: 0.5,      // halve budget on shrink
  budgetIncreaseRate: 0.1,      // grow 10% on restore
  adjustmentIntervalMs: 1000,
  onBudgetChange: (budget, rate) => {
    metrics.gauge('retry.budget', budget);
    metrics.gauge('retry.failure_rate', rate);
  },
});

const data = await retryWithBudget(
  () => fetchFromPaymentService(),
  paymentBudget,
  { maxRetries: 3, jitter: 'full' },
);

// Don't forget cleanup
process.on('SIGTERM', () => paymentBudget.dispose());

The behaviour table is intuitive:

  Observed failure rate          Budget action
  < 5%                           Grow budget, up to the configured maximum
  5–30%                          Hold steady
  > 30%                          Cut budget by 50%
  Backpressure signal received   Stop retrying immediately

You can also pull metrics out of the budget for observability:

const m = paymentBudget.getMetrics();
// {
//   totalRequests: 150,
//   successfulRequests: 140,
//   failedRequests: 10,
//   totalRetries: 15,
//   failureRate: 0.08,
//   retryAmplificationFactor: 1.11
// }

The retryAmplificationFactor is the key SLO to alert on. If it's drifting above 1.5, your retry policy is starting to add meaningful load instead of absorbing transient failures.
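
One way to wire that up — a sketch using prom-client as the metrics backend (any stats client works the same way), reusing the paymentBudget instance from above:

import { Gauge } from 'prom-client';

// Export the shared budget's amplification factor so you can alert on it.
const amplification = new Gauge({
  name: 'retry_amplification_factor',
  help: 'Total attempts divided by original requests, per downstream',
  labelNames: ['downstream'],
});

setInterval(() => {
  const m = paymentBudget.getMetrics();
  amplification.set({ downstream: 'payment-service' }, m.retryAmplificationFactor);
}, 10_000).unref(); // unref() so the poller never keeps the process alive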

4. Backpressure: Letting Servers Tell Clients to Slow Down

Even the smartest client-side policy can't beat server-side knowledge. The server knows its own load, queue depth, and latency tail. The most polite thing a client can do is listen when the server says "I'm overloaded."

polite-retry ships both halves of this — a server-side Express middleware that automatically annotates responses with load information, and a client-side manager that reads those annotations and feeds them back into the retry budget.

Server side (Express)

import express from 'express';
import { RequestCounter, createBackpressureMiddleware } from 'polite-retry';

const app = express();
const counter = new RequestCounter();

app.use(counter.middleware()); // tracks active requests automatically

app.use(createBackpressureMiddleware({
  getLoadLevel: () => counter.getCount() / 100, // 100 = max concurrent
  overloadThreshold: 0.8,
}));

Every response now carries:

  • X-Backpressure: 0.75 — current load level (0.0 to 1.0)
  • X-Load-Shedding: true — set when over threshold
  • Retry-After: 5 — suggested wait in seconds

Client side

import {
  retryWithBudget,
  AdaptiveRetryBudget,
  BackpressureManager,
} from 'polite-retry';

const backpressure = new BackpressureManager();

const budget = new AdaptiveRetryBudget({
  checkBackpressure: () => backpressure.isOverloaded('payment-service'),
});

async function callPaymentService(payload) {
  return retryWithBudget(
    async () => {
      const res = await fetch('https://payment-service/charge', {
        method: 'POST',
        body: JSON.stringify(payload),
      });
      backpressure.recordFromHeaders('payment-service', res.headers);
      if (!res.ok) throw new Error(`HTTP ${res.status}`);
      return res.json();
    },
    budget,
    { maxRetries: 3 },
  );
}

The same primitives work for gRPC via metadata instead of HTTP headers — examples in the docs.

How This Fits a Real Node.js Service

A typical Node.js backend has a handful of distinct downstream dependencies — a database, a payment provider, a notification service, an internal user service, a third-party auth API. Each one has different latency profiles, different reliability characteristics, and different "this is on fire" signals. The pattern that works:

One module per downstream, exporting a wrapped client. Inside that module, instantiate a single AdaptiveRetryBudget (and optionally a CircuitBreaker) at module scope. Every function in that module routes its calls through retryWithBudget (or retryWithProtection) using the shared instances.

Tune per-service. Payment APIs deserve a tighter budget (10%) and stricter circuit thresholds — false positives are cheap, retry storms during checkout are catastrophic. Internal best-effort calls (analytics, telemetry) can run looser budgets (30%) and more retries.

Wire the metrics. Send failureRate, retryAmplificationFactor, and circuit state to whatever monitoring stack you use (Datadog, Prometheus, CloudWatch). Alert on amplification > 1.5 and on circuits stuck open.

Add backpressure middleware to your own services. Your service is downstream to *something*. Adding the middleware costs you essentially nothing and gives every caller — whether they use polite-retry or not — actionable signals.

Clean up on shutdown. AdaptiveRetryBudget runs an internal interval timer for adjustment cycles. Call .dispose() on SIGTERM; otherwise the timer keeps the event loop alive and your process won't exit cleanly.

A skeleton looks like this:

// src/clients/payment.ts
import {
  retryWithProtection,
  CircuitBreaker,
  AdaptiveRetryBudget,
} from 'polite-retry';

const breaker = new CircuitBreaker({
  failureThreshold: 0.4,
  windowSize: 20,
  resetTimeoutMs: 30_000,
});

const budget = new AdaptiveRetryBudget({
  initialBudget: 0.1, // tighter for payments
  highFailureThreshold: 0.2,
});

export async function chargeCard(amount: number, token: string) {
  return retryWithProtection(
    () => doChargeRequest(amount, token),
    { circuitBreaker: breaker, budget },
    {
      maxRetries: 2,
      jitter: 'full',
      timeoutMs: 4000,
      retryIf: (err) => !/HTTP 4\d\d/.test(err.message), // never retry 4xx (assumes errors carry "HTTP <status>")
    },
  );
}

export function disposePaymentClient() {
  budget.dispose();
}

// src/server.ts
import { disposePaymentClient } from './clients/payment';

process.on('SIGTERM', () => {
  disposePaymentClient();
  // ...other cleanup
});

Things to Get Right (and Wrong) in Production

A short, opinionated checklist of what experience — and the paper — say to actually do:

Do

  • Use jitter: 'full' everywhere. Always.
  • Cap retries at 3. More than that almost never converts a failure into a success and almost always inflates load.
  • Use a per-downstream shared budget, not one budget per request.
  • Set per-attempt timeoutMs. A retry policy without timeouts isn't resilience, it's a slow leak.
  • Be selective with retryIf: 4xx errors are not the server's fault and won't get better on a second try.
  • Alert on retryAmplificationFactor > 1.5.

Don't

  • Don't retry without backoff. Immediate retries are how you turn a 50ms blip into a 5-second outage.
  • Don't ignore Retry-After. The server told you when to come back; come back then (a sketch of honoring it by hand follows this list).
  • Don't construct a new budget per request. The whole point is to share the count across calls.
  • Don't silently swallow onRetry callbacks. Log them, even at debug level — when the postmortem happens you'll want them.
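
On that Retry-After point: the BackpressureManager path above reads the header for you, but honoring it by hand takes only a few lines — a generic sketch, independent of polite-retry:

// Parse Retry-After (seconds or HTTP date) and wait that long before the
// next attempt — never less than the server asked for.
function retryAfterMs(headers: Headers): number | null {
  const value = headers.get('retry-after');
  if (value === null) return null;
  const seconds = Number(value);
  if (!Number.isNaN(seconds)) return seconds * 1000;
  const date = Date.parse(value);
  return Number.isNaN(date) ? null : Math.max(0, date - Date.now());
}

const res = await fetch('https://api.example.com/data');
if (res.status === 429 || res.status === 503) {
  const waitMs = retryAfterMs(res.headers) ?? 1_000; // no header: modest default
  await new Promise((resolve) => setTimeout(resolve, waitMs));
  // ...then retry, ideally through retryWithBudget so the budget still applies
}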

Wrapping Up

Retries are one of those topics where everyone knows roughly what to do, the official advice is fine, and yet production systems still fall over for retry-amplification reasons multiple times a year at almost every company. The gap is that "exponential backoff with jitter" — what most retry libraries give you — is necessary but not sufficient. You also need to bound the aggregate retry traffic, listen to the server, and stop retrying entirely when the downstream is clearly cooked.

polite-retry packages those ideas into a small TypeScript library with no runtime dependencies, with the algorithms calibrated against the analysis in the paper. If you've got a Node.js service that talks to anything else over the network — and that's basically all of them — it's worth replacing your hand-rolled for loop with this.

npm install polite-retry

MIT licensed. Issues, PRs, and benchmarks welcome.
