
Shopify webhook retry strategy: how to never lose an event

Shopify drops webhooks after 8 attempts in 4 hours. Here's how to retry across 7 days, what backoff curves work, and how to recover events that are already gone.

When you subscribe to a webhook topic in Shopify, the platform commits to delivering each event, but only for a while. After 8 failed attempts within 4 hours, Shopify gives up. The event is gone. There is no admin tool to replay it. That gap is where most production e-commerce stacks lose data.

Shopify's default retry policy

Shopify's current policy (updated in 2024 from a longer 48-hour window): up to 8 retry attempts over 4 hours with exponential backoff, then the subscription is removed. Two things follow:

  • If your endpoint is down for 30 minutes during a deploy, the later retries will most likely land after it recovers, so the event still gets through.
  • If it's down for more than ~4 hours — a long DB migration, a multi-team incident, a forgotten cert renewal — every event in flight is permanently lost.

The hard part: visibility is split. Shopify does send a "your webhook is failing" warning email to your Partner emergency developer email when a subscription is at risk of removal — but that address often goes to a generic admin@ inbox no engineer watches. There's no in-product dashboard counter for "events we gave up on." Most teams discover the loss via the support inbox days later.

Building retry yourself

The instinct is to build a lightweight retry loop. Receive the webhook, queue it, retry on failure with exponential backoff. Sidekiq makes this 30 lines of Ruby.

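require "sidekiq"
require "http" # the http.rb gem

# Raised to hand the job back to Sidekiq's retry machinery.
class RetryableError < StandardError; end

class WebhookForwardJob
  include Sidekiq::Job
  sidekiq_options retry: 12

  # Delay in seconds before each retry, from 1 minute up to 4 days.
  RETRY_DELAYS = [60, 300, 900, 1800, 3600, 7200, 14400, 28800,
                  57600, 86400, 172800, 345600].freeze

  sidekiq_retry_in do |count|
    RETRY_DELAYS[count] || RETRY_DELAYS.last
  end

  def perform(event_id)
    event = WebhookEvent.find(event_id) # app-specific model holding the stored event
    response = HTTP.post(event.endpoint_url, body: event.body, headers: event.headers)
    # Retry on server errors; timeouts and connection errors raise on their own.
    raise RetryableError, "endpoint returned #{response.code}" if response.code >= 500
  end
end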

This works for the next outage. The next one after that, you'll want metrics on retry attempts, a dead-letter queue for events that exhausted the curve, an audit log of every replay, and an alert when the DLQ grows. The next one after that, you'll want to verify that no events were missed at all — which means reaching back into Shopify's Admin API and reconciling.

This is the trajectory: a 30-line job becomes a 6-month subsystem. The ceiling on the in-house version is "did our retry curve happen to span the outage."
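The dead-letter step is usually the first of those additions. A minimal sketch of the idea in plain Ruby, with no queue library involved: `deliver_with_retries` and `DeadLetter` are hypothetical names, the in-memory array stands in for a real DLQ table, and the backoff delays are left out so the control flow stays visible.

```ruby
# One record per event that exhausted its retries.
DeadLetter = Struct.new(:event_id, :attempts, :last_error)

# Calls the block until it succeeds or max_attempts is reached.
# Returns true on success; on exhaustion, records a DeadLetter and returns false.
def deliver_with_retries(event_id, max_attempts:, dead_letters:)
  attempts = 0
  begin
    attempts += 1
    yield event_id
    true
  rescue StandardError => e
    retry if attempts < max_attempts
    dead_letters << DeadLetter.new(event_id, attempts, e.message)
    false
  end
end

dead_letters = []
delivered = deliver_with_retries("evt_1", max_attempts: 3, dead_letters: dead_letters) do |_id|
  raise "endpoint returned 503" # stand-in for a failing HTTP call
end
```

In a Sidekiq job the same hand-off would typically live in a `sidekiq_retries_exhausted` block rather than a hand-rolled loop.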

What a production retry curve looks like

A good curve has to cover two cases: a downstream that is flapping and one that is genuinely broken. That means dense attempts early on, then a long tail:

  • First 5 minutes: 3 attempts at 30s, 60s, 120s. Catches transient TLS errors and brief 502s during deploys.
  • First hour: 4 more attempts. Spaced enough that a routine 30-min outage clears.
  • First day: Hourly, then every 4 hours. Catches longer database migrations or accidental deploy-to-wrong-host events.
  • Days 2–7: Daily. Most outages over a day will get a human's attention before the 7-day mark.

The shape matters because Shopify's curve is extremely compressed — every attempt happens within 4 hours regardless of whether the endpoint will recover. A real-world curve stretches across days, when human intervention becomes possible.
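One way to encode that shape, as a sketch: only the 30s/60s/120s opening and the hourly, 4-hourly, and daily cadences come from the list above. The exact counts and the spacing of the four first-hour attempts are assumptions chosen to fit inside each window, and `SEVEN_DAY_DELAYS` and the helper names are illustrative.

```ruby
# Seconds to wait before each retry attempt, in order.
SEVEN_DAY_DELAYS = (
  [30, 60, 120] +             # first 5 minutes
  [300, 600, 900, 1200] +     # rest of the first hour (assumed spacing)
  [3_600] * 6 +               # hourly (assumed count)
  [14_400] * 4 +              # every 4 hours (assumed count)
  [86_400] * 6                # daily, ending just under the 7-day mark
).freeze

# Delay before retry number `count` (0-based), or nil once the curve is exhausted.
def retry_delay(count)
  SEVEN_DAY_DELAYS[count]
end

# Elapsed seconds since the first failure at which each retry fires.
def retry_offsets
  elapsed = 0
  SEVEN_DAY_DELAYS.map { |delay| elapsed += delay }
end
```

With these numbers the curve has 23 attempts and the last one fires 600,810 seconds after the first failure, a little under 7 days. In Sidekiq, `retry_delay` could back a `sidekiq_retry_in` block with `retry:` set to `SEVEN_DAY_DELAYS.size`.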

Why retries aren't enough

Even with a perfect retry curve, two failure modes will quietly drop events:

  1. Webhook subscription dies. Once 8 retry attempts fail within the 4-hour window, Shopify removes the subscription. New events for that topic stop arriving entirely — no retries to attempt because they were never sent. Re-registration via the Admin API is required.
  2. Shopify-side bugs. Shopify's webhook delivery has documented incidents where events were delayed or dropped. Status pages exist for a reason.

The fix is to reconcile against the source of truth: the Admin API. Periodically (hourly is plenty for most use cases) fetch the latest orders/products/customers and compare against what you've ingested. Any gaps are events that need to be synthesized and re-delivered. This is the hard part — the part most teams never get around to.
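The comparison step itself is small once both sides are in hand. A sketch under assumptions: `upstream` is whatever you fetched from the Admin API for the window (the fetch is not shown), `ingested` maps each resource ID to the newest `updated_at` your handler has processed, `find_gaps` is a hypothetical helper name, and the sample records are made up.

```ruby
# upstream: array of { id:, updated_at: } hashes from the Admin API (fetch not shown).
# ingested: hash of id => Time of the newest update already processed.
# Returns upstream records that were never seen, or were seen only in an older state.
def find_gaps(upstream, ingested)
  upstream.select do |record|
    seen_at = ingested[record[:id]]
    seen_at.nil? || seen_at < record[:updated_at]
  end
end

upstream = [
  { id: 1001, updated_at: Time.utc(2025, 1, 1, 10, 0) },
  { id: 1002, updated_at: Time.utc(2025, 1, 1, 10, 5) },
  { id: 1003, updated_at: Time.utc(2025, 1, 1, 10, 9) }
]
ingested = {
  1001 => Time.utc(2025, 1, 1, 10, 0), # up to date
  1002 => Time.utc(2025, 1, 1, 9, 30)  # stale: a later update was missed
}                                       # 1003 never arrived at all

gaps = find_gaps(upstream, ingested)
```

Each returned record is a candidate for a synthesized event. Comparing `updated_at` rather than the ID alone also catches missed update webhooks, not just missed creates.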

How HookRescue handles all of this

HookRescue sits between Shopify and your existing webhook handler. You change one URL: paste our ingest URL into Shopify's webhook subscription instead of yours. Then:

  • Re-signed forwarding. Every event gets HMAC-SHA256 re-signed with your secret and forwarded to your real endpoint. Your verification code stays exactly the same.
  • 7-day retry curve. The shape above, baked in.
  • Admin API reconciliation. Hourly cross-check against Shopify's GraphQL Admin API for orders/* topics. Missed events are synthesized and replayed.
  • DLQ + replay. Events that exhaust the curve land in a calm dashboard you can review and replay.
  • Alerts. Email or Slack when the DLQ grows or alert thresholds trip.
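For reference, the verification a handler already runs on Shopify webhooks looks roughly like the sketch below. Shopify sends a base64-encoded HMAC-SHA256 of the raw request body in the `X-Shopify-Hmac-Sha256` header. That the forwarded request carries its signature in the same header is an assumption here, and `valid_webhook?` and `secure_equal?` are hypothetical helper names.

```ruby
require "openssl"

# Constant-time comparison so timing does not leak how many bytes matched.
def secure_equal?(a, b)
  return false unless a.bytesize == b.bytesize
  diff = 0
  a.bytes.zip(b.bytes) { |x, y| diff |= x ^ y }
  diff.zero?
end

# raw_body must be the unparsed request body; header_hmac is the header value.
def valid_webhook?(raw_body, header_hmac, secret)
  digest = OpenSSL::HMAC.digest("SHA256", secret, raw_body)
  expected = [digest].pack("m0") # strict base64, no trailing newline
  secure_equal?(expected, header_hmac.to_s)
end
```

The check has to run against the raw bytes of the body, before any JSON parsing, or the digest will not match.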

Setup takes about 3 minutes. Free during private beta — no credit card.
