Webhook monitoring for Shopify apps: what to track and why
Shopify's webhook delivery dashboard shows the last 7 days. That's not enough to catch slow degradations or audit historical incidents. Here's what production-grade monitoring should include — and the cost of building it yourself.
Symptoms — what monitoring gaps look like
- You discover missing data days after the fact, from a customer complaint.
- The Shopify warning email about a failing webhook is in a Partner-account inbox no engineer reads.
- You can't answer "is the orders/create webhook healthy across all 47 shops we serve?" without manual investigation.
- You don't know whether last week's incident dropped any events, only that it might have.
- Reconciliation runs that should be hourly run weekly because nobody set up the cron.
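Answering the "all 47 shops" question starts with a per-shop diff of expected versus registered topics. The sketch below assumes you have already fetched each shop's registered webhook topics from the Admin API into a dict. The function name and the EXPECTED_TOPICS set are hypothetical placeholders, not part of any Shopify library.

```python
# Hypothetical set of topics this app expects every shop to be subscribed to.
EXPECTED_TOPICS = frozenset({"orders/create", "orders/updated", "app/uninstalled"})


def missing_subscriptions(
    registered: dict[str, set[str]],
    expected: frozenset[str] = EXPECTED_TOPICS,
) -> dict[str, list[str]]:
    """Map each shop to the expected topics it is missing; healthy shops are omitted."""
    report: dict[str, list[str]] = {}
    for shop, topics in registered.items():
        missing = sorted(expected - topics)
        if missing:
            report[shop] = missing
    return report
```

An empty result means every shop has every expected subscription; anything else is a shop-by-shop list of what to re-register.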
What Shopify provides out of the box
Apps created in the Dev Dashboard or via Shopify CLI get a basic webhook delivery report:
- Past 7 days of delivery attempts.
- Per-topic breakdown.
- HTTP status codes returned by your endpoint.
- Filterable by shop and topic.
What it doesn't include:
- History beyond 7 days.
- Alerting (you'd have to poll the dashboard yourself).
- A dead-letter queue (DLQ) for events that exhausted Shopify's retry schedule.
- Replay UI for events you want to reprocess.
- Reconciliation against the Admin API for events Shopify never fired.
- HMAC rejection visibility — your endpoint sees these, Shopify doesn't.
- Subscription health summary across multiple shops.
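Rejection visibility starts with verifying every request yourself. Shopify signs each webhook with a base64-encoded HMAC-SHA256 of the raw request body, keyed with your app's client secret, and sends it in the X-Shopify-Hmac-Sha256 header. Here is a minimal sketch, assuming your framework hands you the raw body bytes before any JSON parsing. HmacCounter is a hypothetical in-memory stand-in for a real metrics backend, and accepting a list of secrets is one way to model a secret-rotation grace window.

```python
import base64
import hashlib
import hmac
from dataclasses import dataclass


def expected_hmac(secret: str, raw_body: bytes) -> str:
    """Base64-encoded HMAC-SHA256 of the raw body, the format of X-Shopify-Hmac-Sha256."""
    digest = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def verify_shopify_hmac(raw_body: bytes, header_value: str, secrets: list[str]) -> bool:
    """True if the header matches the HMAC computed with any candidate secret.

    Passing [current, previous] models a rotation window where both are accepted.
    """
    received = header_value.encode("utf-8")
    # Constant-time comparison so response timing doesn't leak how many bytes matched.
    return any(
        hmac.compare_digest(expected_hmac(secret, raw_body).encode("ascii"), received)
        for secret in secrets
    )


@dataclass
class HmacCounter:
    """Hypothetical counter; production code would export these to a metrics system."""
    accepted: int = 0
    rejected: int = 0

    def record(self, ok: bool) -> None:
        if ok:
            self.accepted += 1
        else:
            self.rejected += 1

    @property
    def rejection_rate(self) -> float:
        total = self.accepted + self.rejected
        return self.rejected / total if total else 0.0
```

In a handler you would, for example, call counter.record(verify_shopify_hmac(...)) and return 401 on failure, so the HMAC rejection rate comes from your own logs rather than from a dashboard that never sees it.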
Metrics worth tracking in production
| Metric | What it tells you | Alert threshold |
|---|---|---|
| Delivery success rate (24h) | How often your endpoint accepts a webhook on first try | Below 99.5% |
| Avg attempt count per event | Whether your downstream is flapping | Above 1.3 sustained for 1 hour |
| DLQ size | Events that exhausted retries | Above 0 for more than 1 hour |
| DLQ growth rate | Whether the DLQ is filling faster than you can drain | Above 5/hour |
| HMAC rejection rate | Bad secret config or attempted abuse | Above 0.1% |
| Subscription health | Whether all expected subscriptions are still registered | Any expected topic missing |
| Reconciliation gap rate | Events Shopify never sent (orders/* topics) | Above 0 sustained |
| p95 webhook processing latency | Whether your handler is approaching Shopify's 5s timeout | Above 2s sustained |
| X-Shopify-Api-Version drift | Whether Shopify has fallen forward to a newer API version for your subscriptions | Any mismatch from subscribed version |
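Several rows of the table can be checked from one window of delivery records. This is a minimal sketch, assuming retries of the same event have already been grouped into a single record with an attempt count. The Delivery shape and check_thresholds name are hypothetical, and the "sustained for 1 hour" conditions would need state across windows, so they are left out.

```python
from dataclasses import dataclass


@dataclass
class Delivery:
    """One webhook event as seen by your endpoint (hypothetical record shape)."""
    topic: str
    attempts: int      # 1 means accepted on the first try
    hmac_ok: bool
    api_version: str   # value of the X-Shopify-Api-Version header


def check_thresholds(deliveries: list[Delivery], subscribed_version: str) -> list[str]:
    """Return alert messages for one window, using the thresholds from the table."""
    alerts: list[str] = []
    if not deliveries:
        return alerts
    n = len(deliveries)

    success_rate = sum(1 for d in deliveries if d.attempts == 1) / n
    if success_rate < 0.995:
        alerts.append(f"first-try success rate {success_rate:.2%} below 99.5%")

    avg_attempts = sum(d.attempts for d in deliveries) / n
    if avg_attempts > 1.3:
        alerts.append(f"avg attempts {avg_attempts:.2f} above 1.3")

    rejection_rate = sum(1 for d in deliveries if not d.hmac_ok) / n
    if rejection_rate > 0.001:
        alerts.append(f"HMAC rejection rate {rejection_rate:.2%} above 0.1%")

    # Any version other than the subscribed one counts as drift.
    drifted = sorted({d.api_version for d in deliveries if d.api_version != subscribed_version})
    if drifted:
        alerts.append(f"API version drift: saw {', '.join(drifted)}, subscribed {subscribed_version}")
    return alerts
```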
Building it yourself — engineering effort
| Component | Effort | Notes |
|---|---|---|
| Ingest endpoint with HMAC verification + raw body capture | ~1 day | Middleware, raw-body capture, constant-time HMAC compare, unique index on Event-Id |
| Background forwarding with retry | ~1 day | Sidekiq job (or equivalent), exponential backoff curve, response-code handling |
| DLQ for exhausted retries | ~2 days | Schema, dashboard view, audit log of attempts |
| Replay UI | ~3 days | Single-event replay, bulk replay, progress tracking |
| Subscription health monitoring | ~1 day | Daily cron, missing-subscription alert, re-registration flow |
| Hourly reconciliation | ~1 week | Per-topic logic, bulk operation handling, gap synthesis, idempotency, JSONL streaming |
| Alerting system | ~3 days | Threshold alerts, anomaly detection on attempt counts, dedup |
| Secret rotation with grace window | ~2 days | Both-secrets-accepted window, rotation UI, audit |
| Audit log of all actions | ~1 day | Replays, rotations, DLQ resolutions, who did what when |
| API version drift detection | ~1 day | Alert when X-Shopify-Api-Version differs from subscribed version |
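Hourly reconciliation is the largest item in the table, but its core is a set difference between what the Admin API knows about and what your endpoint received. Here is a minimal sketch, assuming order IDs and their created_at timestamps for the window have already been fetched from the Admin API (pagination, bulk operations, and JSONL streaming are out of scope). find_gaps and its grace parameter are hypothetical names; the grace period skips orders whose webhook may still be in flight.

```python
from datetime import datetime, timedelta


def find_gaps(
    admin_orders: dict[int, datetime],
    received_order_ids: set[int],
    now: datetime,
    grace: timedelta = timedelta(minutes=15),
) -> list[int]:
    """Order IDs the Admin API reports but no orders/create webhook delivered.

    admin_orders maps order ID -> created_at for the reconciliation window.
    Orders created within `grace` of `now` are skipped because their webhook
    may still be arriving.
    """
    cutoff = now - grace
    return sorted(
        order_id
        for order_id, created_at in admin_orders.items()
        if created_at <= cutoff and order_id not in received_order_ids
    )
```

Each returned ID would then be synthesized into an event and pushed through the same idempotent handler, so a webhook that arrives late for the same order is harmless.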
Total: the estimates above sum to about 20 working days, so roughly 4–6 weeks for a focused engineer. Then comes ongoing maintenance for every Shopify API version bump, every new topic, and every edge case you discover in production. The economics: if your engineering rate is $100+ per hour, the first 40-hour week of building it yourself costs $4,000 or more, enough to cover a year of HookRescue's pricing.
Dedicated webhook reliability tools — three categories
- Generic webhook proxies like Hookdeck and Svix. Provider-agnostic. Strong on retry, DLQ, and replay. Less specialized for Shopify: they don't natively understand orders/* topics or how to reconcile against Shopify's Admin API.
- Webhook test/inspection tools like webhook.site, RequestBin, ngrok. Useful for development. Not designed for production reliability.
- Shopify-specific reliability layers like HookRescue. Optimized for Shopify's retry policy, Admin API reconciliation, subscription health, and the topics that matter for e-commerce apps.
Where HookRescue fits
HookRescue is the Shopify-specific reliability layer:
- 7-day retry curve, versus Shopify's 4 hours. Survives any reasonable outage on your side.
- Hourly Admin API reconciliation. Covers orders/* topics by default. Catches events Shopify never fired, or that should have fired while a subscription was removed.
- Auto re-registration. If Shopify removes a subscription, we detect it within an hour and re-register before you find out from a support ticket.
- DLQ + replay UI. Events that exhaust our 7-day curve land in a triage dashboard. One click to replay or resolve.
- Audit log. Every replay, rotation, and resolution timestamped with actor and reason.
- Email + Slack alerts. Configurable thresholds.
- HMAC re-signing. We forward with your secret, so your existing handler verifies our signature exactly the way it verifies Shopify's. Drop-in, zero downstream code changes.
- API version drift detection. We watch X-Shopify-Api-Version on every event and alert you if Shopify silently falls forward to a newer API version for your subscriptions.
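A 7-day retry curve is typically exponential backoff with a per-attempt cap, stopped once the cumulative wait would pass the horizon. HookRescue's exact schedule isn't given here, so the parameters below (1-minute base, doubling, 12-hour cap) are illustrative assumptions, and retry_schedule is a hypothetical name.

```python
from datetime import timedelta


def retry_schedule(
    base: timedelta = timedelta(minutes=1),
    factor: float = 2.0,
    max_delay: timedelta = timedelta(hours=12),
    horizon: timedelta = timedelta(days=7),
) -> list[timedelta]:
    """Delays between attempts: exponential growth, capped per attempt,
    stopping before the cumulative wait exceeds the horizon."""
    if base <= timedelta(0) or factor < 1:
        raise ValueError("base must be positive and factor at least 1")
    delays: list[timedelta] = []
    elapsed = timedelta(0)
    delay = base
    while elapsed + delay <= horizon:
        delays.append(delay)
        elapsed += delay
        delay = min(delay * factor, max_delay)
    return delays
```

With these defaults the delays double from 1 minute to 512 minutes, then stay at 12 hours, giving 22 retries within 7 days. The cap keeps late retries frequent enough to recover soon after an outage ends; production schedules usually add random jitter so retries from many shops don't arrive in lockstep.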
Setup takes about 3 minutes — paste our ingest URL into your Shopify webhook subscriptions instead of yours. Free during private beta, no card.