Webhook Retries on Non-2xx Responses

Most webhook providers treat a non-2xx response as a failed delivery. That simple rule protects against outages, but it can also create duplicate events, retry storms, noisy logs, and confusing production bugs if your endpoint is not designed carefully.

A webhook request is not complete when the provider sends it. It is complete when your endpoint responds in a way the provider accepts.

In most webhook systems, that means your endpoint must return an HTTP status code in the 2xx range. A 200, 201, 202, or 204 response usually tells the sender, "we received this delivery successfully."

A non-2xx response usually says the opposite. It tells the provider that the delivery may have failed, so the provider may try again later.

That is where many production webhook problems begin.

What counts as a failed webhook response?

Each provider has its own retry rules, but the common pattern is simple:

  • 2xx response: delivery accepted
  • 3xx response: often treated as failure unless the provider explicitly follows redirects
  • 4xx response: usually treated as failure, even if the error looks intentional
  • 5xx response: treated as server failure and usually retried
  • timeout: treated as failure because no valid response was received

This means your endpoint can be reachable and still fail delivery. A route that returns 401, 403, 404, 419 (a non-standard code that Laravel uses for CSRF token mismatches), 422, or 500 may look like "the server is online," but to the webhook provider it is still a failed delivery.
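
The rules in the list above can be written as a small classifier. This is a minimal sketch of the common pattern, not any specific provider's logic; the function name, the `follows_redirects` flag, and the use of `None` to represent a timeout are all assumptions made for illustration.

```python
from typing import Optional


def classify_delivery(status_code: Optional[int], follows_redirects: bool = False) -> str:
    """Return "accepted", "redirect", or "failed" for one delivery attempt.

    status_code is None when the request timed out or no response arrived.
    """
    if status_code is None:
        return "failed"  # timeout: no valid response was received
    if 200 <= status_code <= 299:
        return "accepted"
    if 300 <= status_code <= 399 and follows_redirects:
        return "redirect"  # the final outcome depends on the redirect target
    return "failed"  # 3xx without redirect support, 4xx, 5xx, anything else
```

Under these assumptions, `classify_delivery(419)` and `classify_delivery(None)` both come back as failed, even though the first case means the server was online and answering.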

Why providers retry non-2xx responses

Webhooks are sent over the public internet. Providers must expect temporary network failures, overloaded applications, deploy mistakes, expired certificates, and short outages.

Retrying non-2xx responses gives your system another chance to receive the event after the problem clears.

This is useful when:

  • your server was temporarily overloaded
  • a deployment broke the endpoint for a few minutes
  • a database connection failed during processing
  • a queue worker or cache dependency was unavailable
  • the request timed out before your app responded

The retry behavior is meant to protect your integration. The risk appears when your handler is not safe to run more than once.

The dangerous case: the work happened, but the response failed

The hardest webhook bugs happen when your application partially processes the event, then returns a failure response.

For example:

receive webhook
verify signature
mark invoice as paid
send customer email
call another internal service
internal service fails
return HTTP 500

From your application’s point of view, some work already happened. From the provider’s point of view, the delivery failed because the endpoint returned 500.

The provider may retry the same event later. If your handler is not idempotent, your system may send another email, insert another row, or repeat a business action that should only happen once.

This is why webhook retries and duplicate events are connected problems. You cannot design one safely without thinking about the other.
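
One common way to make a handler safe to run twice is to record each provider event ID under a unique constraint and skip the side effect when the ID already exists. The sketch below uses an in-memory SQLite database; the table name, the `handle_once` function, and the `emails_sent` list (a stand-in for a real email call) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")

emails_sent = []  # stand-in for a real side effect such as sending an email


def handle_once(event_id: str, customer: str) -> bool:
    """Run the side effect only the first time event_id is seen."""
    try:
        with conn:  # commits on success, rolls back if anything below raises
            conn.execute(
                "INSERT INTO processed_events (event_id) VALUES (?)", (event_id,)
            )
            emails_sent.append(customer)  # reached only for a new event ID
    except sqlite3.IntegrityError:
        return False  # duplicate delivery: the primary key rejected it
    return True
```

Calling `handle_once` twice with the same event ID performs the side effect once. Because the insert and the side effect share one transaction here, a failure in the side effect rolls the insert back, so a later retry can still run.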

Should a webhook endpoint ever return non-2xx on purpose?

Yes, but it should be intentional.

Returning non-2xx can make sense when the request is invalid and you want the provider to treat delivery as failed. Examples include:

  • the signature is missing or invalid
  • the payload is malformed
  • the route is not meant to receive that provider’s event
  • the request does not pass a required security check

But do not return non-2xx just because downstream business logic failed after you already accepted the event.

A safer pattern is to store the event, return 2xx, then process the heavier work in a queue where failures can be retried under your control.
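
As an illustration of intentional non-2xx responses, the sketch below decides the status code before any business logic runs. It assumes the provider signs the raw request body with HMAC-SHA256 and sends a hex digest in a header; real providers use their own schemes, so treat the signature format, the function name, and the chosen status codes as assumptions.

```python
import hashlib
import hmac
import json
from typing import Optional


def response_status(secret: bytes, raw_body: bytes, signature_header: Optional[str]) -> int:
    """Pick the HTTP status for an incoming delivery before processing it."""
    if not signature_header:
        return 401  # signature is missing
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking where the two values first differ
    if not hmac.compare_digest(expected.encode(), signature_header.encode()):
        return 401  # signature is invalid
    try:
        json.loads(raw_body)
    except ValueError:
        return 400  # payload is malformed
    return 202  # accepted; downstream failures are handled outside the request
```

Note that nothing after the final `return` can turn the response into a failure: business logic errors belong to the queue, not to the HTTP status.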

A safer pattern for webhook responses

A production webhook endpoint should usually do the smallest amount of work needed before returning a response.

receive webhook
verify signature
store provider event ID
store raw or normalized event data
queue processing job
return HTTP 200 or 204

This gives the provider a fast success response while your application keeps control over the actual processing workflow.
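
The store, queue, and acknowledge steps above could look roughly like this. It is a sketch that assumes signature verification has already passed and that the provider supplies a stable event ID; the table layout, the in-process `queue.Queue`, and the `accept_event` name are illustrative choices, and a production system would typically use a durable queue instead.

```python
import queue
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE webhook_events (event_id TEXT PRIMARY KEY, payload BLOB NOT NULL)"
)
jobs = queue.Queue()  # stand-in for a real job queue


def accept_event(event_id: str, raw_body: bytes) -> int:
    """Store the event, queue it once, and acknowledge the delivery."""
    with conn:  # commit the insert before acknowledging
        cursor = conn.execute(
            "INSERT OR IGNORE INTO webhook_events (event_id, payload) VALUES (?, ?)",
            (event_id, raw_body),
        )
    if cursor.rowcount == 1:
        jobs.put(event_id)  # first time this ID is stored: schedule processing
    return 204  # a repeated delivery is acknowledged without queuing again
```

A retried delivery of the same event still receives a success response, but it does not create a second row or a second job.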

If the queue job fails, your own retry system can handle it. You can add backoff, dead-letter queues, alerts, manual review, or replay tooling without forcing the provider to resend the same delivery blindly.
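
A retry loop with exponential backoff and a dead-letter list is one way to keep that control on your side. The sketch below is deliberately simple: the function name, the default of three attempts, and the doubling delay are assumptions, and `sleep` is injectable so the loop can be exercised without waiting.

```python
import time
from typing import Any, Callable, List


def run_with_retries(
    job: Callable[[Any], None],
    event: Any,
    dead_letters: List[Any],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run job(event), backing off between failures; dead-letter when exhausted."""
    for attempt in range(1, max_attempts + 1):
        try:
            job(event)
            return True
        except Exception:
            if attempt < max_attempts:
                # wait base_delay, then 2x, then 4x, ... before the next attempt
                sleep(base_delay * 2 ** (attempt - 1))
    dead_letters.append(event)  # keep the event for alerts, review, or replay
    return False
```

Events that land in `dead_letters` are the ones to alert on and review manually, since the provider already received a success response for them.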

Why 4xx responses can still cause confusion

Developers sometimes assume that 4xx errors are "client errors" and therefore will not be retried. That may hold for ordinary HTTP APIs, but webhook delivery does not always behave like a user-facing API request.

A webhook provider may still record the delivery as failed when it receives a 400, 401, 403, 404, or 422 response.

Common accidental 4xx webhook failures include:

  • CSRF middleware blocking the webhook route
  • auth middleware requiring a logged-in user
  • signature verification rejecting valid requests because it checks the payload in the wrong format
  • a route path changing during deployment
  • validation rules expecting fields that are not present on every event type

These are not provider problems. They are endpoint behavior problems, and they are exactly the kind of failures that can sit unnoticed if nobody monitors the route.

Why 5xx responses are usually urgent

A 5xx response tells the sender that your server could not complete the request successfully.

In webhook systems, this can point to problems such as:

  • uncaught exceptions in the handler
  • database failures
  • missing environment variables
  • broken service container bindings after deployment
  • queue or cache dependencies failing inside the request
  • timeout limits being reached before the response is sent

If a payment, subscription, or account webhook returns 500, you should treat it as a production incident until you know the blast radius.

How retries can turn one bug into many events

Retries are helpful, but they multiply pressure on a weak endpoint.

Imagine your endpoint starts returning 500 because a database migration broke one column name. The first webhook fails. Then more real events arrive. Then the provider retries the earlier failed events. Your logs now contain a mix of new deliveries and retry attempts.

If the endpoint stays broken long enough, the retry queue grows. After you fix the bug, the provider may send a burst of older events. Without idempotency and ordering safeguards, your application can process stale or duplicated events in the wrong order.
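
One possible ordering safeguard is to compare each event's timestamp with the newest one already applied to the same object and skip anything older. This sketch assumes every event carries a provider-side creation timestamp, which not all providers guarantee; the `SubscriptionState` type and `apply_event` function are hypothetical. Treating an equal timestamp as stale is also a simplification that could drop a legitimate event.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SubscriptionState:
    status: str
    updated_at: int  # creation time of the last applied event (Unix seconds)


def apply_event(
    state: Dict[str, SubscriptionState],
    subscription_id: str,
    status: str,
    created_at: int,
) -> bool:
    """Apply the event only if it is newer than what is already stored."""
    current = state.get(subscription_id)
    if current is not None and created_at <= current.updated_at:
        return False  # stale or repeated event arriving after a retry burst
    state[subscription_id] = SubscriptionState(status, created_at)
    return True
```

With this check, a delayed "active" event that arrives after a newer "canceled" event does not overwrite the newer state.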

That is why non-2xx handling should not be treated as a small HTTP detail. It affects system behavior during recovery.

What to monitor

For non-2xx webhook failures, monitor both the endpoint and the processing pipeline.

| Signal | Why it matters |
| --- | --- |
| HTTP status code | Shows whether the provider would treat delivery as successful or failed. |
| Timeouts | A timeout is often the same as a failed response from the provider’s perspective. |
| Consecutive failures | Helps separate a one-off error from a broken endpoint. |
| Retry count | Shows whether failed events are piling up. |
| Duplicate event handling | Confirms that retried events do not repeat side effects. |
| Dead-letter queue size | Shows whether events are failing even after your own retries. |

If you only watch application exceptions, you may miss endpoint-level failures. If you only watch endpoint status, you may miss processing failures after the event is accepted. Reliable webhook systems need both views.
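
On the endpoint side, the consecutive-failure signal can be tracked with a small counter. The class below is a sketch: the name, the default threshold of three, and the convention that `None` means a timeout are assumptions for illustration.

```python
from typing import Optional


class ConsecutiveFailureAlert:
    """Signal once when an endpoint fails `threshold` times in a row."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.streak = 0

    def record(self, status_code: Optional[int]) -> bool:
        """Record one response; status_code is None for a timeout."""
        succeeded = status_code is not None and 200 <= status_code <= 299
        if succeeded:
            self.streak = 0  # any success ends the failure streak
            return False
        self.streak += 1
        # True only on the attempt that reaches the threshold, to avoid alert spam
        return self.streak == self.threshold
```

A single 500 between successes never triggers the alert, while a run of failures or timeouts does, which is the distinction between a one-off error and a broken endpoint.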

Common fixes for non-2xx webhook problems

  • exclude webhook routes from CSRF middleware when appropriate
  • verify signatures before trusting the payload
  • return 2xx quickly after storing the event
  • move slow work into background jobs
  • store provider event IDs with a unique constraint
  • make processing idempotent
  • track failed jobs and dead-lettered events
  • alert when an endpoint starts returning unexpected status codes

The goal is not to hide real failures by always returning 200. The goal is to separate delivery acknowledgement from internal processing, then monitor both parts clearly.

Where WebhookWatch helps

WebhookWatch monitors your webhook endpoint behavior from the outside.

You configure the webhook URL and the expected HTTP status code range. WebhookWatch sends checks to the endpoint and records whether the response matches what you expect within the timeout window.

That makes it useful for catching broken routes, unexpected 4xx or 5xx responses, deployment mistakes, and timeout issues before they quietly affect real webhook deliveries.

Your internal logs can explain what happened after a real event arrived. WebhookWatch helps answer a more basic production question: is this webhook endpoint still responding correctly right now?

For provider-specific retry behavior, read "Stripe webhook retry policy explained" and "Paddle webhook retry logic explained".

For duplicate delivery protection, see "webhook duplicate events" and "idempotent webhooks in Laravel".
