TraceStax Docs

Alert routing

When anomaly detection determines that a job or queue has deviated from its baseline, TraceStax fires an alert. Alert routing is the layer that decides where that alert goes — which integration receives it, at what severity, and what happens if nobody acknowledges it.

TraceStax can route alerts to the following destinations:

Destination       Type                  Notes
Slack             Messaging             Posts to a channel; supports info, warning, and critical messages
Microsoft Teams   Messaging             Posts to a Teams channel via incoming webhook
Email             Email                 Sends alert emails to configured recipients
PagerDuty         Incident management   Creates an incident via Events API v2
OpsGenie          Incident management   Creates an alert via OpsGenie Alerts API
incident.io       Incident management   Creates or escalates an incident
Grafana OnCall    On-call scheduling    Triggers an alert group via the OnCall API
Rootly            Incident management   Creates an incident via Rootly API
Webhook           Custom HTTP           Posts a signed JSON payload to any HTTPS endpoint

Each destination must be connected before it can receive alerts. Connect destinations from Project Settings → Integrations.
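For the Webhook destination, the payload is signed, so the receiving endpoint can reject forged requests. A receiver might verify the signature along these lines; note that the HMAC-SHA256 scheme, the shared-secret setup, and where the signature arrives (for example a request header) are assumptions for illustration, not confirmed details of the TraceStax payload format:

```python
import hashlib
import hmac


def verify_signature(secret: str, body: bytes, signature: str) -> bool:
    """Verify an HMAC-SHA256 hex signature over the raw request body.

    The signing scheme here (HMAC-SHA256 over the body, hex-encoded) is an
    illustrative assumption; check the TraceStax webhook docs for the exact
    algorithm and the header that carries the signature.
    """
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(expected, signature)
```

Whatever the exact scheme, always compare signatures with a constant-time comparison rather than `==`.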

Routing operates at two levels:

Project default — every project has a default routing rule that applies to all jobs unless overridden. This is configured in Project Settings → Alerts → Default routing. You choose a destination (or multiple destinations) and a minimum severity that triggers delivery.

Per-job override — individual jobs can override the project default. Open any job from the Jobs list, select Alert settings, and configure a destination specific to that job. This is useful when a payment processing job should page PagerDuty while a lower-priority reporting job should only post to Slack.

When an alert fires, TraceStax evaluates routing in this order:

  1. Check for a per-job override. If one exists, use it.
  2. Fall back to the project default.
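The two-step evaluation above amounts to a simple lookup with a fallback; a sketch (the dict shapes are hypothetical, not the TraceStax data model):

```python
def resolve_route(job_name: str, per_job_overrides: dict, project_default: dict) -> dict:
    """Return the routing rule for a job: per-job override wins, else project default.

    Both arguments are plain dicts here purely for illustration.
    """
    override = per_job_overrides.get(job_name)
    return override if override is not None else project_default
```

For example, a `payments` job with an override pages PagerDuty, while every other job falls back to the project default.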

Every alert has a severity of info, warning, or critical. Severity is determined by the alert type and how far the metric has deviated from its baseline:

Severity    Condition
info        Informational notices such as plan limit warnings
warning     Deviation exceeds the configured sensitivity threshold (default: 2σ)
critical    Deviation exceeds 3σ, OR the condition is a queue stall or worker disappearance

Queue stalls and worker disappearances are always critical because they represent a complete stoppage rather than a statistical anomaly.
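The severity rules above can be expressed as a short classifier; condition names and the function shape are illustrative assumptions:

```python
# Conditions that represent a complete stoppage (names are hypothetical).
ALWAYS_CRITICAL = {"queue_stall", "worker_disappearance"}


def classify_severity(condition: str, deviation_sigma: float, threshold: float = 2.0) -> str:
    """Map a condition and its deviation (in standard deviations) to a severity.

    'threshold' is the configured sensitivity (default 2 sigma, per the table above).
    """
    if condition in ALWAYS_CRITICAL:
        return "critical"  # stoppage, not a statistical anomaly
    if deviation_sigma > 3.0:
        return "critical"
    if deviation_sigma > threshold:
        return "warning"
    return "info"
```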

When configuring a routing rule you set a minimum severity for each destination:

  • A destination configured for info receives info, warning, and critical alerts.
  • A destination configured for warning receives both warning and critical alerts.
  • A destination configured for critical receives only critical alerts.

This lets you send all alerts to Slack for visibility while only paging PagerDuty for critical events.
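The minimum-severity rule is an ordering check: a destination receives an alert only when the alert's severity is at or above the destination's configured minimum. A sketch:

```python
SEVERITY_RANK = {"info": 0, "warning": 1, "critical": 2}


def should_deliver(alert_severity: str, destination_minimum: str) -> bool:
    """True if the alert's severity meets the destination's minimum severity."""
    return SEVERITY_RANK[alert_severity] >= SEVERITY_RANK[destination_minimum]
```

So a Slack destination set to `info` receives everything, while a PagerDuty destination set to `critical` is paged only for critical alerts.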

TraceStax supports a secondary destination per routing rule. If the primary destination fails to deliver the alert — or, for incident management integrations, if the alert is not acknowledged within the configured window — TraceStax escalates to the secondary destination.

To configure escalation:

  1. Open Project Settings → Alerts → Default routing (or the per-job override for the job you want to configure).

  2. Under Primary destination, choose your first destination and severity.

  3. Enable Escalation and choose a Secondary destination.

  4. Set the Escalation delay — how many minutes TraceStax should wait before escalating. The default is 15 minutes.

  5. Save the routing rule.

Escalation triggers under two conditions:

  • The primary destination returned a non-2xx HTTP response (delivery failure).
  • For PagerDuty, OpsGenie, incident.io, Grafana OnCall, and Rootly: the created incident or alert has not been acknowledged within the escalation delay window.
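The two escalation conditions can be sketched as a single check; destination names and the function's parameters are illustrative, not the TraceStax internals:

```python
# Destinations with acknowledgement-based escalation (names are illustrative).
INCIDENT_DESTINATIONS = {"pagerduty", "opsgenie", "incident.io", "grafana_oncall", "rootly"}


def should_escalate(destination: str, http_status: int, acknowledged: bool,
                    minutes_elapsed: float, escalation_delay: float = 15.0) -> bool:
    """Escalate on delivery failure, or on an unacknowledged incident once the
    escalation delay (default 15 minutes) has elapsed."""
    if not 200 <= http_status < 300:
        return True  # delivery failure: primary returned a non-2xx response
    if destination in INCIDENT_DESTINATIONS:
        return not acknowledged and minutes_elapsed >= escalation_delay
    return False  # delivered to a non-incident destination; nothing to track
```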

A flapping job — one that repeatedly crosses and recrosses its anomaly threshold — would generate a storm of alerts without deduplication. TraceStax suppresses duplicate alerts using a 5-minute deduplication window.

When an alert fires for a given (job_name, queue, condition) combination, TraceStax opens a deduplication window. Any additional triggers for the same combination within the next 5 minutes are dropped silently. After 5 minutes, if the condition is still active, a single follow-up alert is delivered.

The deduplication window resets when the condition resolves. If a job returns to baseline and then fails again, the second failure generates a new alert.
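The behaviour described above — drop repeats of the same (job_name, queue, condition) inside a 5-minute window, and reset the window on resolution — can be sketched as a small in-memory tracker (a simplification; TraceStax's actual implementation is not documented here):

```python
DEDUP_WINDOW_SECONDS = 300  # 5-minute deduplication window


class Deduplicator:
    """Suppress duplicate alerts per (job_name, queue, condition). Illustrative sketch."""

    def __init__(self) -> None:
        self._last_sent: dict[tuple, float] = {}  # key -> time of last delivered alert

    def should_send(self, job_name: str, queue: str, condition: str, now: float) -> bool:
        key = (job_name, queue, condition)
        last = self._last_sent.get(key)
        if last is not None and now - last < DEDUP_WINDOW_SECONDS:
            return False  # dropped silently inside the window
        self._last_sent[key] = now  # deliver and open a new window
        return True

    def resolve(self, job_name: str, queue: str, condition: str) -> None:
        """Condition returned to baseline: reset so the next failure alerts immediately."""
        self._last_sent.pop((job_name, queue, condition), None)
```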

When a job returns to its baseline after an alert has fired, TraceStax automatically sends a resolved notification to the same destination(s) that received the original alert. The resolved notification includes:

  • The job name and queue
  • The condition that was triggered
  • The time the alert fired and the time it resolved
  • The duration of the anomalous period

Resolution notifications help on-call engineers close incidents without having to manually check whether a job has recovered.

For PagerDuty, OpsGenie, incident.io, Grafana OnCall, and Rootly, TraceStax sends a resolve event via the integration’s API, which automatically closes the open incident or alert.

For Slack and webhook destinations, TraceStax posts a separate resolved message.

To configure the project default routing:

  1. Navigate to your project in the TraceStax dashboard.

  2. Open Project Settings from the left sidebar.

  3. Select the Alerts tab.

  4. Under Default routing, click Edit.

  5. Choose a Primary destination from the dropdown. If you have not connected any integrations yet, click Connect integration to add one.

  6. Set the Minimum severity for the primary destination.

  7. Optionally enable Escalation, choose a secondary destination, and set the escalation delay.

  8. Click Save.

To configure a per-job override:

  1. Navigate to the Jobs list in your project.

  2. Click the job you want to configure.

  3. Select Alert settings from the job detail page.

  4. Enable Override project default and configure the destination and severity as above.

  5. Click Save.