# Alert routing
When anomaly detection determines that a job or queue has deviated from its baseline, TraceStax fires an alert. Alert routing is the layer that decides where that alert goes — which integration receives it, at what severity, and what happens if nobody acknowledges it.
## Alert destinations

TraceStax can route alerts to the following destinations:
| Destination | Type | Notes |
|---|---|---|
| Slack | Messaging | Posts to a channel; supports info, warning, and critical messages |
| Microsoft Teams | Messaging | Posts to a Teams channel via incoming webhook |
| Email | Email | Sends alert emails to configured recipients |
| PagerDuty | Incident management | Creates an incident via Events API v2 |
| OpsGenie | Incident management | Creates an alert via OpsGenie Alerts API |
| incident.io | Incident management | Creates or escalates an incident |
| Grafana OnCall | On-call scheduling | Triggers an alert group via the OnCall API |
| Rootly | Incident management | Creates an incident via Rootly API |
| Webhook | Custom HTTP | Posts a signed JSON payload to any HTTPS endpoint |
Each destination must be connected before it can receive alerts. Connect destinations from Project Settings → Integrations.
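The webhook destination posts a signed JSON payload, though the signing scheme and header name are not specified in this section. A typical receiver, assuming an HMAC-SHA256 hex digest of the raw request body (an assumption, not TraceStax's documented scheme), might verify it like this:

```python
import hashlib
import hmac

def verify_webhook(secret: str, body: bytes, signature: str) -> bool:
    """Recompute the HMAC-SHA256 digest of the raw request body and compare
    it to the received signature using a constant-time comparison."""
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

Always verify against the raw bytes of the request body, not a re-serialized copy, since any whitespace or key-ordering difference changes the digest.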
## How routing rules work

Routing operates at two levels:
Project default — every project has a default routing rule that applies to all jobs unless overridden. This is configured in Project Settings → Alerts → Default routing. You choose a destination (or multiple destinations) and a minimum severity that triggers delivery.
Per-job override — individual jobs can override the project default. Open any job from the Jobs list, select Alert settings, and configure a destination specific to that job. This is useful when a payment processing job should page PagerDuty while a lower-priority reporting job should only post to Slack.
When an alert fires, TraceStax evaluates routing in this order:
- Check for a per-job override. If one exists, use it.
- Fall back to the project default.
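The evaluation order above amounts to a single lookup with a fallback. A minimal sketch (data shapes are illustrative, not TraceStax's actual API):

```python
def resolve_routing(job_name: str, job_overrides: dict, project_default: dict) -> dict:
    """Evaluate routing for an alert: a per-job override wins;
    otherwise fall back to the project default rule."""
    return job_overrides.get(job_name, project_default)

# Hypothetical rules mirroring the payments-vs-reporting example above
overrides = {"process-payments": {"destination": "pagerduty", "min_severity": "critical"}}
default = {"destination": "slack", "min_severity": "info"}

resolve_routing("process-payments", overrides, default)  # per-job override applies
resolve_routing("nightly-report", overrides, default)    # falls back to project default
```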
## Severity levels

Every alert has a severity of `info`, `warning`, or `critical`. Severity is determined by the alert type and how far the metric has deviated from its baseline:
| Severity | Condition |
|---|---|
| `info` | Informational notices such as plan limit warnings |
| `warning` | Deviation exceeds the configured sensitivity threshold (default: 2σ) |
| `critical` | Deviation exceeds 3σ, or the condition is a queue stall or worker disappearance |
Queue stalls and worker disappearances are always critical because they represent a complete stoppage rather than a statistical anomaly.
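The severity table can be read as a short decision procedure. A sketch, with condition names that are illustrative rather than TraceStax's actual identifiers:

```python
def classify_severity(condition, deviation_sigma=0.0, threshold=2.0):
    """Map an alert condition and its deviation (in standard deviations)
    from baseline to a severity. Condition names are hypothetical."""
    if condition in ("queue_stall", "worker_disappearance"):
        return "critical"   # complete stoppage: always critical
    if condition == "plan_limit_warning":
        return "info"       # informational notice
    if deviation_sigma > 3.0:
        return "critical"
    if deviation_sigma > threshold:
        return "warning"
    return None             # below the sensitivity threshold: no alert fires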
When configuring a routing rule you set a minimum severity for each destination:
- A destination configured for `info` receives `info`, `warning`, and `critical` alerts.
- A destination configured for `warning` receives both `warning` and `critical` alerts.
- A destination configured for `critical` receives only `critical` alerts.
This lets you send all alerts to Slack for visibility while only paging PagerDuty for critical events.
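The minimum-severity rule is an ordering comparison: a destination receives an alert when the alert's severity ranks at or above the destination's configured minimum. A sketch of that filter (the rule shapes are assumptions for illustration):

```python
SEVERITY_RANK = {"info": 0, "warning": 1, "critical": 2}

def destinations_for(alert_severity, routing_rules):
    """Return every destination whose configured minimum severity
    is at or below the alert's severity."""
    return [dest for dest, min_sev in routing_rules
            if SEVERITY_RANK[alert_severity] >= SEVERITY_RANK[min_sev]]

# Slack for visibility, PagerDuty only for pages
rules = [("slack", "info"), ("pagerduty", "critical")]
destinations_for("warning", rules)   # Slack only
destinations_for("critical", rules)  # Slack and PagerDuty
```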
## Escalation

TraceStax supports a secondary destination per routing rule. If the primary destination fails to deliver the alert — or, for incident management integrations, if the alert is not acknowledged within the configured window — TraceStax escalates to the secondary destination.
To configure escalation:
1. Open Project Settings → Alerts → Default routing (or the per-job override for the job you want to configure).
2. Under Primary destination, choose your first destination and severity.
3. Enable Escalation and choose a Secondary destination.
4. Set the Escalation delay — how many minutes TraceStax should wait before escalating. The default is 15 minutes.
5. Save the routing rule.
Escalation triggers under two conditions:
- The primary destination returned a non-2xx HTTP response (delivery failure).
- For PagerDuty, OpsGenie, incident.io, Grafana OnCall, and Rootly: the created incident or alert has not been acknowledged within the escalation delay window.
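The two escalation triggers can be sketched as one decision function (parameter names are illustrative; `supports_ack` distinguishes the incident management integrations, which report acknowledgement, from destinations like Slack, which do not):

```python
def should_escalate(http_status, acknowledged, minutes_since_delivery,
                    escalation_delay=15, supports_ack=True):
    """Decide whether to fire the secondary destination."""
    if not 200 <= http_status < 300:
        return True   # delivery failure at the primary destination
    if supports_ack and not acknowledged and minutes_since_delivery >= escalation_delay:
        return True   # unacknowledged incident past the escalation delay
    return False
```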
## Deduplication

A flapping job — one that repeatedly crosses and recrosses its anomaly threshold — would generate a storm of alerts without deduplication. TraceStax suppresses duplicate alerts using a 5-minute deduplication window.
When an alert fires for a given (job_name, queue, condition) combination, TraceStax opens a deduplication window. Any additional triggers for the same combination within the next 5 minutes are dropped silently. After 5 minutes, if the condition is still active, a single follow-up alert is delivered.
The deduplication window resets when the condition resolves. If a job returns to baseline and then fails again, the second failure generates a new alert.
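The window-and-reset behavior described above can be sketched as a small in-memory tracker (a simplified model, not TraceStax's implementation; time is passed in explicitly, in minutes, for clarity):

```python
class Deduplicator:
    """Suppress repeat alerts for the same (job_name, queue, condition)
    key inside a rolling deduplication window."""

    def __init__(self, window_minutes=5.0):
        self.window = window_minutes
        self.last_sent = {}   # key -> time (minutes) the last alert was delivered

    def should_send(self, key, now):
        last = self.last_sent.get(key)
        if last is not None and now - last < self.window:
            return False      # inside the window: drop silently
        self.last_sent[key] = now
        return True           # first alert, or follow-up after the window

    def resolve(self, key):
        """Condition resolved: the next failure alerts immediately."""
        self.last_sent.pop(key, None)
```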
## Alert resolution

When a job returns to its baseline after an alert has fired, TraceStax automatically sends a resolved notification to the same destination(s) that received the original alert. The resolved notification includes:
- The job name and queue
- The condition that was triggered
- The time the alert fired and the time it resolved
- The duration of the anomalous period
Resolution notifications help on-call engineers close incidents without having to manually check whether a job has recovered.
For PagerDuty, OpsGenie, incident.io, Grafana OnCall, and Rootly, TraceStax sends a resolve event via the integration’s API, which automatically closes the open incident or alert.
For Slack and webhook destinations, TraceStax posts a separate resolved message.
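For webhook consumers, the four fields listed above suggest a payload along these lines. Field names and timestamp format are illustrative assumptions, not TraceStax's documented schema:

```python
def resolved_payload(job_name, queue, condition, fired_at, resolved_at):
    """Assemble the fields a resolved notification carries.
    Timestamps are epoch seconds; field names are hypothetical."""
    return {
        "status": "resolved",
        "job_name": job_name,
        "queue": queue,
        "condition": condition,
        "fired_at": fired_at,
        "resolved_at": resolved_at,
        "duration_minutes": round((resolved_at - fired_at) / 60, 1),
    }
```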
## Configuring alert routing

1. Navigate to your project in the TraceStax dashboard.
2. Open Project Settings from the left sidebar.
3. Select the Alerts tab.
4. Under Default routing, click Edit.
5. Choose a Primary destination from the dropdown. If you have not connected any integrations yet, click Connect integration to add one.
6. Set the Minimum severity for the primary destination.
7. Optionally enable Escalation, choose a secondary destination, and set the escalation delay.
8. Click Save.
To configure a per-job override:

1. Navigate to the Jobs list in your project.
2. Click the job you want to configure.
3. Select Alert settings from the job detail page.
4. Enable Override project default and configure the destination and severity as above.
5. Click Save.