TraceStax Docs

Alert routing

When anomaly detection determines that a job or queue has deviated from its baseline, TraceStax fires an alert. Alert routing is the layer that decides where that alert goes — which integration receives it, at what severity, and what happens if nobody acknowledges it.

TraceStax can route alerts to the following destinations:

Destination       Type                  Notes
Slack             Messaging             Posts to a channel; supports info, warning, and critical messages
Microsoft Teams   Messaging             Posts to a Teams channel via incoming webhook
Email             Email                 Sends alert emails to configured recipients
PagerDuty         Incident management   Creates an incident via Events API v2
OpsGenie          Incident management   Creates an alert via OpsGenie Alerts API
incident.io       Incident management   Creates or escalates an incident
Grafana OnCall    On-call scheduling    Triggers an alert group via the OnCall API
Rootly            Incident management   Creates an incident via Rootly API
Webhook           Custom HTTP           Posts a signed JSON payload to any HTTPS endpoint

Each destination must be connected before it can receive alerts. Connect destinations from Project Settings → Integrations.
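For the Webhook destination, the payload is signed, so the receiving endpoint can reject forged requests. A receiver might verify the signature along these lines; note that the HMAC-SHA256 scheme, the shared-secret setup, and where the signature arrives (for example a request header) are assumptions for illustration, not confirmed details of the TraceStax payload format:

```python
import hashlib
import hmac


def verify_signature(secret: str, body: bytes, signature: str) -> bool:
    """Verify an HMAC-SHA256 hex signature over the raw request body.

    The signing scheme here (HMAC-SHA256 over the body, hex-encoded) is an
    illustrative assumption; check the TraceStax webhook docs for the exact
    algorithm and the header that carries the signature.
    """
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(expected, signature)
```

Whatever the exact scheme, always compare signatures with a constant-time comparison rather than `==`.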

Routing operates at two levels:

Project default — every project has a default routing rule that applies to all jobs unless overridden. This is configured in Project Settings → Alerts → Default routing. You choose a destination (or multiple destinations) and a minimum severity that triggers delivery.

Per-job override — individual jobs can override the project default. Open any job from the Jobs list, select Alert settings, and configure a destination specific to that job. This is useful when a payment processing job should page PagerDuty while a lower-priority reporting job should only post to Slack.

When an alert fires, TraceStax evaluates routing in this order:

  1. Check for a per-job override. If one exists, use it.
  2. Fall back to the project default.
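The two-step evaluation above amounts to a simple lookup with a fallback; a sketch (the dict shapes are hypothetical, not the TraceStax data model):

```python
def resolve_route(job_name: str, per_job_overrides: dict, project_default: dict) -> dict:
    """Return the routing rule for a job: per-job override wins, else project default.

    Both arguments are plain dicts here purely for illustration.
    """
    override = per_job_overrides.get(job_name)
    return override if override is not None else project_default
```

For example, a `payments` job with an override pages PagerDuty, while every other job falls back to the project default.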

Every alert has a severity of info, warning, or critical. Severity is determined by the alert type and how far the metric has deviated from its baseline:

Severity    Condition
info        Informational notices such as plan limit warnings
warning     Deviation exceeds the configured sensitivity threshold (default: 2σ)
critical    Deviation exceeds 3σ, OR the condition is a queue stall or worker disappearance

Queue stalls and worker disappearances are always critical because they represent a complete stoppage rather than a statistical anomaly.
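The severity rules above can be expressed as a short classifier; condition names and the function shape are illustrative assumptions:

```python
# Conditions that represent a complete stoppage (names are hypothetical).
ALWAYS_CRITICAL = {"queue_stall", "worker_disappearance"}


def classify_severity(condition: str, deviation_sigma: float, threshold: float = 2.0) -> str:
    """Map a condition and its deviation (in standard deviations) to a severity.

    'threshold' is the configured sensitivity (default 2 sigma, per the table above).
    """
    if condition in ALWAYS_CRITICAL:
        return "critical"  # stoppage, not a statistical anomaly
    if deviation_sigma > 3.0:
        return "critical"
    if deviation_sigma > threshold:
        return "warning"
    return "info"
```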

When configuring a routing rule you set a minimum severity for each destination:

  • A destination configured for info receives info, warning, and critical alerts.
  • A destination configured for warning receives both warning and critical alerts.
  • A destination configured for critical receives only critical alerts.

This lets you send all alerts to Slack for visibility while only paging PagerDuty for critical events.
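The minimum-severity rule is an ordering check: a destination receives an alert only when the alert's severity is at or above the destination's configured minimum. A sketch:

```python
SEVERITY_RANK = {"info": 0, "warning": 1, "critical": 2}


def should_deliver(alert_severity: str, destination_minimum: str) -> bool:
    """True if the alert's severity meets the destination's minimum severity."""
    return SEVERITY_RANK[alert_severity] >= SEVERITY_RANK[destination_minimum]
```

So a Slack destination set to `info` receives everything, while a PagerDuty destination set to `critical` is paged only for critical alerts.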

TraceStax supports a secondary destination per routing rule. If the primary destination fails to deliver the alert — or, for incident management integrations, if the alert is not acknowledged within the configured window — TraceStax escalates to the secondary destination.

To configure escalation:

  1. Open Project Settings → Alerts → Default routing (or the per-job override for the job you want to configure).

  2. Under Primary destination, choose your first destination and severity.

  3. Enable Escalation and choose a Secondary destination.

  4. Set the Escalation delay — how many minutes TraceStax should wait before escalating. The default is 15 minutes.

  5. Save the routing rule.

Escalation triggers under two conditions:

  • The primary destination returned a non-2xx HTTP response (delivery failure).
  • For PagerDuty, OpsGenie, incident.io, Grafana OnCall, and Rootly: the created incident or alert has not been acknowledged within the escalation delay window.
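The two escalation conditions can be sketched as a single check; destination names and the function's parameters are illustrative, not the TraceStax internals:

```python
# Destinations with acknowledgement-based escalation (names are illustrative).
INCIDENT_DESTINATIONS = {"pagerduty", "opsgenie", "incident.io", "grafana_oncall", "rootly"}


def should_escalate(destination: str, http_status: int, acknowledged: bool,
                    minutes_elapsed: float, escalation_delay: float = 15.0) -> bool:
    """Escalate on delivery failure, or on an unacknowledged incident once the
    escalation delay (default 15 minutes) has elapsed."""
    if not 200 <= http_status < 300:
        return True  # delivery failure: primary returned a non-2xx response
    if destination in INCIDENT_DESTINATIONS:
        return not acknowledged and minutes_elapsed >= escalation_delay
    return False  # delivered to a non-incident destination; nothing to track
```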

A flapping job — one that repeatedly crosses and recrosses its anomaly threshold — would generate a storm of alerts without deduplication. TraceStax suppresses duplicate alerts using a 5-minute deduplication window.

When an alert fires for a given (job_name, queue, condition) combination, TraceStax opens a deduplication window. Any additional triggers for the same combination within the next 5 minutes are dropped silently. After 5 minutes, if the condition is still active, a single follow-up alert is delivered.

The deduplication window resets when the condition resolves. If a job returns to baseline and then fails again, the second failure generates a new alert.
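The behaviour described above — drop repeats of the same (job_name, queue, condition) inside a 5-minute window, and reset the window on resolution — can be sketched as a small in-memory tracker (a simplification; TraceStax's actual implementation is not documented here):

```python
DEDUP_WINDOW_SECONDS = 300  # 5-minute deduplication window


class Deduplicator:
    """Suppress duplicate alerts per (job_name, queue, condition). Illustrative sketch."""

    def __init__(self) -> None:
        self._last_sent: dict[tuple, float] = {}  # key -> time of last delivered alert

    def should_send(self, job_name: str, queue: str, condition: str, now: float) -> bool:
        key = (job_name, queue, condition)
        last = self._last_sent.get(key)
        if last is not None and now - last < DEDUP_WINDOW_SECONDS:
            return False  # dropped silently inside the window
        self._last_sent[key] = now  # deliver and open a new window
        return True

    def resolve(self, job_name: str, queue: str, condition: str) -> None:
        """Condition returned to baseline: reset so the next failure alerts immediately."""
        self._last_sent.pop((job_name, queue, condition), None)
```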

When a job returns to its baseline after an alert has fired, TraceStax automatically sends a resolved notification to the same destination(s) that received the original alert. The resolved notification includes:

  • The job name and queue
  • The condition that was triggered
  • The time the alert fired and the time it resolved
  • The duration of the anomalous period

Resolution notifications help on-call engineers close incidents without having to manually check whether a job has recovered.

For PagerDuty, OpsGenie, incident.io, Grafana OnCall, and Rootly, TraceStax sends a resolve event via the integration’s API, which automatically closes the open incident or alert.

For Slack and webhook destinations, TraceStax posts a separate resolved message.

To configure the project default routing:

  1. Navigate to your project in the TraceStax dashboard.

  2. Open Project Settings from the left sidebar.

  3. Select the Alerts tab.

  4. Under Default routing, click Edit.

  5. Choose a Primary destination from the dropdown. If you have not connected any integrations yet, click Connect integration to add one.

  6. Set the Minimum severity for the primary destination.

  7. Optionally enable Escalation, choose a secondary destination, and set the escalation delay.

  8. Click Save.

To configure a per-job override:

  1. Navigate to the Jobs list in your project.

  2. Click the job you want to configure.

  3. Select Alert settings from the job detail page.

  4. Enable Override project default and configure the destination and severity as above.

  5. Click Save.