TraceStax Docs

How it works

The event model

TraceStax is event-driven. Your SDK emits a task_event for each notable state transition in a job’s lifecycle:

Status	When it fires
`started`	Immediately when the worker picks up the job
`succeeded`	When the job completes without raising an exception
`failed`	When the job raises an unhandled exception
`retried`	When a failed job is retried
`stalled`	When a job stops making progress
`timeout`	When a job exceeds its time limit

Each event carries: job name, queue name, worker.key, duration, retry count, optional error details (including stack traces), and a timestamp. No payload data or PII is collected.
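
As a rough illustration of the fields listed above, an event could be modelled like this. The class and field names (`TaskEvent`, `duration_ms`, and so on) are assumptions made for this sketch, not the SDK's actual wire format.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Optional


class Status(str, Enum):
    """The six statuses from the table above."""
    STARTED = "started"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    RETRIED = "retried"
    STALLED = "stalled"
    TIMEOUT = "timeout"


@dataclass
class TaskEvent:
    # Hypothetical field names; only the concepts come from the docs.
    job_name: str
    queue: str
    worker_key: str
    status: Status
    duration_ms: Optional[float]
    retry_count: int
    timestamp: str                 # ISO 8601, UTC
    error: Optional[dict] = None   # error details, present on failures

    def to_json(self) -> str:
        # Status subclasses str, so it serialises as its plain value.
        return json.dumps(asdict(self))


event = TaskEvent(
    job_name="send_invoice",
    queue="billing",
    worker_key="worker-1",
    status=Status.FAILED,
    duration_ms=812.5,
    retry_count=2,
    timestamp="2024-01-01T00:00:00Z",
    error={"type": "TimeoutError", "message": "upstream timed out"},
)
```

Note that the model has no field for job arguments, which matches the statement that no payload data is collected.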

The ingest pipeline

Events are sent over HTTPS to ingest.tracestax.com. The pipeline is built on Cloudflare Workers and processes events at the edge location closest to your worker fleet, wherever it runs. Typical ingest latency is under 30 ms.

Events are written to a time-series store scoped to your project. Your API key authenticates the connection, and all traffic is encrypted with TLS 1.3.
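
The SDK handles delivery for you, but a sketch of what an authenticated ingest request might look like can help when debugging. Only the hostname comes from this page; the `/v1/events` path and the `Authorization: Bearer` header are assumptions made for illustration.

```python
import json
import urllib.request

INGEST_HOST = "ingest.tracestax.com"  # from the docs
INGEST_PATH = "/v1/events"            # hypothetical path


def build_ingest_request(api_key: str, event: dict) -> urllib.request.Request:
    """Build (but do not send) a POST carrying one JSON-encoded event."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url=f"https://{INGEST_HOST}{INGEST_PATH}",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Hypothetical auth scheme; check the SDK for the real one.
            "Authorization": f"Bearer {api_key}",
        },
    )


req = build_ingest_request(
    "ts_example_key",
    {"job_name": "send_invoice", "status": "started"},
)
```

Passing `req` to `urllib.request.urlopen` with a short timeout would perform the POST over TLS. The sketch stops before that step so it can run without network access.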

Anomaly detection

TraceStax builds a rolling statistical baseline for each (job_name, queue) pair — tracking median duration, p95 duration, and failure rate over a configurable lookback window (default: 7 days).

When a new event deviates significantly from the baseline, an alert fires. This means:

No manual thresholds to configure
No alert fatigue from static rules that go stale
Automatic adaptation as your workload changes
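
To make the idea of a rolling baseline concrete, here is a minimal sketch for a single (job_name, queue) pair. It is not the TraceStax algorithm: the `Sample` type, the two-sample minimum, and the "twice the p95" rule are placeholders chosen for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from statistics import median, quantiles
from typing import Optional


@dataclass
class Sample:
    timestamp: datetime
    duration_ms: float
    failed: bool


@dataclass
class Baseline:
    median_ms: float
    p95_ms: float
    failure_rate: float


def build_baseline(samples, now, lookback=timedelta(days=7)) -> Optional[Baseline]:
    """Summarise the samples that fall inside the lookback window."""
    recent = [s for s in samples if now - s.timestamp <= lookback]
    if len(recent) < 2:
        return None  # not enough data to establish a baseline yet
    durations = [s.duration_ms for s in recent]
    # quantiles(n=20) returns 19 cut points; the last is the 95th percentile.
    p95 = quantiles(durations, n=20)[-1]
    failure_rate = sum(s.failed for s in recent) / len(recent)
    return Baseline(median(durations), p95, failure_rate)


def is_anomalous(duration_ms: float, baseline: Baseline, factor: float = 2.0) -> bool:
    """Placeholder rule: flag durations far above the baseline p95."""
    return duration_ms > factor * baseline.p95_ms
```

Because the baseline is recomputed from recent samples only, it shifts as the workload changes, which is what removes the need for hand-tuned static thresholds.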

See Anomaly detection for the full algorithm.

Worker fleet tracking

Every event includes a worker.key — a stable identifier for the process that ran the job. TraceStax uses this to build a real-time view of your fleet: which workers are active, their last-seen time, and their concurrency utilisation.
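
A fleet view of this kind can be derived from the event stream alone. The sketch below assumes that `succeeded`, `failed`, and `timeout` end a running job, that a worker counts as active if seen within the last five minutes, and that each worker's maximum concurrency is known. All three are assumptions for this example, not documented behaviour.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Statuses assumed to end a running job (illustrative choice).
TERMINAL = {"succeeded", "failed", "timeout"}


@dataclass
class WorkerState:
    last_seen: datetime
    running_jobs: int = 0


def update_fleet(fleet: dict, worker_key: str, status: str, timestamp: datetime) -> None:
    """Fold one event into the per-worker state, keyed by worker.key."""
    state = fleet.setdefault(worker_key, WorkerState(last_seen=timestamp))
    state.last_seen = max(state.last_seen, timestamp)
    if status == "started":
        state.running_jobs += 1
    elif status in TERMINAL:
        state.running_jobs = max(0, state.running_jobs - 1)


def active_workers(fleet: dict, now: datetime, idle_after=timedelta(minutes=5)) -> list:
    """Workers seen within the idle window, sorted by key."""
    return sorted(k for k, s in fleet.items() if now - s.last_seen <= idle_after)


def utilisation(state: WorkerState, max_concurrency: int) -> float:
    """Fraction of the worker's job slots currently in use."""
    return state.running_jobs / max_concurrency
```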

Alert routing

When an anomaly is detected, TraceStax calls your configured on-call webhook: PagerDuty, OpsGenie, Slack, or any other supported integration. Alerts are de-duplicated and auto-resolve when the job’s behaviour returns to baseline.
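
De-duplication and auto-resolve can be pictured as a small state machine keyed by (job_name, queue). The `AlertRouter` class and its `notify` callback are hypothetical names for this sketch. A real integration would call the webhook where the callback is invoked.

```python
class AlertRouter:
    """Sends one "trigger" per incident and one "resolve" when it clears."""

    def __init__(self, notify):
        self._notify = notify  # callable(action, key), e.g. a webhook call
        self._open = set()     # keys with an alert currently open

    def observe(self, job_name: str, queue: str, anomalous: bool) -> None:
        key = (job_name, queue)
        if anomalous and key not in self._open:
            # First anomalous observation opens the alert.
            self._open.add(key)
            self._notify("trigger", key)
        elif not anomalous and key in self._open:
            # Behaviour is back to baseline, so close the alert.
            self._open.remove(key)
            self._notify("resolve", key)
        # Otherwise: a repeat anomaly or steady normal state, nothing to send.


sent = []
router = AlertRouter(lambda action, key: sent.append((action, key)))
for anomalous in (True, True, False, False):
    router.observe("send_invoice", "billing", anomalous)
```

After the loop, `sent` holds exactly one trigger followed by one resolve, even though the anomaly was observed twice.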

Troubleshooting

No events appearing in the dashboard

Check your API key is correct and hasn’t been rotated
Confirm your worker process can reach ingest.tracestax.com over HTTPS (port 443)
Look for SDK errors in your worker logs — the SDK logs a warning if the ingest call fails
Check the project event limit hasn’t been reached (visible in Settings → Usage)
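
To test the connectivity item above from the worker host, a plain TCP check is often enough to rule out firewall or DNS problems. This is a generic sketch using the Python standard library, not a TraceStax tool, and it does not validate TLS or the API key.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Run from the machine that hosts your worker process:
# can_reach("ingest.tracestax.com")
```

If this returns False on the worker host but True elsewhere, look at egress rules or proxies between the worker and the internet.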

Events appearing but no alerts firing

Anomaly detection requires a baseline to be established — this typically takes 24–48 hours of event data
Check your alert routing configuration in Settings → Alerts
Verify your on-call integration is connected (test button available in the integration settings)