How it works
The event model
TraceStax is event-driven. Your SDK emits a task_event for each notable state transition in a job’s lifecycle:
| Status | When it fires |
|---|---|
| started | Immediately when the worker picks up the job |
| succeeded | When the job completes without raising an exception |
| failed | When the job raises an unhandled exception |
| retried | When a failed job is retried |
| stalled | When a job stops making progress |
| timeout | When a job exceeds its time limit |
Each event carries: job name, queue name, worker.key, duration, retry count, optional error details (including stack traces), and a timestamp. No payload data or PII is collected.
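As a rough illustration, the fields listed above can be pictured as a small record. This is a minimal sketch, not the SDK's actual schema: the class name `TaskEvent`, the field names, and the units (milliseconds, ISO 8601 timestamps) are all assumptions made for the example.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional

# The six statuses from the table above.
STATUSES = {"started", "succeeded", "failed", "retried", "stalled", "timeout"}


@dataclass
class TaskEvent:
    """Hypothetical shape of a task_event; field names are illustrative."""
    status: str
    job_name: str
    queue: str
    worker_key: str
    timestamp: str                      # ISO 8601, UTC (assumed format)
    duration_ms: Optional[float] = None  # unit is an assumption
    retry_count: int = 0
    error: Optional[dict] = None        # optional error details, e.g. stack trace

    def __post_init__(self):
        # Reject statuses that are not in the documented set.
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")


event = TaskEvent(
    status="failed",
    job_name="send_invoice",
    queue="billing",
    worker_key="worker-7f3a",
    timestamp=datetime.now(timezone.utc).isoformat(),
    duration_ms=412.0,
    retry_count=1,
    error={"type": "TimeoutError", "message": "upstream timed out"},
)
payload = asdict(event)  # plain dict, ready to serialise as JSON
```

Note that the record holds only metadata about the job run; the job's arguments and payload are not part of it.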
The ingest pipeline
Events are sent via HTTPS to ingest.tracestax.com. The pipeline is built on Cloudflare Workers and processes each event at the edge location closest to your worker fleet; typical ingest latency is under 30 ms.
Events are written to a time-series store scoped to your project. Your API key authenticates the connection, and all traffic is encrypted with TLS 1.3.
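The SDK normally handles delivery for you, but it can help to picture what an ingest call might look like. The sketch below only builds the request and does not send it. The URL path `/v1/events`, the JSON body, and the `Authorization: Bearer` header are assumptions for illustration; only the host name and the use of HTTPS with an API key come from this page.

```python
import json
import urllib.request

# Host is from the docs; the path is a hypothetical placeholder.
INGEST_URL = "https://ingest.tracestax.com/v1/events"


def build_ingest_request(event: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an HTTPS POST carrying one event as JSON."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        INGEST_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Header scheme is an assumption, not the documented API.
            "Authorization": f"Bearer {api_key}",
        },
    )


req = build_ingest_request(
    {"status": "started", "job_name": "send_invoice", "queue": "billing"},
    api_key="YOUR_API_KEY",
)
# To actually send it: urllib.request.urlopen(req, timeout=5)
```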
Anomaly detection
TraceStax builds a rolling statistical baseline for each (job_name, queue) pair, tracking median duration, p95 duration, and failure rate over a configurable lookback window (default: 7 days).
When a new event deviates significantly from the baseline, an alert fires. This means:
- No manual thresholds to configure
- No alert fatigue from static rules that go stale
- Automatic adaptation as your workload changes
See Anomaly detection for the full algorithm.
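To make the idea of a rolling baseline concrete, here is a deliberately simplified sketch. It is not the TraceStax algorithm: it keeps a bounded window of recent durations per (job_name, queue) pair (a sample count rather than a 7-day time window), and flags a duration that exceeds a multiple of the window's p95. The class name, the `factor` of 2.0, and the `min_samples` warm-up are all assumptions.

```python
from collections import defaultdict, deque
from statistics import median, quantiles


class Baseline:
    """Illustrative rolling duration baseline per (job_name, queue) pair."""

    def __init__(self, max_samples=1000, min_samples=20, factor=2.0):
        # One bounded window per pair; old samples fall off automatically.
        self.samples = defaultdict(lambda: deque(maxlen=max_samples))
        self.min_samples = min_samples
        self.factor = factor

    def stats(self, job_name, queue):
        """Return (median, p95) for a pair, or None while warming up."""
        window = self.samples[(job_name, queue)]
        if len(window) < self.min_samples:
            return None
        p95 = quantiles(window, n=20)[-1]  # last of 19 cut points = 95th pct
        return median(window), p95

    def observe(self, job_name, queue, duration_ms):
        """Check a duration against the baseline, then record it."""
        current = self.stats(job_name, queue)
        anomalous = current is not None and duration_ms > self.factor * current[1]
        self.samples[(job_name, queue)].append(duration_ms)
        return anomalous
```

Until a pair has `min_samples` observations, `observe` never reports an anomaly, which mirrors the need for a baseline to exist before alerts can fire. A fuller version would also track failure rate and would likely avoid folding anomalous samples back into the baseline.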
Worker fleet tracking
Every event includes a worker.key, a stable identifier for the process that ran the job. TraceStax uses this to build a real-time view of your fleet: which workers are active, their last-seen time, and their concurrency utilisation.
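A fleet view of this kind can be derived from nothing more than the worker key and timestamp on each event. The following is a minimal sketch under stated assumptions: the `FleetView` class is hypothetical, the 5-minute "active" threshold is invented for the example, and concurrency utilisation is left out.

```python
from datetime import datetime, timedelta, timezone


class FleetView:
    """Illustrative last-seen tracker keyed by worker key."""

    def __init__(self, active_within=timedelta(minutes=5)):  # assumed threshold
        self.last_seen = {}
        self.active_within = active_within

    def record(self, worker_key, timestamp):
        """Update last-seen time, ignoring events that arrive out of order."""
        prev = self.last_seen.get(worker_key)
        if prev is None or timestamp > prev:
            self.last_seen[worker_key] = timestamp

    def active_workers(self, now):
        """Workers seen within the threshold, sorted by key."""
        cutoff = now - self.active_within
        return sorted(k for k, ts in self.last_seen.items() if ts >= cutoff)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
fleet = FleetView()
fleet.record("worker-a", now)
fleet.record("worker-b", now - timedelta(minutes=10))
active = fleet.active_workers(now)  # only worker-a is within 5 minutes
```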
Alert routing
When an anomaly is detected, TraceStax calls your configured on-call webhook (PagerDuty, OpsGenie, Slack, or any other supported integration). Alerts are de-duplicated, and they auto-resolve when the job’s behaviour returns to baseline.
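De-duplication and auto-resolution can be thought of as a small state machine per (job_name, queue) pair. The sketch below is an assumption about how such logic could work, not a description of TraceStax internals; the `AlertState` class and the `"fire"` / `"resolve"` return values are hypothetical.

```python
class AlertState:
    """Illustrative open/closed alert tracking per (job_name, queue) key."""

    def __init__(self):
        self.open = set()  # keys that currently have an open alert

    def update(self, key, anomalous):
        """Return 'fire', 'resolve', or None for the latest observation."""
        if anomalous and key not in self.open:
            self.open.add(key)
            return "fire"       # first anomaly opens an alert
        if not anomalous and key in self.open:
            self.open.discard(key)
            return "resolve"    # back to baseline closes it
        return None             # repeat anomaly or steady state: no new call


alerts = AlertState()
key = ("send_invoice", "billing")
actions = [alerts.update(key, flag) for flag in (True, True, False, False)]
```

Here the second anomalous observation produces no additional notification, and the first normal observation afterwards resolves the open alert.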
Troubleshooting
No events appearing in the dashboard
- Check your API key is correct and hasn’t been rotated
- Confirm your worker process can reach ingest.tracestax.com over HTTPS (port 443)
- Look for SDK errors in your worker logs; the SDK logs a warning if the ingest call fails
- Check the project event limit hasn’t been reached (visible in Settings → Usage)
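For the connectivity check in the list above, a quick TCP probe run from the worker host can rule out firewall or DNS problems. This is a minimal sketch using only the Python standard library; the helper name `can_reach` is made up for the example, and it verifies only that a TCP connection to port 443 can be opened, not that the TLS handshake or API key is accepted.

```python
import socket


def can_reach(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Example, run from the machine that hosts your worker:
#   can_reach("ingest.tracestax.com")
```

If this returns False, the problem is likely network-level (DNS, egress rules, proxy) rather than the SDK or your API key.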
Events appearing but no alerts firing
- Anomaly detection requires a baseline to be established — this typically takes 24–48 hours of event data
- Check your alert routing configuration in Settings → Alerts
- Verify your on-call integration is connected (test button available in the integration settings)