TraceStax Docs

How it works

TraceStax is event-driven. Your SDK emits a task_event for each notable state transition in a job’s lifecycle:

Status       When it fires
started      Immediately when the worker picks up the job
succeeded    When the job completes without raising an exception
failed       When the job raises an unhandled exception
retried      When a failed job is retried
stalled      When a job stops making progress
timeout      When a job exceeds its time limit

Each event carries: job name, queue name, worker.key, duration, retry count, optional error details (including stack traces), and a timestamp. No payload data or PII is collected.
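As a rough illustration of the fields listed above, the sketch below models one event as a Python dataclass. The class name, field names, units, and JSON shape are assumptions made for illustration; they are not the SDK's documented wire format.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional

# The six statuses from the table above.
STATUSES = {"started", "succeeded", "failed", "retried", "stalled", "timeout"}


@dataclass
class TaskEvent:
    # Hypothetical field names; the real payload may differ.
    status: str
    job_name: str
    queue: str
    worker_key: str
    timestamp: str                      # ISO 8601, UTC
    duration_ms: Optional[float] = None
    retry_count: int = 0
    error: Optional[dict] = None        # optional error details, e.g. a stack trace

    def __post_init__(self) -> None:
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")


def make_event(status, job_name, queue, worker_key, **fields) -> TaskEvent:
    """Build an event stamped with the current UTC time."""
    timestamp = datetime.now(timezone.utc).isoformat()
    return TaskEvent(status, job_name, queue, worker_key, timestamp, **fields)


event = make_event(
    "failed", "send_invoice", "billing", "worker-7f3a",
    duration_ms=1840.0,
    retry_count=2,
    error={"type": "TimeoutError", "message": "upstream timed out", "stack": "..."},
)
print(json.dumps(asdict(event), indent=2))
```

Note that the sketch carries only metadata about the job, in line with the statement that no payload data is collected.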

Events are sent over HTTPS to ingest.tracestax.com. The ingest pipeline runs on Cloudflare Workers, so each event is processed at the edge location closest to your worker fleet, wherever it runs. Typical ingest latency is under 30 ms.

Events are written to a time-series store scoped to your project. Your API key authenticates the connection, and all traffic is encrypted with TLS 1.3.
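The SDK normally handles delivery for you. Purely to make the transport concrete, here is a minimal sketch that builds (but does not send) an authenticated HTTPS request with the standard library. Only the hostname comes from this page: the /v1/events path, the Bearer header scheme, and the example key are assumptions.

```python
import json
import urllib.request

# Hostname is from the docs; the path is a hypothetical placeholder.
INGEST_URL = "https://ingest.tracestax.com/v1/events"


def build_ingest_request(api_key: str, event: dict) -> urllib.request.Request:
    """Build a POST request carrying one JSON-encoded event."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        INGEST_URL,
        data=body,
        method="POST",
        headers={
            # Assumed auth scheme; check the SDK reference for the real one.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_ingest_request(
    "tsx_example_key",  # placeholder, not a real key
    {"status": "succeeded", "job_name": "send_invoice", "queue": "billing"},
)
print(req.get_method(), req.full_url)

# Sending would look like this (left commented out so the sketch has no side effects):
# with urllib.request.urlopen(req, timeout=5) as resp:
#     print(resp.status)
```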

TraceStax builds a rolling statistical baseline for each (job_name, queue) pair — tracking median duration, p95 duration, and failure rate over a configurable lookback window (default: 7 days).
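The following sketch shows one way such a rolling baseline could be kept in memory: one window per (job_name, queue) pair, trimmed to the lookback period, with median, p95, and failure rate computed on demand. It illustrates the idea only and is not TraceStax's implementation; the class names and the nearest-rank p95 method are assumptions.

```python
import statistics
from collections import defaultdict, deque
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Baseline:
    median_ms: float
    p95_ms: float
    failure_rate: float
    samples: int


class RollingBaseline:
    def __init__(self, lookback: timedelta = timedelta(days=7)):
        self.lookback = lookback
        # (job_name, queue) -> deque of (timestamp, duration_ms, failed)
        self._windows = defaultdict(deque)

    def record(self, job_name, queue, ts, duration_ms, failed):
        """Add one observation; assumes events arrive in timestamp order."""
        window = self._windows[(job_name, queue)]
        window.append((ts, duration_ms, failed))
        cutoff = ts - self.lookback
        while window and window[0][0] < cutoff:
            window.popleft()  # drop observations older than the lookback

    def baseline(self, job_name, queue):
        window = self._windows.get((job_name, queue))
        if not window:
            return None
        durations = sorted(d for _, d, _ in window)
        n = len(durations)
        rank = (95 * n + 99) // 100  # nearest-rank p95: ceil(0.95 * n)
        failures = sum(1 for _, _, failed in window if failed)
        return Baseline(
            median_ms=statistics.median(durations),
            p95_ms=durations[rank - 1],
            failure_rate=failures / n,
            samples=n,
        )


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
tracker = RollingBaseline()
for i in range(20):
    tracker.record("send_invoice", "billing",
                   start + timedelta(minutes=i), 100.0 + i, failed=(i == 0))
print(tracker.baseline("send_invoice", "billing"))
```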

When a new event deviates significantly from the baseline, an alert fires. This means:

  • No manual thresholds to configure
  • No alert fatigue from static rules that go stale
  • Automatic adaptation as your workload changes
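To make "deviates significantly" concrete, here is a deliberately simple stand-in rule: flag a duration well above the baseline p95, or a recent failure rate well above the baseline rate. The actual algorithm is described on the Anomaly detection page; the function names, multipliers, and minimum sample size below are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Baseline:
    median_ms: float
    p95_ms: float
    failure_rate: float


def duration_is_anomalous(duration_ms: float, baseline: Baseline,
                          tolerance: float = 2.0) -> bool:
    """Flag a run that took more than `tolerance` times the baseline p95."""
    return duration_ms > tolerance * baseline.p95_ms


def failure_rate_is_anomalous(recent_failures: int, recent_total: int,
                              baseline: Baseline, tolerance: float = 3.0,
                              min_events: int = 20, min_rate: float = 0.05) -> bool:
    """Flag a recent failure rate far above the baseline rate.

    `min_events` avoids judging tiny samples; `min_rate` keeps a near-zero
    baseline from alerting on a single failure.
    """
    if recent_total < min_events:
        return False
    recent_rate = recent_failures / recent_total
    return recent_rate > max(tolerance * baseline.failure_rate, min_rate)


baseline = Baseline(median_ms=110.0, p95_ms=120.0, failure_rate=0.02)
print(duration_is_anomalous(300.0, baseline))
print(failure_rate_is_anomalous(5, 50, baseline))
```

Because both checks are relative to the baseline, the effective threshold moves as the baseline moves, which is what removes the need for hand-tuned static limits.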

See Anomaly detection for the full algorithm.

Every event includes a worker.key — a stable identifier for the process that ran the job. TraceStax uses this to build a real-time view of your fleet: which workers are active, their last-seen time, and their concurrency utilisation.
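A fleet view of this kind can be derived from the event stream alone. The sketch below keeps a last-seen time and an in-flight job count per worker.key. It is an illustration under assumptions: the class names, the five-minute activity window, and the choice of which statuses end a job are not taken from TraceStax. Turning the in-flight count into a utilisation figure would also need each worker's concurrency limit, which is not modelled here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Assumption: these statuses mean the job is no longer running on the worker.
TERMINAL = {"succeeded", "failed", "timeout"}


@dataclass
class WorkerState:
    last_seen: datetime
    in_flight: int = 0


class FleetView:
    def __init__(self, active_within: timedelta = timedelta(minutes=5)):
        self.active_within = active_within
        self._workers = {}  # worker_key -> WorkerState

    def observe(self, worker_key: str, status: str, ts: datetime) -> None:
        """Update a worker's state from one event."""
        state = self._workers.setdefault(worker_key, WorkerState(last_seen=ts))
        state.last_seen = max(state.last_seen, ts)
        if status == "started":
            state.in_flight += 1
        elif status in TERMINAL:
            state.in_flight = max(0, state.in_flight - 1)

    def active(self, now: datetime) -> list:
        """Worker keys seen within the activity window, sorted."""
        return sorted(
            key for key, state in self._workers.items()
            if now - state.last_seen <= self.active_within
        )

    def in_flight(self, worker_key: str) -> int:
        return self._workers[worker_key].in_flight


t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
fleet = FleetView()
fleet.observe("worker-a", "started", t0)
fleet.observe("worker-a", "started", t0 + timedelta(seconds=1))
fleet.observe("worker-a", "succeeded", t0 + timedelta(seconds=5))
fleet.observe("worker-b", "started", t0 - timedelta(minutes=30))
print(fleet.active(now=t0 + timedelta(minutes=1)))
```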

When an anomaly is detected, TraceStax calls your configured on-call webhook: PagerDuty, OpsGenie, Slack, or any of the other supported integrations. Repeated anomalies for the same job are de-duplicated into a single alert, and the alert auto-resolves when the job's behaviour returns to baseline.
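De-duplication and auto-resolution can be pictured as a small state machine per (job_name, queue) pair: notify once when an alert opens, stay quiet while it remains open, and notify once more when it closes. The sketch below assumes a hypothetical notify callback standing in for the webhook call; it is not the TraceStax alerting code.

```python
class AlertTracker:
    def __init__(self, notify):
        # notify(action, key) is a stand-in for calling the on-call webhook.
        self._notify = notify
        self._open = set()  # keys with an alert currently open

    def update(self, job_name: str, queue: str, anomalous: bool) -> None:
        key = (job_name, queue)
        if anomalous and key not in self._open:
            self._open.add(key)
            self._notify("trigger", key)   # first anomaly opens the alert
        elif not anomalous and key in self._open:
            self._open.remove(key)
            self._notify("resolve", key)   # back to baseline closes it
        # Otherwise nothing changed, so no notification is sent.


sent = []
alerts = AlertTracker(lambda action, key: sent.append((action, key)))
for anomalous in [True, True, True, False, False]:
    alerts.update("send_invoice", "billing", anomalous)
print(sent)
```

Five observations produce only two notifications: one trigger and one resolve.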

No events appearing in the dashboard

  • Check that your API key is correct and hasn’t been rotated
  • Confirm your worker process can reach ingest.tracestax.com over HTTPS (port 443)
  • Look for SDK errors in your worker logs — the SDK logs a warning if the ingest call fails
  • Check that the project event limit hasn’t been reached (visible in Settings → Usage)
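For the connectivity check in the list above, a quick test from the worker host is to open a TCP connection to port 443. The helper below is a hypothetical sketch, not part of the SDK. It only proves that the port is reachable, so TLS interception or an HTTP proxy could still cause failures that it will not catch.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0,
              connect=socket.create_connection) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    `connect` is injectable so the function can be exercised without network access.
    """
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
    conn.close()
    return True


# Example, run from the same host or container as your worker:
# print(can_reach("ingest.tracestax.com"))
```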

Events appearing but no alerts firing

  • Anomaly detection requires a baseline to be established — this typically takes 24–48 hours of event data
  • Check your alert routing configuration in Settings → Alerts
  • Verify your on-call integration is connected (test button available in the integration settings)