TraceStax Docs

Worker fleet

The Worker Fleet view gives you a real-time list of every worker process TraceStax has seen recently, along with its current status, the queues it is consuming, and when it was last heard from.

Each row in the fleet table represents a single worker process. The columns are:

| Column | Description |
| --- | --- |
| Worker key | The stable identifier for this worker process. Defaults to hostname:pid. See Worker key below. |
| Hostname | The machine the worker is running on |
| PID | The operating system process ID |
| Queues | The queues this worker is configured to consume |
| Concurrency | The maximum number of jobs this worker can process simultaneously |
| Status | online, degraded, or offline |
| Last seen | How long ago the most recent heartbeat arrived |

The fleet view is accessible from the Workers tab of your project. By default it shows all workers seen in the last 7 days; use the filter to show only currently online workers.

TraceStax learns about workers through heartbeat events. The SDK sends a heartbeat when the worker process starts up, establishing its presence immediately. It then continues sending heartbeats at a regular interval (default: every 30 seconds, configurable in the SDK).
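The startup-then-interval pattern described above can be sketched as follows. This is only an illustration, not the SDK's implementation: the `HeartbeatLoop` class and the `send_heartbeat` callback are hypothetical names, and only the 30-second default comes from this page.

```python
import threading
from typing import Callable

DEFAULT_INTERVAL_SECONDS = 30.0  # default heartbeat interval stated in these docs


class HeartbeatLoop:
    """Hypothetical sketch: one heartbeat at startup, then one per interval."""

    def __init__(self, send_heartbeat: Callable[[], None],
                 interval: float = DEFAULT_INTERVAL_SECONDS) -> None:
        self._send = send_heartbeat
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._send()  # immediate heartbeat establishes presence at startup
        self._thread.start()

    def _run(self) -> None:
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self._interval):
            self._send()

    def stop(self) -> None:
        self._stop.set()
        if self._thread.is_alive():
            self._thread.join()
```

Waiting on an event rather than sleeping lets `stop()` end the loop promptly instead of blocking for up to a full interval.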

There is no registration step. As soon as the first heartbeat arrives for a new worker key, TraceStax adds it to the fleet view.

Each heartbeat carries the worker’s current state: its hostname, PID, queues, and concurrency. If a worker’s configuration changes — for example, it is restarted with a different set of queues — the fleet view updates on the next heartbeat.
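To make the "no registration, latest heartbeat wins" behavior concrete, here is a minimal sketch of how a receiver could track fleet state. The `Heartbeat` and `FleetRegistry` names and the field layout are assumptions for illustration; the actual event schema is not shown on this page.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Heartbeat:
    """Hypothetical heartbeat payload; field names are illustrative."""
    worker_key: str
    hostname: str
    pid: int
    queues: Tuple[str, ...]
    concurrency: int
    received_at: float  # receive time, in seconds


class FleetRegistry:
    """Keeps the most recently reported state for each worker key."""

    def __init__(self) -> None:
        self._workers: Dict[str, Heartbeat] = {}

    def record(self, hb: Heartbeat) -> bool:
        """Store the heartbeat and return True if the worker key is new."""
        is_new = hb.worker_key not in self._workers
        self._workers[hb.worker_key] = hb  # newest heartbeat replaces older state
        return is_new

    def get(self, worker_key: str) -> Heartbeat:
        return self._workers[worker_key]
```

In this sketch a worker that reports a different queue list under the same key simply overwrites its earlier entry, which mirrors the fleet view updating on the next heartbeat.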

TraceStax computes a worker’s status by comparing the time since its last heartbeat to the expected heartbeat interval:

| Status | Condition |
| --- | --- |
| online | Last heartbeat arrived within 2× the expected interval |
| degraded | Last heartbeat is between 2× and 3× the expected interval (one heartbeat missed) |
| offline | Last heartbeat is more than 3× the expected interval (two or more heartbeats missed) |
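The thresholds in the table can be expressed as a small function. This is a sketch, not TraceStax's code: the name `worker_status` is hypothetical, and the handling of the exact 2× and 3× boundaries is an assumption, since the table does not say which side they fall on.

```python
def worker_status(seconds_since_last_heartbeat: float,
                  expected_interval: float = 30.0) -> str:
    """Classify a worker from the age of its last heartbeat."""
    if expected_interval <= 0:
        raise ValueError("expected_interval must be positive")
    ratio = seconds_since_last_heartbeat / expected_interval
    if ratio <= 2:      # within 2x the expected interval
        return "online"
    if ratio <= 3:      # between 2x and 3x (boundary handling assumed)
        return "degraded"
    return "offline"    # more than 3x
```

With the default 30-second interval, a worker last heard from 45 seconds ago is online, one at 75 seconds is degraded, and one at 120 seconds is offline.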

The expected interval is set per-SDK (default: 30 seconds). TraceStax learns the interval from the gap between successive heartbeats from the same worker key. For a freshly-seen worker with only one heartbeat on record, TraceStax assumes the SDK default until a second heartbeat confirms the actual interval.
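The fallback to the SDK default for a freshly seen worker might look like the sketch below. The function name is hypothetical, and using only the most recent gap (rather than, say, an average over several gaps) is an assumption; this page does not specify how the gap is smoothed.

```python
from typing import Optional

SDK_DEFAULT_INTERVAL = 30.0  # default interval stated in these docs


def expected_interval(previous_heartbeat_at: Optional[float],
                      latest_heartbeat_at: float,
                      default: float = SDK_DEFAULT_INTERVAL) -> float:
    """Infer the heartbeat interval from the gap between two heartbeats."""
    if previous_heartbeat_at is None:
        return default  # only one heartbeat on record so far
    gap = latest_heartbeat_at - previous_heartbeat_at
    # Guard against duplicate or out-of-order timestamps (assumed behavior).
    return gap if gap > 0 else default
```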

A degraded status is a warning — the worker is probably still running but may be under load, experiencing network issues, or about to crash. An offline status means the worker has almost certainly stopped.

The worker key is the stable identifier TraceStax uses to track a worker process across its lifetime. All task_event and heartbeat events from the same worker process share the same key, which is how TraceStax correlates job executions back to a specific worker instance.

The default key format is hostname:pid. The PID makes the key unique among processes on the same host, and the key changes when the process is restarted (because the PID changes). This is the correct behavior: a restarted worker is a new process.
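For reference, a key in the hostname:pid format can be built from the Python standard library as shown below. The function name `default_worker_key` is hypothetical; the SDK may derive the hostname differently.

```python
import os
import socket


def default_worker_key() -> str:
    """Build a key in the documented hostname:pid format."""
    return f"{socket.gethostname()}:{os.getpid()}"
```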

You can override the worker key in the SDK configuration. Use cases for a custom key:

  • You want a stable key that survives restarts (e.g. a Kubernetes pod name, which remains stable across process crashes but changes when the pod is rescheduled)
  • You want a human-readable name for fleet visibility (e.g. payments-worker-prod-a)
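The override logic amounts to "use the custom key if one is set, otherwise fall back to the default". A minimal sketch follows; `resolve_worker_key` is a hypothetical helper, not an SDK function, and the actual configuration option name is not given on this page.

```python
from typing import Optional


def resolve_worker_key(default_key: str,
                       custom_key: Optional[str] = None) -> str:
    """Prefer a non-blank custom key; otherwise use the default key."""
    if custom_key is not None and custom_key.strip():
        return custom_key.strip()
    return default_key
```

For example, `resolve_worker_key("web-1:4312", "payments-worker-prod-a")` yields the human-readable name, while passing `None` or an empty string keeps the hostname:pid default.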

In environments where workers scale down to zero (or near-zero) under low load, TraceStax will mark the scaled-down workers as offline. This is expected behavior and does not indicate a problem — but it can generate unwanted fleet alerts.

To handle autoscaling cleanly:

  • Set an appropriate minimum fleet size — if your queue can legitimately scale to zero workers, set the minimum fleet size for that queue to 0. TraceStax will not alert when all workers are offline.
  • Suppress alerts for specific worker keys — if certain workers are ephemeral (e.g. one-off task runners), you can add their keys to the suppress list in Project Settings → Workers → Suppressed keys. Suppressed workers appear in the fleet view but never fire offline alerts.

Worker keys support wildcard patterns for suppression. For example, batch-runner-* suppresses all workers whose key starts with batch-runner-.
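Prefix-style wildcard matching of this kind can be approximated with Python's `fnmatch` module. This is an assumption about semantics: only the trailing `*` form is documented here, and `fnmatch` additionally treats `?` and `[...]` as special, which TraceStax may not.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def is_suppressed(worker_key: str, patterns: Iterable[str]) -> bool:
    """Return True if the key matches any suppression pattern."""
    # fnmatchcase is case-sensitive on every platform.
    return any(fnmatchcase(worker_key, pattern) for pattern in patterns)
```

With the pattern list `["batch-runner-*"]`, the key `batch-runner-42` is suppressed and `payments-worker-prod-a` is not.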

The concurrency value on each fleet row shows how many jobs the worker can process simultaneously, as reported by the framework. TraceStax displays this in the fleet view but does not currently use it as an alert signal — it is informational.

Concurrency is set at worker startup and does not change during the worker’s lifetime unless the worker is restarted. If you change concurrency in your deployment, the fleet view updates after the next heartbeat from the restarted workers.

Fleet alerts fire when the number of online workers consuming a specific queue drops below a configured threshold.

To configure fleet alerts:

  1. Navigate to the Workers tab of your project.
  2. Select Fleet alert rules from the top right.
  3. Click Add rule.
  4. Choose the queue to monitor.
  5. Set the Minimum online workers value.
  6. Choose the routing destination (inherits the project default if not overridden).
  7. Save.

When the online worker count for the queue drops below the minimum, a critical alert fires. When workers come back online and the count recovers, a resolved notification is sent.
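The fire-and-resolve behavior can be modeled as a small state machine per rule. The sketch below is illustrative only: `FleetAlertRule` and its method names are hypothetical, and counting only workers whose status is exactly online (not degraded) is an assumption based on the wording "online workers".

```python
from typing import Iterable, Optional, Tuple

WorkerRow = Tuple[str, Tuple[str, ...]]  # (status, queues consumed)


class FleetAlertRule:
    """Hypothetical rule: alert when online workers for a queue drop too low."""

    def __init__(self, queue: str, min_online: int) -> None:
        self.queue = queue
        self.min_online = min_online
        self.firing = False

    def evaluate(self, workers: Iterable[WorkerRow]) -> Optional[str]:
        """Return 'critical', 'resolved', or None if nothing changed."""
        online = sum(1 for status, queues in workers
                     if status == "online" and self.queue in queues)
        below = online < self.min_online
        if below and not self.firing:
            self.firing = True
            return "critical"
        if not below and self.firing:
            self.firing = False
            return "resolved"
        return None
```

Because the comparison is a strict "less than", a rule with a minimum of 0 can never fire, which matches the scale-to-zero guidance earlier on this page.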