TraceStax Docs

Worker fleet

The Worker Fleet view gives you a real-time list of every worker process TraceStax has seen recently, along with its current status, the queues it is consuming, and when it was last heard from.

What the fleet view shows

Each row in the fleet table represents a single worker process. The columns are:

Column	Description
Worker key	The stable identifier for this worker process. Defaults to `hostname:pid`. See Worker key below.
Hostname	The machine the worker is running on
PID	The operating system process ID
Queues	The queues this worker is configured to consume
Concurrency	The maximum number of jobs this worker can process simultaneously
Status	`online`, `degraded`, or `offline`
Last seen	How long ago the most recent heartbeat arrived

The fleet view is accessible from the Workers tab of your project. By default it shows all workers seen in the last 7 days; use the filter to show only currently online workers.

Worker discovery

TraceStax learns about workers through heartbeat events. The SDK sends a heartbeat when the worker process starts up, establishing its presence immediately. It then continues sending heartbeats at a regular interval (default: every 30 seconds, configurable in the SDK).

There is no registration step. As soon as the first heartbeat arrives for a new worker key, TraceStax adds it to the fleet view.

Each heartbeat carries the worker’s current state: its hostname, PID, queues, and concurrency. If a worker’s configuration changes — for example, it is restarted with a different set of queues — the fleet view updates on the next heartbeat.
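As a rough illustration of the behavior described above, the sketch below builds a heartbeat payload and sends one immediately, then one per interval. It is not the TraceStax SDK: the `Heartbeat` class, its field names, and the `send` callable are hypothetical stand-ins chosen for this example.

```python
import os
import socket
import time
from dataclasses import dataclass, asdict


@dataclass
class Heartbeat:
    """Hypothetical heartbeat payload; the field names are illustrative."""
    worker_key: str
    hostname: str
    pid: int
    queues: list[str]
    concurrency: int
    sent_at: float  # Unix timestamp in seconds


def build_heartbeat(queues, concurrency, worker_key=None):
    """Capture the worker's current state; the key defaults to hostname:pid."""
    hostname = socket.gethostname()
    pid = os.getpid()
    return Heartbeat(
        worker_key=worker_key or f"{hostname}:{pid}",
        hostname=hostname,
        pid=pid,
        queues=list(queues),
        concurrency=concurrency,
        sent_at=time.time(),
    )


def run_heartbeats(send, queues, concurrency, interval_s=30.0, max_beats=None):
    """Send one heartbeat at startup, then one every `interval_s` seconds.

    `send` is any callable that accepts a dict. `max_beats` bounds the loop
    for demonstration; None means keep going until the process exits.
    """
    sent = 0
    while max_beats is None or sent < max_beats:
        send(asdict(build_heartbeat(queues, concurrency)))
        sent += 1
        if max_beats is not None and sent >= max_beats:
            break
        time.sleep(interval_s)
```

Because every heartbeat is rebuilt from the worker's current state, a worker restarted with different queues reports the new set on its first heartbeat, with no separate registration call.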

Worker status

TraceStax computes a worker’s status by comparing the time since its last heartbeat to the expected heartbeat interval:

Status	Condition
`online`	Last heartbeat arrived within 2× the expected interval
`degraded`	Last heartbeat is between 2× and 3× the expected interval (one heartbeat missed)
`offline`	Last heartbeat is more than 3× the expected interval (two or more heartbeats missed)

The expected interval is set per SDK (default: 30 seconds). TraceStax learns the interval from the gap between successive heartbeats from the same worker key. For a freshly seen worker with only one heartbeat on record, TraceStax assumes the SDK default until a second heartbeat confirms the actual interval.
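One way to picture the interval learning is the sketch below. It assumes the interval is taken as the median gap between successive heartbeat timestamps, so that a single late heartbeat does not inflate the estimate; the text above does not say how TraceStax aggregates the gaps, so the median is an assumption, as are the function and constant names.

```python
from statistics import median

SDK_DEFAULT_INTERVAL_S = 30.0  # default heartbeat interval stated above


def expected_interval(heartbeat_times, default=SDK_DEFAULT_INTERVAL_S):
    """Estimate a worker's heartbeat interval from its heartbeat timestamps.

    Falls back to the SDK default when fewer than two heartbeats are on
    record, because no gap can be measured yet.
    """
    times = sorted(heartbeat_times)
    if len(times) < 2:
        return default
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return median(gaps)  # assumption: median of observed gaps
```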

A degraded status is a warning — the worker is probably still running but may be under load, experiencing network issues, or about to crash. An offline status means the worker has almost certainly stopped.
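The status thresholds can be expressed as a small classification function. This is a sketch of the rule as documented, not TraceStax's implementation; the function name is hypothetical, and treating exactly 2× as `online` and exactly 3× as `degraded` is an assumption about the boundaries.

```python
def worker_status(seconds_since_last_heartbeat, expected_interval_s=30.0):
    """Classify a worker by how stale its last heartbeat is."""
    ratio = seconds_since_last_heartbeat / expected_interval_s
    if ratio <= 2:
        return "online"
    if ratio <= 3:
        return "degraded"
    return "offline"


# With the default 30-second interval:
worker_status(45)   # "online"   (1.5x the interval)
worker_status(75)   # "degraded" (2.5x the interval)
worker_status(120)  # "offline"  (4x the interval)
```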

Worker key

The worker key is the stable identifier TraceStax uses to track a worker process across its lifetime. All `task_event` and `heartbeat` events from the same worker process share the same key, which is how TraceStax correlates job executions back to a specific worker instance.

The default key format is `hostname:pid`. Because a PID is unique within a host at any given moment, the key identifies exactly one running process on that host. The key changes when the process is restarted (because the PID changes), which is the correct behavior: a restarted worker is a new process.

You can override the worker key in the SDK configuration. Use cases for a custom key:

You want a stable key that survives restarts (e.g. a Kubernetes pod name, which remains stable across process crashes but changes when the pod is rescheduled)
You want a human-readable name for fleet visibility (e.g. `payments-worker-prod-a`)
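The difference between the default key and a custom key shows up across a restart. The sketch below uses hypothetical helper names and made-up host, PID, and pod values; it does not show the SDK's actual configuration option.

```python
def default_worker_key(hostname, pid):
    """The documented default format: hostname:pid."""
    return f"{hostname}:{pid}"


def choose_worker_key(hostname, pid, custom_key=None):
    """Prefer a custom key when one is configured, else use the default."""
    return custom_key if custom_key else default_worker_key(hostname, pid)


# Default key: the restart produces a new PID, so the key changes.
before = choose_worker_key("web-1", 4021)  # "web-1:4021"
after = choose_worker_key("web-1", 4077)   # "web-1:4077"

# Custom key (an illustrative pod name): the same key survives the restart.
pod = "payments-worker-7d9f"
stable_before = choose_worker_key("web-1", 4021, custom_key=pod)
stable_after = choose_worker_key("web-1", 4077, custom_key=pod)
```

With a custom key, events from before and after a crash are attributed to the same fleet row, so choose one only when that grouping is what you want.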

Autoscaling environments

In environments where workers scale down to zero (or near-zero) under low load, TraceStax will mark the scaled-down workers as offline. This is expected behavior and does not indicate a problem — but it can generate unwanted fleet alerts.

To handle autoscaling cleanly:

Set an appropriate minimum fleet size: if your queue can legitimately scale to zero workers, set the minimum fleet size (the Minimum online workers value in that queue’s fleet alert rule) to 0. TraceStax will then not alert when all of the queue’s workers are offline.
Suppress alerts for specific worker keys — if certain workers are ephemeral (e.g. one-off task runners), you can add their keys to the suppress list in Project Settings → Workers → Suppressed keys. Suppressed workers appear in the fleet view but never fire offline alerts.

The suppress list accepts wildcard patterns. For example, `batch-runner-*` suppresses all workers whose key starts with `batch-runner-`.
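The prefix-style wildcard can be approximated with Python's standard `fnmatch` module, as sketched below. Only the `*` wildcard is documented here; `fnmatch` also gives special meaning to `?` and `[...]`, so treating suppression patterns as shell-style globs is an assumption, and `is_suppressed` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def is_suppressed(worker_key, patterns):
    """Return True if the worker key matches any suppression pattern.

    fnmatchcase performs a case-sensitive, whole-string glob match.
    """
    return any(fnmatchcase(worker_key, pattern) for pattern in patterns)


patterns = ["batch-runner-*"]
is_suppressed("batch-runner-42", patterns)         # True
is_suppressed("payments-worker-prod-a", patterns)  # False
```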

Concurrency

The concurrency value on each fleet row shows how many jobs the worker can process simultaneously, as reported by the framework. TraceStax displays this in the fleet view but does not currently use it as an alert signal — it is informational.

Concurrency is set at worker startup and stays fixed for the worker’s lifetime; changing it requires a restart. If you change concurrency in your deployment, the fleet view shows the new value once the restarted workers send their first heartbeat.

Fleet alerts

Fleet alerts fire when the number of online workers consuming a specific queue drops below a configured threshold.

To configure fleet alerts:

Navigate to the Workers tab of your project.
Select Fleet alert rules from the top right.
Click Add rule.
Choose the queue to monitor.
Set the Minimum online workers value.
Choose the routing destination (inherits the project default if not overridden).
Save.

When the online worker count for the queue drops below the minimum, a critical alert fires. When workers come back online and the count recovers, a resolved notification is sent.
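The fire-and-resolve behavior can be sketched as a state transition on the online worker count for a queue. The `Worker` class and function names below are hypothetical, and counting only workers whose status is exactly `online` (so `degraded` workers do not count) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    key: str
    queues: list[str]
    status: str  # "online", "degraded", or "offline"


def online_count(workers, queue):
    """Count online workers that consume the given queue."""
    return sum(1 for w in workers if w.status == "online" and queue in w.queues)


def evaluate_rule(workers, queue, minimum, currently_firing):
    """Return "critical", "resolved", or None when nothing changes."""
    below = online_count(workers, queue) < minimum
    if below and not currently_firing:
        return "critical"   # count just dropped below the minimum
    if not below and currently_firing:
        return "resolved"   # count recovered
    return None


fleet = [
    Worker("web-1:4021", ["emails"], "online"),
    Worker("web-2:3310", ["emails"], "offline"),
]
evaluate_rule(fleet, "emails", minimum=2, currently_firing=False)  # "critical"
```

A minimum of 0 can never be undercut, which is why setting it to 0 keeps a scale-to-zero queue from alerting.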