# Event model

Everything TraceStax knows about your background jobs arrives as events sent by the SDK (or directly via the Ingest API). There are three event types, each serving a distinct purpose: `task_event`, `heartbeat`, and `snapshot`.
## Event types

### task_event

A `task_event` is emitted each time something notable happens to a job — it starts, succeeds, fails, is retried, stalls, or is revoked. This is the primary signal TraceStax uses for anomaly detection and failure rate tracking.
Fields:

| Field | Type | Required | Description |
|---|---|---|---|
| type | "task_event" | Yes | Discriminator field identifying the event type |
| framework | string | Yes | The job framework (e.g. celery, bullmq, sidekiq, rq, oban) |
| language | string | Yes | The runtime language (e.g. python, node, ruby, elixir) |
| sdk_version | string | Yes | The version of the TraceStax SDK in use |
| worker.key | string | Yes | Stable identifier for the worker process. Defaults to hostname:pid; overridable via SDK config |
| worker.hostname | string | Yes | Hostname of the machine running the worker |
| worker.pid | integer | Yes | Process ID of the worker |
| worker.concurrency | integer | Yes | Maximum number of concurrent jobs this worker processes |
| worker.queues | string[] | Yes | List of queue names this worker is consuming |
| task.name | string | Yes | The fully-qualified job class or function name |
| task.id | string | Yes | Unique identifier for this specific job execution |
| task.queue | string | Yes | The queue this job was dispatched to |
| task.attempt | integer | Yes | Which attempt this is (1 for the first attempt, 2 for the first retry, etc.) |
| task.parent_id | string | No | ID of the parent job, if this job was spawned by another |
| task.chain_id | string | No | Identifier grouping all jobs in a workflow chain |
| status | string | Yes | One of: started, succeeded, failed, retried, stalled, revoked |
| metrics.duration_ms | integer | No | Wall-clock time from start to finish, in milliseconds. Present on succeeded, failed, retried |
| metrics.queued_ms | integer | No | Time from job enqueue to job start, in milliseconds. Present when the framework exposes the enqueue timestamp |
| error.type | string | No | Exception class name. Present on failed and retried |
| error.message | string | No | Exception message. Present on failed and retried |
| error.stack_trace | string | No | Full stack trace as a single string. Present on failed and retried |
Status values:

| Status | Meaning |
|---|---|
| started | The job has been picked up by a worker and begun execution |
| succeeded | The job completed without error |
| failed | The job raised an unhandled exception and will not be retried (exhausted retries or non-retryable error) |
| retried | The job raised an error and has been re-enqueued for another attempt |
| stalled | The job was in progress but the worker stopped reporting; the framework has returned it to the queue |
| revoked | The job was cancelled before or during execution |
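To see how the fields above fit together, here is a minimal builder sketch. It is illustrative only, not part of the TraceStax SDK: the `make_task_event` helper name is ours, and the hard-coded framework, sdk_version, and concurrency values are placeholders.

```python
import os
import socket
import uuid

def make_task_event(task_name, queue, status, attempt=1, error=None):
    """Assemble a task_event payload matching the field table above.

    Illustrative helper, not an SDK function; framework and sdk_version
    values are placeholders.
    """
    hostname = socket.gethostname()
    pid = os.getpid()
    event = {
        "type": "task_event",
        "framework": "celery",
        "language": "python",
        "sdk_version": "0.4.1",
        "worker": {
            "key": f"{hostname}:{pid}",  # default worker key: hostname:pid
            "hostname": hostname,
            "pid": pid,
            "concurrency": 8,
            "queues": [queue],
        },
        "task": {
            "name": task_name,
            "id": str(uuid.uuid4()),
            "queue": queue,
            "attempt": attempt,  # 1 = first attempt, 2 = first retry, ...
        },
        "status": status,
    }
    if error is not None:
        # error.{type,message,stack_trace} accompany failed/retried events
        event["error"] = error
    return event
```

Optional fields (task.parent_id, task.chain_id, metrics) are simply omitted here; the SDK would attach them when the framework provides the data.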
Example payload:
```json
{
  "type": "task_event",
  "framework": "celery",
  "language": "python",
  "sdk_version": "0.4.1",
  "worker": {
    "key": "worker-prod-1:14523",
    "hostname": "worker-prod-1.internal",
    "pid": 14523,
    "concurrency": 8,
    "queues": ["default", "email"]
  },
  "task": {
    "name": "app.tasks.email.send_welcome_email",
    "id": "3c8e4f12-7a1b-4d2e-9f3a-0b5c6d7e8f90",
    "queue": "email",
    "attempt": 2,
    "parent_id": null,
    "chain_id": "a1b2c3d4-onboarding-flow"
  },
  "status": "retried",
  "metrics": {
    "duration_ms": 1842,
    "queued_ms": 312
  },
  "error": {
    "type": "SMTPConnectError",
    "message": "Connection refused to smtp.example.com:587",
    "stack_trace": "Traceback (most recent call last):\n  File \"...\"\nSMTPConnectError: Connection refused"
  }
}
```

### heartbeat
A `heartbeat` event is sent periodically by a running worker to confirm it is still alive. The SDK sends a heartbeat on startup and then at a regular interval (default: every 30 seconds). TraceStax uses heartbeats to populate the Worker Fleet view and to fire alerts when expected workers go offline.
Fields:

| Field | Type | Required | Description |
|---|---|---|---|
| type | "heartbeat" | Yes | Discriminator field |
| framework | string | Yes | The job framework |
| worker | object | Yes | The same worker object as in task_event (key, hostname, pid, concurrency, queues) |
| timestamp | string | Yes | ISO 8601 timestamp of when the heartbeat was generated |
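A heartbeat needs nothing beyond the standard library to assemble. This sketch assumes a `make_heartbeat` helper of our own (not an SDK function); a real worker would call it once at startup and then on the 30-second interval:

```python
import datetime

def make_heartbeat(framework, worker):
    """Assemble a heartbeat payload.

    `worker` is the same object used in task_event: key, hostname, pid,
    concurrency, queues. Illustrative helper, not an SDK function.
    """
    now = datetime.datetime.now(datetime.timezone.utc)
    return {
        "type": "heartbeat",
        "framework": framework,
        "worker": worker,
        # ISO 8601 with millisecond precision and a Z suffix, matching the example below
        "timestamp": now.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
    }
```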
Example payload:
```json
{
  "type": "heartbeat",
  "framework": "bullmq",
  "worker": {
    "key": "api-worker-7:9801",
    "hostname": "api-worker-7.internal",
    "pid": 9801,
    "concurrency": 4,
    "queues": ["notifications", "webhooks"]
  },
  "timestamp": "2026-03-24T14:22:00.000Z"
}
```

### snapshot
A `snapshot` event is sent by the SDK every 60 seconds and reports the current state of all queues the worker is aware of. Unlike a `task_event`, which reflects an individual job execution, a snapshot gives TraceStax a point-in-time view of queue depth — how many jobs are waiting, active, and failed — along with a throughput measurement.
Fields:

| Field | Type | Required | Description |
|---|---|---|---|
| type | "snapshot" | Yes | Discriminator field |
| framework | string | Yes | The job framework |
| worker_key | string | Yes | The stable worker key, used to attribute the snapshot to a specific worker process |
| queues | array | Yes | One entry per queue the worker monitors (see below) |
| timestamp | string | Yes | ISO 8601 timestamp of when the snapshot was taken |
Queue entry fields:

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Queue name |
| depth | integer | Yes | Number of jobs waiting to be processed |
| active | integer | Yes | Number of jobs currently being processed across all workers |
| failed | integer | Yes | Number of jobs in the failed/dead-letter state |
| throughput_per_min | number | Yes | Jobs completed per minute over the last measurement window, as reported by the framework |
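As a sketch, a snapshot can be assembled from whatever per-queue counters the framework exposes. The `make_snapshot` helper and the shape of `queue_stats` below are assumptions for illustration, not SDK API:

```python
import datetime

def make_snapshot(framework, worker_key, queue_stats):
    """Assemble a snapshot payload.

    `queue_stats` maps queue name -> dict with depth, active, failed,
    and throughput_per_min, as the framework reports them.
    Illustrative helper, not an SDK function.
    """
    now = datetime.datetime.now(datetime.timezone.utc)
    return {
        "type": "snapshot",
        "framework": framework,
        "worker_key": worker_key,
        "queues": [
            {
                "name": name,
                "depth": s["depth"],
                "active": s["active"],
                "failed": s["failed"],
                "throughput_per_min": s["throughput_per_min"],
            }
            for name, s in queue_stats.items()
        ],
        "timestamp": now.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
    }
```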
Example payload:
```json
{
  "type": "snapshot",
  "framework": "sidekiq",
  "worker_key": "sidekiq-prod-3:22041",
  "queues": [
    {
      "name": "default",
      "depth": 142,
      "active": 10,
      "failed": 3,
      "throughput_per_min": 47.2
    },
    {
      "name": "critical",
      "depth": 0,
      "active": 2,
      "failed": 0,
      "throughput_per_min": 8.1
    }
  ],
  "timestamp": "2026-03-24T14:22:00.000Z"
}
```

## How TraceStax uses each event type
| Event type | Used for |
|---|---|
| task_event | Anomaly detection (duration, failure rate, throughput), job history, error tracking |
| heartbeat | Worker Fleet view, online/offline status, fleet size alerts |
| snapshot | Queue depth charts, backlog detection, queue stall detection |
These three signals are complementary. `task_event` data tells you what happened to individual jobs; `snapshot` data tells you the state of the queue at a point in time; `heartbeat` data tells you which workers are alive to process that queue.
## Size and batch limits

| Limit | Value |
|---|---|
| Maximum event size | 64 KB per event |
| Maximum batch size | 100 events per POST to /v1/ingest |
Events larger than 64 KB are rejected with a 413 response. The SDK truncates stack traces and error messages if necessary to stay within this limit. Batches exceeding 100 events are also rejected; the SDK automatically splits large batches into multiple requests.
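Code talking to the Ingest API directly has to respect both limits itself. The sketch below checks serialized size and splits into batches of 100; it drops oversized events as a simplification, whereas the real SDK truncates stack traces and error messages to fit. The `batch_events` helper is illustrative, not SDK API:

```python
import json

MAX_EVENT_BYTES = 64 * 1024  # events above this are rejected with a 413
MAX_BATCH_EVENTS = 100       # maximum events per POST to /v1/ingest

def batch_events(events):
    """Split events into ingest-sized batches.

    Oversized events are skipped here for brevity; the real SDK truncates
    stack traces and error messages instead of dropping the event.
    """
    sendable = [
        e for e in events
        if len(json.dumps(e).encode("utf-8")) <= MAX_EVENT_BYTES
    ]
    return [
        sendable[i:i + MAX_BATCH_EVENTS]
        for i in range(0, len(sendable), MAX_BATCH_EVENTS)
    ]
```

Each returned batch can then be sent as one POST to /v1/ingest without tripping either limit.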