TraceStax Docs

Event model

Everything TraceStax knows about your background jobs arrives as events sent by the SDK (or directly via the Ingest API). There are three event types, each serving a distinct purpose: `task_event`, `heartbeat`, and `snapshot`.

A `task_event` is emitted each time something notable happens to a job: it starts, succeeds, fails, is retried, stalls, or is revoked. This is the primary signal TraceStax uses for anomaly detection and failure rate tracking.

Fields:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `type` | `"task_event"` | Yes | Discriminator field identifying the event type |
| `framework` | string | Yes | The job framework (e.g. `celery`, `bullmq`, `sidekiq`, `rq`, `oban`) |
| `language` | string | Yes | The runtime language (e.g. `python`, `node`, `ruby`, `elixir`) |
| `sdk_version` | string | Yes | The version of the TraceStax SDK in use |
| `worker.key` | string | Yes | Stable identifier for the worker process. Defaults to `hostname:pid`; overridable via SDK config |
| `worker.hostname` | string | Yes | Hostname of the machine running the worker |
| `worker.pid` | integer | Yes | Process ID of the worker |
| `worker.concurrency` | integer | Yes | Maximum number of concurrent jobs this worker processes |
| `worker.queues` | string[] | Yes | List of queue names this worker is consuming |
| `task.name` | string | Yes | The fully qualified job class or function name |
| `task.id` | string | Yes | Unique identifier for this specific job execution |
| `task.queue` | string | Yes | The queue this job was dispatched to |
| `task.attempt` | integer | Yes | Which attempt this is (1 for the first attempt, 2 for the first retry, and so on) |
| `task.parent_id` | string | No | ID of the parent job, if this job was spawned by another |
| `task.chain_id` | string | No | Identifier grouping all jobs in a workflow chain |
| `status` | string | Yes | One of: `started`, `succeeded`, `failed`, `retried`, `stalled`, `revoked` |
| `metrics.duration_ms` | integer | No | Wall-clock time from start to finish, in milliseconds. Present on `succeeded`, `failed`, and `retried` |
| `metrics.queued_ms` | integer | No | Time from job enqueue to job start, in milliseconds. Present when the framework exposes the enqueue timestamp |
| `error.type` | string | No | Exception class name. Present on `failed` and `retried` |
| `error.message` | string | No | Exception message. Present on `failed` and `retried` |
| `error.stack_trace` | string | No | Full stack trace as a single string. Present on `failed` and `retried` |

Status values:

| Status | Meaning |
| --- | --- |
| `started` | The job has been picked up by a worker and has begun execution |
| `succeeded` | The job completed without error |
| `failed` | The job raised an unhandled exception and will not be retried (retries exhausted or a non-retryable error) |
| `retried` | The job raised an error and has been re-enqueued for another attempt |
| `stalled` | The job was in progress but the worker stopped reporting; the framework has returned it to the queue |
| `revoked` | The job was cancelled before or during execution |

Example payload:

```json
{
  "type": "task_event",
  "framework": "celery",
  "language": "python",
  "sdk_version": "0.4.1",
  "worker": {
    "key": "worker-prod-1:14523",
    "hostname": "worker-prod-1.internal",
    "pid": 14523,
    "concurrency": 8,
    "queues": ["default", "email"]
  },
  "task": {
    "name": "app.tasks.email.send_welcome_email",
    "id": "3c8e4f12-7a1b-4d2e-9f3a-0b5c6d7e8f90",
    "queue": "email",
    "attempt": 2,
    "parent_id": null,
    "chain_id": "a1b2c3d4-onboarding-flow"
  },
  "status": "retried",
  "metrics": {
    "duration_ms": 1842,
    "queued_ms": 312
  },
  "error": {
    "type": "SMTPConnectError",
    "message": "Connection refused to smtp.example.com:587",
    "stack_trace": "Traceback (most recent call last):\n File \"...\"\nSMTPConnectError: Connection refused"
  }
}
```
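To make the lifecycle concrete, here is a minimal Python sketch (not the TraceStax SDK) that wraps a job function and emits `started`, then `succeeded`, `failed`, or `retried`, with `metrics.duration_ms` and the `error` fields populated as the tables above describe. `emit`, `run_with_events`, and `max_attempts` are hypothetical names, and the envelope fields (`framework`, `language`, `sdk_version`, `worker`) are omitted for brevity.

```python
import time
import traceback
import uuid


def run_with_events(task_fn, *, name, queue, attempt, emit, max_attempts=3):
    """Run task_fn and emit TraceStax-style task_events around it.

    `emit` stands in for whatever transport delivers events (an SDK
    buffer, an HTTP POST to the Ingest API, etc.).
    """
    base = {
        "type": "task_event",
        "task": {"name": name, "id": str(uuid.uuid4()),
                 "queue": queue, "attempt": attempt},
    }
    emit({**base, "status": "started"})
    started = time.monotonic()
    try:
        result = task_fn()
    except Exception as exc:
        duration_ms = int((time.monotonic() - started) * 1000)
        # An error with attempts remaining is "retried"; once attempts
        # are exhausted the job is terminally "failed".
        status = "retried" if attempt < max_attempts else "failed"
        emit({**base, "status": status,
              "metrics": {"duration_ms": duration_ms},
              "error": {"type": type(exc).__name__,
                        "message": str(exc),
                        "stack_trace": traceback.format_exc()}})
        raise  # let the framework's retry machinery take over
    duration_ms = int((time.monotonic() - started) * 1000)
    emit({**base, "status": "succeeded",
          "metrics": {"duration_ms": duration_ms}})
    return result
```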

A `heartbeat` event is sent periodically by a running worker to confirm it is still alive. The SDK sends a heartbeat on startup and then at a regular interval (default: every 30 seconds). TraceStax uses heartbeats to populate the Worker Fleet view and to fire alerts when expected workers go offline.

Fields:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `type` | `"heartbeat"` | Yes | Discriminator field |
| `framework` | string | Yes | The job framework |
| `worker` | object | Yes | The same worker object as in `task_event` (`key`, `hostname`, `pid`, `concurrency`, `queues`) |
| `timestamp` | string | Yes | ISO 8601 timestamp of when the heartbeat was generated |

Example payload:

```json
{
  "type": "heartbeat",
  "framework": "bullmq",
  "worker": {
    "key": "api-worker-7:9801",
    "hostname": "api-worker-7.internal",
    "pid": 9801,
    "concurrency": 4,
    "queues": ["notifications", "webhooks"]
  },
  "timestamp": "2026-03-24T14:22:00.000Z"
}
```
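The documented behaviour (one heartbeat on startup, then one per interval) can be sketched as a small background loop. This is an illustration, not the SDK's implementation; `start_heartbeat` and its parameters are hypothetical names.

```python
import datetime
import threading


def start_heartbeat(framework, worker, emit, interval=30.0):
    """Emit a heartbeat immediately, then one per `interval` seconds,
    from a daemon thread. `emit` stands in for the real delivery
    transport; `worker` is the key/hostname/pid/concurrency/queues
    object shown above."""
    stop = threading.Event()

    def beat():
        while not stop.is_set():
            emit({
                "type": "heartbeat",
                "framework": framework,
                "worker": worker,
                # ISO 8601 with a trailing Z, matching the example payload
                "timestamp": datetime.datetime.now(datetime.timezone.utc)
                .isoformat(timespec="milliseconds")
                .replace("+00:00", "Z"),
            })
            stop.wait(interval)  # returns early once stop is set

    threading.Thread(target=beat, daemon=True).start()
    return stop.set  # call the returned function to stop the loop
```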

A `snapshot` event is sent by the SDK every 60 seconds and reports the current state of all queues the worker is aware of. Unlike `task_event`, which reflects individual job executions, a `snapshot` gives TraceStax a point-in-time view of queue depth (how many jobs are waiting, active, and failed) along with a throughput measurement.

Fields:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `type` | `"snapshot"` | Yes | Discriminator field |
| `framework` | string | Yes | The job framework |
| `worker_key` | string | Yes | The stable worker key, used to attribute the snapshot to a specific worker process |
| `queues` | array | Yes | One entry per queue the worker monitors (see below) |
| `timestamp` | string | Yes | ISO 8601 timestamp of when the snapshot was taken |

Queue entry fields:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `name` | string | Yes | Queue name |
| `depth` | integer | Yes | Number of jobs waiting to be processed |
| `active` | integer | Yes | Number of jobs currently being processed across all workers |
| `failed` | integer | Yes | Number of jobs in the failed/dead-letter state |
| `throughput_per_min` | number | Yes | Jobs completed per minute over the last measurement window, as reported by the framework |

Example payload:

```json
{
  "type": "snapshot",
  "framework": "sidekiq",
  "worker_key": "sidekiq-prod-3:22041",
  "queues": [
    {
      "name": "default",
      "depth": 142,
      "active": 10,
      "failed": 3,
      "throughput_per_min": 47.2
    },
    {
      "name": "critical",
      "depth": 0,
      "active": 2,
      "failed": 0,
      "throughput_per_min": 8.1
    }
  ],
  "timestamp": "2026-03-24T14:22:00.000Z"
}
```
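One thing snapshots make possible is simple backlog arithmetic: `depth` divided by `throughput_per_min` estimates how long a queue will take to drain at its current rate. The helper below is a rough consumer-side heuristic, not an official TraceStax metric.

```python
def minutes_to_drain(queue_entry):
    """Estimated minutes to clear one queue's backlog, from a single
    snapshot queue entry, assuming throughput stays constant.

    Returns 0.0 for an empty queue and infinity for a non-empty queue
    with zero measured throughput (a likely stall)."""
    throughput = queue_entry["throughput_per_min"]
    depth = queue_entry["depth"]
    if throughput <= 0:
        return float("inf") if depth else 0.0
    return depth / throughput
```

Applied to the example above, the `default` queue (depth 142 at 47.2 jobs/min) would take roughly three minutes to drain.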
| Event type | Used for |
| --- | --- |
| `task_event` | Anomaly detection (duration, failure rate, throughput), job history, error tracking |
| `heartbeat` | Worker Fleet view, online/offline status, fleet size alerts |
| `snapshot` | Queue depth charts, backlog detection, queue stall detection |

These three signals are complementary. `task_event` data tells you what happened to individual jobs; `snapshot` data tells you the state of the queue at a point in time; `heartbeat` data tells you which workers are alive to process that queue.
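Because every event carries the `type` discriminator, a consumer can route the three event kinds with a simple lookup. A minimal sketch, with `route_event` and `handlers` as hypothetical names:

```python
def route_event(event, handlers):
    """Dispatch an ingested event on its `type` discriminator.

    `handlers` maps "task_event" / "heartbeat" / "snapshot" to
    callables; unknown types are rejected rather than silently dropped.
    """
    try:
        handler = handlers[event["type"]]
    except KeyError:
        raise ValueError(f"unknown event type: {event.get('type')!r}")
    return handler(event)
```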

| Limit | Value |
| --- | --- |
| Maximum event size | 64 KB per event |
| Maximum batch size | 100 events per `POST` to `/v1/ingest` |

Events larger than 64 KB are rejected with a 413 response. The SDK truncates stack traces and error messages if necessary to stay within this limit. Batches exceeding 100 events are also rejected; the SDK automatically splits large batches into multiple requests.
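If you send events directly to the Ingest API rather than through the SDK, you must respect both limits yourself. The sketch below illustrates them; note that where the real SDK truncates oversized stack traces, this simplified version just filters oversized events out. `prepare_batches` is a hypothetical helper, not part of any TraceStax SDK.

```python
import json

MAX_EVENT_BYTES = 64 * 1024   # events above this get a 413
MAX_BATCH_EVENTS = 100        # per POST to /v1/ingest


def prepare_batches(events):
    """Split events into ingest-sized batches.

    Oversized events are dropped here for simplicity; the SDK instead
    truncates stack traces and error messages to fit under 64 KB."""
    sendable = [
        e for e in events
        if len(json.dumps(e).encode("utf-8")) <= MAX_EVENT_BYTES
    ]
    return [
        sendable[i:i + MAX_BATCH_EVENTS]
        for i in range(0, len(sendable), MAX_BATCH_EVENTS)
    ]
```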