COG-8: Jobs
Status: Draft (Work in Progress)
Version: 0.2
Created: 2025-01-28
Updated: 2026-01-29
Authors: Mike Anderson
This specification is under active development. Structure and details may change significantly based on implementation experience and community feedback.
This standard specifies Jobs: the execution tracking mechanism for operations on the Covia Grid. A Job represents a running process on the Grid, modelled as a pointer to an evolving chain of immutable state records.
Purpose
Jobs are the runtime counterpart to Operations. While an operation defines what can be executed, a job represents a specific execution of that operation. Jobs provide:
- Asynchronous Execution: Clients submit work and track progress without blocking
- Observable Lifecycle: Every job transitions through well-defined states
- Error Handling: Structured error reporting for failed, rejected, or timed-out work
- Interactive Workflows: Support for human-in-the-loop patterns where jobs pause for input or authorisation
- Auditability: Job state chains provide a verifiable, immutable history of all computation performed on the Grid
- Unified Data Model: Job state records share the same content-addressed, immutable structure as other Grid assets
Terminology
See COG-1: Architecture for definitions of Grid terminology including Job, Operation, Venue, and Client.
Key Concepts
Job as a Pointer to Process State
A Job is not a mutable record. It is a HEAD pointer to the latest entry in a chain of immutable state records. Each state transition appends a new immutable record to the chain, linked to its predecessor by a content-addressed hash.
Job HEAD ──▶ State(N) ──prev──▶ State(N-1) ──prev──▶ ... ──prev──▶ State(0)
             (latest)                                              (initial)
Each state record in the chain is an immutable, content-addressed value — structurally identical to an Artifact or Operation asset. The only mutable element is the HEAD pointer itself, which advances as the job progresses.
This design provides several properties:
- Audit trail: The full execution history is preserved as a chain of immutable records, each cryptographically linked to its predecessor
- Verifiability: Any participant can verify the integrity of a job's history by walking the chain and checking hashes
- Lattice compatibility: Immutable state records merge naturally using union semantics in the Grid Lattice, eliminating the need for timestamp-based conflict resolution on job data
- Structural sharing: When state records are stored as native lattice data structures, unchanged fields (e.g. input, op) are automatically deduplicated across state transitions via the lattice's Merkle tree structure
State Record Structure
Each state record in the chain contains:
| Field | Presence | Description |
|---|---|---|
| status | REQUIRED | Current lifecycle status |
| prev | REQUIRED (except initial) | Content-addressed hash of the previous state record |
| op | Initial record | Operation identifier (Asset ID, name, or adapter reference) |
| input | Initial record | Input provided at job creation |
| output | Terminal (COMPLETE) | Result produced by the operation |
| error | Terminal (FAILED, REJECTED, CANCELLED) | Error description |
| message | OPTIONAL | Context about current state (e.g. what input is required) |
| updated | RECOMMENDED | Timestamp in milliseconds since Unix epoch |
Fields that have not changed since the previous state record MAY be omitted from subsequent records, since the full state can be reconstructed by walking the chain. However, implementations MAY include all fields in every record for convenience.
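The reconstruction rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation. It assumes records are plain dicts and uses SHA3-256 over canonical JSON as a hypothetical stand-in for the lattice Value ID; the spec itself hashes the CAD003 encoding (see Asset ID Scheme). The helper names `record_id`, `append` and `resolve` are invented for illustration.

```python
import hashlib
import json
from typing import Optional

def record_id(record: dict) -> str:
    # Hypothetical stand-in for a lattice Value ID: SHA3-256 over canonical JSON
    # (the spec uses the CAD003 binary encoding instead, see Asset ID Scheme).
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return "0x" + hashlib.sha3_256(canonical.encode("utf-8")).hexdigest()

def append(store: dict, head: Optional[str], fields: dict) -> str:
    """Store a new immutable record linked to `head`; return the new HEAD id."""
    record = dict(fields, prev=head)
    rid = record_id(record)
    store[rid] = record
    return rid

def resolve(store: dict, head: str) -> dict:
    """Rebuild the full job state by walking prev links back to State(0)."""
    chain = []
    cursor: Optional[str] = head
    while cursor is not None:
        chain.append(store[cursor])
        cursor = store[cursor]["prev"]
    state: dict = {}
    for record in reversed(chain):  # oldest first, so newer fields override
        state.update({k: v for k, v in record.items() if k != "prev"})
    return state

store: dict = {}
head = append(store, None, {"status": "PENDING", "op": "test:echo",
                            "input": {"text": "hello"}, "updated": 1769683717706})
head = append(store, head, {"status": "STARTED", "updated": 1769683717708})
head = append(store, head, {"status": "COMPLETE", "output": {"text": "hello"},
                            "updated": 1769683717710})
state = resolve(store, head)
print(state["status"], state["input"], state["output"])
# COMPLETE {'text': 'hello'} {'text': 'hello'}
```

Later records omit `op` and `input`, yet the resolved state still contains them. This is the same "all fields resolved" view that the Job Data section describes for the API.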
Example Chain
A simple echo operation produces a three-link chain:
State(2) — HEAD
{
"status": "COMPLETE",
"prev": "0xdef...",
"output": {"text": "hello"},
"updated": 1769683717710
}
│
▼ prev
State(1)
{
"status": "STARTED",
"prev": "0x123...",
"updated": 1769683717708
}
│
▼ prev
State(0) — initial
{
"status": "PENDING",
"prev": null,
"op": "test:echo",
"input": {"text": "hello"},
"updated": 1769683717706
}
Verifiability of Job State
The chain model enables cryptographic verification of job execution history:
- Each state record is content-addressed. The record's identifier is derived from its content (see Asset ID Scheme below). Any modification to a record changes its identifier, making tampering detectable.
- Chain integrity. Each record contains the hash of its predecessor in the prev field. A verifier can walk the chain from HEAD to the initial state, confirming that no records have been inserted, removed, or modified.
- Cross-venue verification. Because state records are immutable and content-addressed, they can be safely replicated to other venues. A client can request the same job history from multiple venues and confirm consistency.
- Selective disclosure. A client can share a specific state record (e.g. the terminal COMPLETE record) along with its chain as proof that a computation occurred, without revealing the full job context.
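A verifier for the chain-integrity property might look like the following Python sketch. It assumes the history is an ordered list of records with the initial record first, as the history endpoint described later returns it. It reuses a hypothetical canonical-JSON hash in place of the CAD003 Value ID. How a client obtains the expected hash of the HEAD record is not specified here, so that check is optional.

```python
import hashlib
import json
from typing import Optional

def record_id(record: dict) -> str:
    # Hypothetical stand-in for the lattice Value ID (CAD003 + SHA3-256 in the spec).
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return "0x" + hashlib.sha3_256(canonical.encode("utf-8")).hexdigest()

def verify_chain(history: list, expected_head: Optional[str] = None) -> bool:
    """Check that each record's prev is the hash of the record before it."""
    if not history or history[0].get("prev") is not None:
        return False  # the initial record must have no predecessor
    for earlier, later in zip(history, history[1:]):
        if later.get("prev") != record_id(earlier):
            return False  # a record was inserted, removed, or modified
    # Optionally pin the latest record to a hash obtained out of band.
    return expected_head is None or record_id(history[-1]) == expected_head

# Usage: a three-record history like the echo example above.
r0 = {"status": "PENDING", "prev": None, "op": "test:echo", "input": {"text": "hello"}}
r1 = {"status": "STARTED", "prev": record_id(r0)}
r2 = {"status": "COMPLETE", "prev": record_id(r1), "output": {"text": "hello"}}
print(verify_chain([r0, r1, r2], record_id(r2)))                         # True
print(verify_chain([r0, dict(r1, status="FAILED"), r2], record_id(r2)))  # False
```

Tampering with any middle record breaks the link from its successor. Dropping a record breaks the link that skips over it.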
Relationship to Assets
Job state records, Artifacts, and Operations are all immutable, content-addressed records stored in the lattice. They differ only in their metadata shape:
| Record Type | Distinguishing Field | Purpose |
|---|---|---|
| Artifact | content | Immutable data |
| Operation | operation | Executable function definition |
| State Record | status + prev | Job execution state |
This unification means that job state records can be stored, replicated, and verified using the same mechanisms as any other asset on the Grid.
One-Shot Jobs and Multi-Turn Jobs
All jobs share the same lifecycle and data model, but differ in their interaction pattern:
One-shot jobs are created with an initial input, execute a single operation, and reach a terminal state. The state chain is short: PENDING → STARTED → COMPLETE (or FAILED). This is the model used by simple tool calls and operation invocations.
Multi-turn jobs cycle through interactive states, accepting additional input between processing steps. The state chain grows with each interaction: PENDING → STARTED → INPUT_REQUIRED → STARTED → INPUT_REQUIRED → ... → COMPLETE. This is the model used by conversational agents, orchestrated workflows, and any process requiring human-in-the-loop interaction.
Both types use the same job lifecycle, the same state chain structure, and the same observation mechanisms (polling, SSE). The difference is purely behavioural — a one-shot job's transition function produces a terminal state on the first invocation, while a multi-turn job's transition function produces an interactive state and waits for the next message.
Clients that require a synchronous result (e.g. a single tool call) simply wait for the terminal state — regardless of how many intermediate transitions occur. The job's internal interaction loop is opaque to a synchronous caller. See COG-9: Agent Messaging for the asynchronous message delivery mechanism that drives multi-turn interactions.
Agents as Persistent Jobs
A Job whose transition function produces a terminal state is a finite process — it runs to completion. A Job whose transition function never reaches a terminal state is an Agent — a persistent, stateful process that accepts repeated interactions.
In this model:
- The state record holds the agent's current state (e.g. conversation history, workflow position)
- The transition function is an Operation asset that defines how the agent processes input and produces the next state
- Each interaction appends a new state record to the chain
Agent HEAD ──▶ State(N) [status: INPUT_REQUIRED]
               │
               │ transition_fn(State(N), user_input) → State(N+1)
               ▼
               State(N+1) [status: INPUT_REQUIRED]
The INPUT_REQUIRED and AUTH_REQUIRED interactive statuses (defined below) naturally support agent interaction patterns. An agent waiting for the next user message is simply a job in INPUT_REQUIRED state. When input arrives, the transition function (an LLM call, orchestrator step, or any other operation) produces the next state.
The distinction between a Job and an Agent is purely behavioural — whether the process converges to a terminal state or not. The data model is identical.
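The toy transition function below makes the behavioural distinction concrete. Everything in it is hypothetical. The `history` field and the `"bye"` termination rule are illustrative, and the `prev` strings are placeholders for real Value IDs. It also omits the intermediate STARTED records a venue would append. The same function behaves as an agent while it keeps returning INPUT_REQUIRED, and as a finite job once it returns a terminal status.

```python
from typing import Optional

def transition(state: dict, prev_id: Optional[str], user_input: str) -> dict:
    """Hypothetical transition function: (State(N), input) -> State(N+1)."""
    turns = state.get("history", []) + [user_input]
    if user_input == "bye":
        # Returning a terminal status ends the process: it behaved as a finite job.
        return {"status": "COMPLETE", "prev": prev_id, "output": {"turns": len(turns)}}
    # Returning an interactive status keeps the process alive: it behaves as an agent.
    return {"status": "INPUT_REQUIRED", "prev": prev_id, "history": turns,
            "message": "Awaiting input"}

chain = [{"status": "INPUT_REQUIRED", "prev": None, "history": []}]
for message in ["hello", "how are you?", "bye"]:
    # "rec<N>" is a placeholder for the Value ID of the previous record.
    chain.append(transition(chain[-1], f"rec{len(chain) - 1}", message))

print([record["status"] for record in chain])
# ['INPUT_REQUIRED', 'INPUT_REQUIRED', 'INPUT_REQUIRED', 'COMPLETE']
print(chain[-1]["output"])  # {'turns': 3}
```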
See COG-11: Agent Lifecycle for the full agent lifecycle specification, including agent state, the run loop, transition functions, and the three-level architecture.
Specification
Job Lifecycle
Every job follows a lifecycle defined by its status. Status transitions are unidirectional — a job moves forward through the lifecycle and never returns to a previous state (with the exception of interactive states resuming to STARTED).
Status Values
| Status | Category | Description |
|---|---|---|
| PENDING | Active | Job created, queued for execution |
| STARTED | Active | Job is currently executing |
| COMPLETE | Terminal | Job finished successfully with output |
| FAILED | Terminal | Job finished with an error |
| CANCELLED | Terminal | Job was cancelled by client or venue |
| REJECTED | Terminal | Job was rejected before execution (e.g. policy violation, invalid input) |
| TIMEOUT | Terminal | Job exceeded its time limit |
| PAUSED | Interactive | Job execution suspended, awaiting a resume signal |
| INPUT_REQUIRED | Interactive | Job requires additional input from the client to continue |
| AUTH_REQUIRED | Interactive | Job requires authorisation or credentials to continue |
Status Categories
Statuses fall into three categories:
- Active (PENDING, STARTED): The job is progressing. Clients should poll or subscribe for updates.
- Terminal (COMPLETE, FAILED, CANCELLED, REJECTED, TIMEOUT): The job has finished. No further status changes will occur. The job's output or error is available.
- Interactive (PAUSED, INPUT_REQUIRED, AUTH_REQUIRED): The job is waiting for external action. The client must respond before the job can continue.
State Transitions
┌──────────┐
│ REJECTED │
└──────────┘
     ▲
     │ (invalid input,
     │  policy violation)
     │
┌──────────┐           ┌──────────┐           ┌──────────────┐
│ PENDING  │ ────────▶ │ STARTED  │ ────────▶ │   COMPLETE   │
└──────────┘           └──────────┘           └──────────────┘
                         │  ▲  │
                         │  │  │
                 ┌───────┘  │  └──────┐
                 │          │         │
                 ▼          │         ▼
           ┌───────────┐    │    ┌──────────┐
           │ PAUSED /  │────┘    │  FAILED  │
           │ INPUT_    │         └──────────┘
           │ REQUIRED /│
           │ AUTH_     │
           │ REQUIRED  │
           └───────────┘
┌───────────┐          ┌──────────┐
│ CANCELLED │          │ TIMEOUT  │
└───────────┘          └──────────┘
      ▲                     ▲
      │                     │
  (from any             (from any
  non-terminal state)   non-terminal state)
The permitted transitions are:
| From | To | Trigger |
|---|---|---|
| PENDING | STARTED | Venue begins execution |
| PENDING | REJECTED | Venue rejects the job (policy, quota, invalid input) |
| PENDING | CANCELLED | Client cancels before execution begins |
| STARTED | COMPLETE | Operation finishes successfully |
| STARTED | FAILED | Operation encounters an error |
| STARTED | CANCELLED | Client cancels during execution |
| STARTED | TIMEOUT | Execution exceeds time limit |
| STARTED | PAUSED | Operation suspends execution |
| STARTED | INPUT_REQUIRED | Operation needs additional client input |
| STARTED | AUTH_REQUIRED | Operation needs client authorisation |
| PAUSED / INPUT_REQUIRED / AUTH_REQUIRED | STARTED | Client provides required input or authorisation |
| PAUSED / INPUT_REQUIRED / AUTH_REQUIRED | CANCELLED | Client cancels while the job is waiting |
| PAUSED / INPUT_REQUIRED / AUTH_REQUIRED | TIMEOUT | Job exceeds its time limit while waiting |
Each transition appends a new immutable state record to the job's chain.
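The permitted-transition table can be encoded directly as data, so a venue can reject an append that would violate the lifecycle. This Python sketch follows the table literally. As a result it does not allow PENDING → TIMEOUT, although the diagram shows TIMEOUT reachable from any non-terminal state. Nor does it allow PENDING → PAUSED, which the pause endpoint described below permits. The names `ALLOWED` and `can_transition` are illustrative.

```python
INTERACTIVE = {"PAUSED", "INPUT_REQUIRED", "AUTH_REQUIRED"}
TERMINAL = {"COMPLETE", "FAILED", "CANCELLED", "REJECTED", "TIMEOUT"}

# Permitted transitions, copied row by row from the table above.
ALLOWED = {
    "PENDING": {"STARTED", "REJECTED", "CANCELLED"},
    "STARTED": {"COMPLETE", "FAILED", "CANCELLED", "TIMEOUT"} | INTERACTIVE,
    **{status: {"STARTED", "CANCELLED", "TIMEOUT"} for status in INTERACTIVE},
}

def can_transition(current: str, new: str) -> bool:
    """True if a record with status `new` may follow one with status `current`."""
    # Terminal statuses have no entry, so nothing may follow them (Job Finality).
    return new in ALLOWED.get(current, set())

print(can_transition("PENDING", "STARTED"))         # True
print(can_transition("INPUT_REQUIRED", "STARTED"))  # True
print(can_transition("COMPLETE", "STARTED"))        # False
```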
Job Identification
A job is identified by a unique Job ID within the venue. The Job ID is assigned at creation time and remains stable throughout the job's lifecycle — it identifies the HEAD pointer, not any individual state record.
Job IDs are represented as hex strings. The format is implementation-defined, but MUST be unique within the venue.
Individual state records in the chain are identified by their content-addressed hash (see Asset ID Scheme).
Job Data
The current state of a job is represented as a JSON object when accessed via the API. This object reflects the latest state record in the chain, with all fields resolved (including inherited fields from earlier records).
id (REQUIRED)
The Job ID — the stable identifier for this job within the venue.
{
"id": "0x12345678901234567890123456789012"
}
status (REQUIRED)
The current lifecycle status of the job. MUST be one of the status values defined above.
{
"status": "STARTED"
}
operation (RECOMMENDED)
The identifier of the operation being executed (Asset ID, operation name, or adapter reference).
{
"operation": "0x7a8b9c0d1e2f3a4b5c6d7e8f9a0b1c2d3e4f5a6b7c8d9e0f1a2b3c4d5e6f7a8b"
}
output (present when COMPLETE)
The result produced by the operation. The structure of the output is defined by the operation's output schema (see COG-7: Operations).
{
"status": "COMPLETE",
"output": {
"result": "Success",
"data": { "count": 42 }
}
}
error (present when FAILED, REJECTED, or CANCELLED)
A human-readable error message describing why the job did not complete.
{
"status": "FAILED",
"error": "Connection to upstream service timed out"
}
message (OPTIONAL)
A general-purpose message providing context about the current state. Particularly useful for interactive states to describe what input or authorisation is needed.
{
"status": "INPUT_REQUIRED",
"message": "Please provide the API key for the target service"
}
input (OPTIONAL)
The input that was provided when the job was created.
created (RECOMMENDED)
Timestamp in milliseconds since Unix epoch when the job was created.
updated (RECOMMENDED)
Timestamp in milliseconds since Unix epoch of the last status change.
Job Creation
Jobs are created when a client invokes an operation on a venue via POST /api/v1/invoke. See COG-7: Operations for the invocation model.
A venue MUST create a job for every accepted invocation request.
A venue MUST create an initial state record with status PENDING (or REJECTED if the job is immediately rejected).
A venue MAY immediately append a STARTED state record if execution begins synchronously.
Job Observation
Clients can observe job progress through two mechanisms:
Polling
Clients poll the job status via GET /api/v1/jobs/{id}. This returns the resolved job data from the latest state record, including current status, output (if complete), or error (if failed).
Clients SHOULD use exponential backoff when polling to avoid overloading the venue. Recommended parameters:
| Parameter | Value |
|---|---|
| Initial delay | 300ms |
| Backoff factor | 1.5x |
| Maximum delay | 10s |
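A client-side polling loop using these parameters might look like the following Python sketch. `fetch_job` is a hypothetical callable that performs GET /api/v1/jobs/{id} and returns the decoded JSON. `sleep` is injectable so the loop can be tested without actually waiting. A synchronous caller simply waits for a terminal status, as described under One-Shot Jobs and Multi-Turn Jobs.

```python
import time
from typing import Callable, Iterator

TERMINAL = {"COMPLETE", "FAILED", "CANCELLED", "REJECTED", "TIMEOUT"}

def poll_delays(initial_ms: float = 300, factor: float = 1.5,
                max_ms: float = 10_000) -> Iterator[float]:
    """Yield polling delays in milliseconds: 300, 450, 675, ... capped at 10 s."""
    delay = min(initial_ms, max_ms)
    while True:
        yield delay
        delay = min(delay * factor, max_ms)

def wait_for_terminal(fetch_job: Callable[[], dict],
                      sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the job reaches a terminal status, backing off between polls."""
    for delay_ms in poll_delays():
        job = fetch_job()
        if job["status"] in TERMINAL:
            return job
        sleep(delay_ms / 1000)  # time.sleep takes seconds

# Usage with a fake fetch_job that replays three poll responses.
responses = iter([{"status": "PENDING"}, {"status": "STARTED"},
                  {"status": "COMPLETE", "output": {"text": "hello"}}])
slept = []
job = wait_for_terminal(lambda: next(responses), sleep=slept.append)
print(job["status"], slept)  # COMPLETE [0.3, 0.45]
```

With the recommended parameters, the delay reaches the 10 s cap on the tenth poll.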
Server-Sent Events (SSE)
Clients can subscribe to real-time updates via GET /api/v1/jobs/{id}/sse. The venue pushes an event each time a new state record is appended to the chain.
SSE is preferred over polling for long-running jobs because it:
- Reduces network overhead
- Provides immediate notification of state changes
- Avoids the latency inherent in polling intervals
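This section does not fix the event payload format. The following Python sketch assumes, hypothetically, that each event's `data` field carries the resolved job JSON. It parses a `text/event-stream` body line by line using the general SSE framing rules: `data:` lines accumulate, a blank line dispatches the event, and lines starting with `:` are comments. Obtaining the lines (e.g. from a streaming HTTP response) is left out.

```python
import json
from typing import Iterable, Iterator

def sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON payload per dispatched SSE event."""
    data: list = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if any.
            if data:
                yield json.loads("\n".join(data))
                data = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "data":
            data.append(value)
        # Other fields (event, id, retry) are ignored in this sketch.

stream = [
    'data: {"id": "0xabc...", "status": "STARTED"}\n',
    "\n",
    ": keep-alive\n",
    'data: {"id": "0xabc...", "status": "COMPLETE", "output": {"text": "hello"}}\n',
    "\n",
]
for job in sse_events(stream):
    print(job["status"])  # prints STARTED, then COMPLETE
```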
Job History
Clients MAY retrieve the full state chain for a job via GET /api/v1/jobs/{id}/history. This returns the ordered sequence of state records from initial to latest, enabling audit and verification of the complete execution history.
Job Pause
Clients can pause a running job via PUT /api/v1/jobs/{id}/pause.
A venue MUST append a state record with status PAUSED if the job is in a non-terminal, non-paused state (PENDING, STARTED, INPUT_REQUIRED, AUTH_REQUIRED).
A venue MUST return 409 Conflict if the job is already in a terminal state or already paused.
Pausing suspends execution. The adapter is not re-invoked until the job is resumed.
Job Resume
Clients can resume a paused job via PUT /api/v1/jobs/{id}/resume.
A venue MUST append a state record with status STARTED if the job is in PAUSED state, and re-engage the adapter to continue execution.
A venue MUST return 409 Conflict if the job is not in PAUSED state.
Note: INPUT_REQUIRED and AUTH_REQUIRED jobs are resumed by delivering a message (see COG-9: Agent Messaging), not by calling the resume endpoint. The resume endpoint is specifically for PAUSED jobs.
Job Cancellation
Clients can cancel a job via PUT /api/v1/jobs/{id}/cancel.
A venue MUST append a state record with status CANCELLED if the job is in a non-terminal state.
A venue MUST ignore cancel requests for jobs that are already in a terminal state.
Cancellation is best-effort — the underlying operation may have already produced side effects before the cancellation takes effect.
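The pause, resume and cancel rules can be summarised as a small decision function. In this Python sketch, only the 409 responses come from the text. Returning 200 on success mirrors the Cancellation example below. Answering an ignored cancel with 200 and the unchanged status is an assumption. The pause rule also accepts PENDING, INPUT_REQUIRED and AUTH_REQUIRED jobs, which are transitions the State Transitions table does not list.

```python
TERMINAL = {"COMPLETE", "FAILED", "CANCELLED", "REJECTED", "TIMEOUT"}

def control(status: str, action: str) -> tuple[int, str]:
    """Return (HTTP status code, job status afterwards) for a control request."""
    if action == "pause":
        if status in TERMINAL or status == "PAUSED":
            return 409, status        # MUST return 409 Conflict
        return 200, "PAUSED"          # append a PAUSED record
    if action == "resume":
        if status != "PAUSED":
            return 409, status        # only PAUSED jobs use the resume endpoint
        return 200, "STARTED"         # append STARTED and re-engage the adapter
    if action == "cancel":
        if status in TERMINAL:
            return 200, status        # ignored: no record is appended (assumed 200)
        return 200, "CANCELLED"       # best-effort; side effects are not undone
    raise ValueError(f"unknown action: {action!r}")

print(control("STARTED", "pause"))          # (200, 'PAUSED')
print(control("INPUT_REQUIRED", "resume"))  # (409, 'INPUT_REQUIRED')
print(control("COMPLETE", "cancel"))        # (200, 'COMPLETE')
```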
Job Deletion
Clients can remove a job record via PUT /api/v1/jobs/{id}/delete.
Deletion removes the HEAD pointer from the venue's job index. The immutable state records in the chain MAY be retained in lattice storage for audit purposes, or MAY be garbage collected at the venue's discretion.
Deletion does not affect any resources or outputs produced by the job.
Job Finality
Once a job reaches a terminal state, no further state records are appended to its chain:
- The HEAD pointer MUST NOT advance beyond a terminal state record
- The terminal state record's output or error fields MUST NOT be modified (they are immutable by construction)
- The job record MUST remain available for querying until explicitly deleted
The immutable chain structure ensures that job records serve as reliable, verifiable audit evidence of computation performed on the Grid.
Lattice Storage
Job state records are stored in the Grid Lattice as immutable, content-addressed values. Because state records are immutable, they can be stored using union merge semantics — the same strategy used for assets — rather than timestamp-based merge.
The HEAD pointer (Job ID → latest state record hash) is the only mutable element and requires a lightweight mutable index within the venue's lattice state.
When state records are stored as native lattice data structures (rather than serialised JSON), the lattice provides automatic structural sharing: fields common across state records (such as input, op, and unchanged context) are deduplicated in the Merkle tree without any explicit compaction.
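Because a record's key is derived from its content, merging two venues' record stores reduces to a set union. Two entries with the same key are the same value, so there is nothing to resolve. The Python sketch below illustrates this with a dict from placeholder record IDs (not real hashes) to records. It deliberately leaves out the mutable HEAD index, whose merge rule is not described here.

```python
def merge_records(a: dict, b: dict) -> dict:
    """Union-merge two content-addressed stores mapping record ID -> record."""
    merged = dict(a)
    for rid, record in b.items():
        # Same ID implies same content, so an existing entry is never replaced.
        merged.setdefault(rid, record)
    return merged

# "0x01", "0x02", "0x03" are placeholder IDs standing in for Value IDs.
venue_a = {"0x01": {"status": "PENDING", "prev": None},
           "0x02": {"status": "STARTED", "prev": "0x01"}}
venue_b = {"0x02": {"status": "STARTED", "prev": "0x01"},
           "0x03": {"status": "COMPLETE", "prev": "0x02"}}

merged = merge_records(venue_a, venue_b)
print(sorted(merged))                             # ['0x01', '0x02', '0x03']
print(merged == merge_records(venue_b, venue_a))  # True: order does not matter
print(merge_records(merged, merged) == merged)    # True: merging is idempotent
```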
Interactive Jobs
Interactive statuses (PAUSED, INPUT_REQUIRED, AUTH_REQUIRED) enable human-in-the-loop, multi-step workflows, and agent interaction patterns where execution cannot proceed without external action.
Use Cases
| Status | Use Case |
|---|---|
| PAUSED | Debugging breakpoint; operator-initiated suspension; rate limiting |
| INPUT_REQUIRED | Operation needs additional parameters not provided at invocation; multi-turn conversation flows; agent interaction |
| AUTH_REQUIRED | Operation needs credentials for a downstream service; consent or approval step |
Client Responsibilities
When a job enters an interactive state, the client SHOULD:
- Read the message field to understand what is required
- Deliver the requested input or authorisation via POST /api/v1/jobs/{id} (see COG-9: Agent Messaging); a PAUSED job is instead resumed via PUT /api/v1/jobs/{id}/resume
- Monitor for resumption to STARTED via polling or SSE
If a client cannot fulfil an interactive request, it SHOULD cancel the job rather than leave it waiting indefinitely.
Venues MAY impose timeouts on interactive states to prevent resource leaks from abandoned jobs.
Agent Interaction Pattern
For persistent agents (jobs that cycle through interactive states), the client interaction loop is:
1. Invoke operation → Job created (PENDING → STARTED → INPUT_REQUIRED)
2. Client sends message → Job resumes (INPUT_REQUIRED → STARTED → INPUT_REQUIRED)
3. Repeat step 2 for each interaction turn
4. Agent terminates → Job reaches terminal state (COMPLETE or FAILED)
The state chain grows with each interaction turn. Each turn's input and output are preserved as immutable records in the chain, providing a complete, verifiable interaction history.
Messages are delivered via COG-9: Agent Messaging, which provides a per-job message queue. Messages can be submitted at any time — including while the job is actively processing (STARTED). Queued messages are processed in order when the job is ready for input. This decoupling means clients do not need to wait for an interactive state before sending the next message.
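The decoupling described above can be modelled as a FIFO queue that is drained only when the job is ready for input. The Python class below is a hypothetical, single-threaded illustration; the name `JobMailbox` and its methods are invented here. COG-9 defines the actual delivery mechanism.

```python
from collections import deque

class JobMailbox:
    """Hypothetical per-job message queue: accepts messages in any status and
    hands them to the job one at a time, in arrival order."""

    def __init__(self) -> None:
        self.status = "STARTED"
        self.queue: deque = deque()
        self.consumed: list = []

    def submit(self, message: str) -> None:
        """Client side: POST a message; it is queued even while STARTED."""
        self.queue.append(message)
        self._deliver()

    def finish_turn(self) -> None:
        """Job side: the transition function produced its next state."""
        self.status = "INPUT_REQUIRED"
        self._deliver()

    def _deliver(self) -> None:
        # Start the next turn only when the job is waiting and a message is queued.
        if self.status == "INPUT_REQUIRED" and self.queue:
            self.consumed.append(self.queue.popleft())
            self.status = "STARTED"

box = JobMailbox()
box.submit("first")   # queued: the job is still STARTED
box.submit("second")  # queued behind "first"
box.finish_turn()     # consumes "first", back to STARTED
box.finish_turn()     # consumes "second"
box.finish_turn()     # nothing queued, so the job waits
print(box.consumed, box.status)  # ['first', 'second'] INPUT_REQUIRED
```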
Synchronous vs. Asynchronous Observation
The same job supports both synchronous and asynchronous interaction:
- Synchronous callers (e.g. an MCP tools/call bridge) wait for the terminal state. The entire multi-turn interaction, including any intermediate message exchanges, is opaque to them. They submit the initial input and receive the final result.
- Asynchronous callers (e.g. a conversational UI, an A2A agent) observe each state transition in real time via SSE, and submit messages via POST /api/v1/jobs/{id} as the interaction progresses.
Both patterns operate on the same job, the same state chain, and the same message queue. The choice is the client's, not the job's.
Asset ID Scheme
Job state records, like all content-addressed data on the Grid, are identified by lattice Value IDs — the SHA3-256 hash of their canonical CAD003 binary encoding (see COG-5: Asset Metadata).
This means:
- The lattice automatically computes Value IDs for all stored data structures, so state record identification is a zero-cost property of storage
- Structural sharing and deduplication across state records are handled natively by the lattice
- Unchanged fields across state transitions (such as input, op) are automatically deduplicated in the Merkle tree
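The deduplication effect can be illustrated with a toy content-addressed store in Python. This is a simplification under stated assumptions. It hashes canonical JSON with SHA3-256 instead of the CAD003 encoding. It also shares values only one level deep (each top-level field), whereas a lattice Merkle tree shares at every node. `value_id` and `ValueStore` are hypothetical names.

```python
import hashlib
import json

def value_id(value) -> str:
    """Hypothetical Value ID: SHA3-256 of canonical JSON (not CAD003)."""
    data = json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return "0x" + hashlib.sha3_256(data).hexdigest()

class ValueStore:
    """Toy store in which every field value is kept once, under its own ID."""

    def __init__(self) -> None:
        self.blobs: dict = {}

    def put(self, value) -> str:
        vid = value_id(value)
        self.blobs.setdefault(vid, value)  # identical content is stored only once
        return vid

    def put_record(self, record: dict) -> str:
        # A stored record is a map from field name to the ID of that field's value.
        return self.put({key: self.put(value) for key, value in record.items()})

store = ValueStore()
big_input = {"text": "hello " * 1000}
r0 = store.put_record({"status": "PENDING", "op": "test:echo", "input": big_input})
r1 = store.put_record({"status": "STARTED", "op": "test:echo", "input": big_input,
                       "prev": r0})
print(len(store.blobs))                                        # 7
print(sum(1 for v in store.blobs.values() if v == big_input))  # 1
```

The second record adds only three blobs: its new status, its prev value, and its own field map. The large input is referenced, not copied.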
Examples
Simple Invocation
Client                             Venue
│                                    │
│ POST /api/v1/invoke                │
│ {"operation": "test:echo",         │
│  "input": {"text": "hello"}}       │
│ ─────────────────────────────────▶ │
│                                    │ Create State(0): PENDING
│ 201 Created                        │ Create State(1): STARTED
│ {"id": "0xabc...",                 │
│  "status": "PENDING"}              │
│ ◀───────────────────────────────── │
│                                    │ Execute → State(2): COMPLETE
│ GET /api/v1/jobs/0xabc...          │
│ ─────────────────────────────────▶ │
│                                    │
│ 200 OK                             │
│ {"id": "0xabc...",                 │
│  "status": "COMPLETE",             │
│  "output": {"text": "hello"}}      │
│ ◀───────────────────────────────── │
Cancellation
Client                             Venue
│                                    │
│ POST /api/v1/invoke                │
│ {"operation": "long-running"}      │
│ ─────────────────────────────────▶ │
│                                    │
│ 201 Created                        │ State(0): PENDING
│ {"id": "0xdef...",                 │ State(1): STARTED
│  "status": "PENDING"}              │
│ ◀───────────────────────────────── │
│                                    │
│ PUT /api/v1/jobs/0xdef.../cancel   │
│ ─────────────────────────────────▶ │
│                                    │ State(2): CANCELLED
│ 200 OK                             │
│ {"id": "0xdef...",                 │
│  "status": "CANCELLED",            │
│  "error": "Job cancelled"}         │
│ ◀───────────────────────────────── │
Interactive Job (Input Required)
Client                             Venue
│                                    │
│ POST /api/v1/invoke                │
│ {"operation": "data-import"}       │
│ ─────────────────────────────────▶ │
│                                    │ State(0): PENDING
│ 201 Created                        │ State(1): STARTED
│ {"id": "0x123...",                 │
│  "status": "PENDING"}              │
│ ◀───────────────────────────────── │
│                                    │ State(2): INPUT_REQUIRED
│ GET /api/v1/jobs/0x123...          │
│ ─────────────────────────────────▶ │
│                                    │
│ 200 OK                             │
│ {"id": "0x123...",                 │
│  "status": "INPUT_REQUIRED",       │
│  "message": "Provide API key"}     │
│ ◀───────────────────────────────── │
│                                    │
│ POST /api/v1/jobs/0x123...         │
│ (client provides the API key)      │
│ ─────────────────────────────────▶ │ State(3): STARTED
│                                    │ State(4): COMPLETE
│ GET /api/v1/jobs/0x123...          │
│ ─────────────────────────────────▶ │
│                                    │
│ 200 OK                             │
│ {"id": "0x123...",                 │
│  "status": "COMPLETE",             │
│  "output": {"imported": 1500}}     │
│ ◀───────────────────────────────── │
Agent Interaction (Multi-Turn)
Client                             Venue
│                                    │
│ POST /api/v1/invoke                │
│ {"operation": "llm-agent",         │
│  "input": {"prompt": "Hello"}}     │
│ ─────────────────────────────────▶ │
│                                    │ State(0): PENDING
│ 201 Created                        │ State(1): STARTED (LLM call)
│ {"id": "0x456...",                 │ State(2): INPUT_REQUIRED
│  "status": "PENDING"}              │   (response + awaiting next turn)
│ ◀───────────────────────────────── │
│                                    │
│ GET /api/v1/jobs/0x456...          │
│ ─────────────────────────────────▶ │
│                                    │
│ 200 OK                             │
│ {"id": "0x456...",                 │
│  "status": "INPUT_REQUIRED",       │
│  "output": {"response": "Hi!"},    │
│  "message": "Awaiting input"}      │
│ ◀───────────────────────────────── │
│                                    │
│ POST /api/v1/jobs/0x456...         │
│ (client sends next message)        │
│ ─────────────────────────────────▶ │ State(3): STARTED (LLM call)
│                                    │ State(4): INPUT_REQUIRED
│                                    │   (response + awaiting next turn)
│ GET /api/v1/jobs/0x456...          │
│ ─────────────────────────────────▶ │
│                                    │
│ 200 OK                             │
│ {"id": "0x456...",                 │
│  "status": "INPUT_REQUIRED",       │
│  "output": {"response": "..."},    │
│  "message": "Awaiting input"}      │
│ ◀───────────────────────────────── │
│                                    │
│ ... (conversation continues)       │
Security Considerations
Resource Exhaustion
Jobs consume venue resources (memory, connections, compute). Venues SHOULD:
- Impose limits on the number of concurrent jobs per client
- Enforce execution timeouts for all jobs
- Enforce timeouts on interactive states
- Clean up resources promptly when jobs reach terminal states
Long-lived agent jobs (persistent interactive processes) require particular attention to resource management. Venues SHOULD impose maximum chain lengths or interaction counts for agent jobs.
Job Data Sensitivity
Job records may contain sensitive information in their inputs and outputs. Venues SHOULD:
- Apply access control so that only the submitting client (or authorised parties) can query a job
- Redact or encrypt sensitive fields in stored job records
- Support job deletion to allow clients to remove HEAD pointers for records containing sensitive data
Cancellation Safety
Cancellation does not guarantee rollback. Operations may have produced side effects (network calls, data writes) before cancellation takes effect. Clients SHOULD NOT rely on cancellation as a mechanism for undoing work.
State Chain Integrity
The immutable chain structure provides tamper evidence but not tamper prevention. A malicious venue could fabricate a chain. Clients requiring strong guarantees SHOULD:
- Verify chain integrity by walking prev links and checking content-addressed hashes
- Cross-check job state with multiple independent venues
- Use signed state records where available
Related Specifications
- COG-1: Architecture - Overall Grid architecture and terminology
- COG-2: Decentralised ID - Job and venue identification
- COG-4: Grid Lattice - Lattice storage and merge semantics for job state
- COG-5: Asset Metadata - Asset identification and the ID scheme decision
- COG-7: Operations - Operation definitions, invocation model, and orchestration
- COG-9: Agent Messaging - Message delivery to jobs and agents, protocol compatibility
- COG-11: Agent Lifecycle - Stateful agent creation, run loop, transition functions, and state management