COG-11: Agent Lifecycle
Status: Exploratory Draft
Version: 0.1
Created: 2026-02-27
Authors: Mike Anderson
This specification is an exploratory draft describing agent lifecycle patterns that are under active development. The concepts, interfaces, and behaviours described here are subject to significant change as the design evolves through implementation experience and community feedback.
For the stable foundations that this specification builds on, see COG-8: Jobs (job lifecycle and state chains) and COG-9: Agent Messaging (message delivery and processing).
This standard specifies the agent lifecycle on the Covia Grid — how stateful, persistent agents are created, how they process messages through a run loop, how they transition between states, and how they terminate. It builds on the job-as-agent model introduced in COG-8 and the messaging protocol defined in COG-9.
Purpose
COG-8 establishes that an agent is simply a job whose transition function never reaches a terminal state — a persistent, stateful process that accepts repeated interactions. COG-9 defines how messages are delivered to such processes.
This specification completes the agent model by defining:
- Agent state: The data structure that persists between interactions
- Agent lifecycle: The states an agent moves through from creation to termination
- Run loop: How agents process messages and produce new state
- Transition functions: The pluggable operations that define agent behaviour
- Three-level architecture: How framework, domain logic, and external calls are separated
Terminology
See COG-1: Architecture for core Grid terminology and COG-8: Jobs for job-related terms. Additional terms used in this specification:
| Term | Definition |
|---|---|
| Agent | A persistent, stateful process on the Grid that accepts repeated interactions via message delivery |
| Agent Record | The complete state of an agent at a point in time — a single atomic value |
| Transition Function | An Operation that receives the agent's current state and new messages, and returns updated state |
| Run Loop | The cycle of reading the inbox, invoking the transition function, and recording the result |
| Timeline | An append-only log of successful transition records, providing an audit trail of the agent's history |
Core Principles
Single Atomic Value
An agent's entire state is one map. Every operation — create, message delivery, run, configuration update — atomically replaces the whole map. There is no per-field merge and no child lattices within the agent record. The map is the unit of state.
This simplifies reasoning about agent state: at any point, the agent record is a consistent snapshot that genuinely existed, never a mixture of fields from different points in time.
Last Writer Wins
Agent state uses last-writer-wins (LWW) merge semantics based on a timestamp field. The record with the later timestamp wins unconditionally. Since all writes to an agent are serialised on the hosting venue, timestamps are monotonic and state only advances.
Separation of Concerns
The agent system separates responsibilities into three levels, each of which is a pluggable Grid operation:
- Agent Update (framework) — manages the run loop, inbox, timeline, and status transitions
- Agent Transition (domain logic) — processes messages and produces updated state
- External Call (single step) — makes a single stateless call to an external service (e.g. an LLM API)
Each level invokes the next as a standard Grid operation. This means any level can be a local operation, a remote venue operation via federation, or a test mock.
Specification
Agent Record
An agent's value is a plain map with the following fields:
| Field | Type | Description |
|---|---|---|
| ts | long | Timestamp of the last write. The merge discriminator: later ts always wins. |
| status | string | Current lifecycle status (see below) |
| config | map | Framework-level configuration. Opaque to the transition function. |
| state | any | User-defined state. Opaque to the framework. Passed to and returned from the transition function. |
| inbox | vector | Messages awaiting processing. Drained on successful run. |
| timeline | vector | Append-only log of transition records. Grows with each successful agent run. |
| caps | map | Capability sets (reserved for future capability enforcement) |
| error | string? | Last error message, or null |
The framework manages all fields except state, which is owned by the transition function. The transition function never manages framework fields; the framework never inspects state.
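The record can be sketched in Python as an immutable value. This is an illustration only: the class name `AgentRecord` and the helper `with_status` are hypothetical, and tuples stand in for the spec's vectors. The sketch shows one thing: every write produces a complete new record and leaves the previous one intact.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Optional

# Hypothetical model of the agent record. frozen=True blocks field
# reassignment (shallowly), mirroring "the map is the unit of state".
@dataclass(frozen=True)
class AgentRecord:
    ts: int                                    # merge discriminator; later ts wins
    status: str                                # SLEEPING, RUNNING, SUSPENDED, TERMINATED
    config: dict = field(default_factory=dict) # framework-owned, opaque to transition
    state: Any = None                          # transition-owned, opaque to framework
    inbox: tuple = ()                          # messages awaiting processing
    timeline: tuple = ()                       # successful transition records
    caps: dict = field(default_factory=dict)   # reserved for capabilities
    error: Optional[str] = None                # last error message, or None

def with_status(record: AgentRecord, status: str, ts: int) -> AgentRecord:
    """Return a complete replacement record; the original is not modified."""
    return replace(record, status=status, ts=ts)
```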
Agent Lifecycle
Agents have a lifecycle that is distinct from, but related to, the job lifecycle defined in COG-8:
```text
          create
            │
            ▼
      ┌───────────┐
      │ SLEEPING  │◀─────────── successful run
      └───────────┘
            │
   run (inbox non-empty)
            │
            ▼
      ┌───────────┐
      │  RUNNING  │
      └───────────┘
            │
       ┌────┴────┐
       │         │
    success   failure
       │         │
       ▼         ▼
  SLEEPING ┌───────────┐
           │ SUSPENDED │
           └───────────┘
                 │
            clear error
                 │
                 ▼
              SLEEPING

From any non-terminal state:
            │
        terminate
            │
            ▼
     ┌────────────┐
     │ TERMINATED │
     └────────────┘
```
Status Values
| Status | Description |
|---|---|
| SLEEPING | Agent is idle, ready to process messages when the run loop is triggered |
| RUNNING | Agent is actively executing its transition function |
| SUSPENDED | Agent is paused due to an error in the transition function. Inbox is preserved for retry. |
| TERMINATED | Agent has been permanently stopped. No further messages are accepted. |
Status Transitions
| From | To | Trigger |
|---|---|---|
| (none) | SLEEPING | Agent created |
| SLEEPING | RUNNING | Run loop triggered with non-empty inbox |
| RUNNING | SLEEPING | Transition function succeeds |
| RUNNING | SUSPENDED | Transition function fails |
| SUSPENDED | SLEEPING | Error cleared (manual recovery) |
| Any non-terminal | TERMINATED | Explicit termination |
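The table can be encoded as a small lookup. This is a sketch. The trigger labels (`create`, `run`, `success`, `failure`, `clear_error`, `terminate`) are illustrative names and are not identifiers defined by this specification. The non-empty-inbox condition on `run` is left to the caller.

```python
# Hypothetical encoding of the status transition table above.
TRANSITIONS = {
    (None, "create"): "SLEEPING",
    ("SLEEPING", "run"): "RUNNING",
    ("RUNNING", "success"): "SLEEPING",
    ("RUNNING", "failure"): "SUSPENDED",
    ("SUSPENDED", "clear_error"): "SLEEPING",
}
NON_TERMINAL = {"SLEEPING", "RUNNING", "SUSPENDED"}

def next_status(current, trigger):
    """Return the new status, or raise ValueError if the move is not allowed."""
    if trigger == "terminate":
        # Termination is allowed from any non-terminal state.
        if current in NON_TERMINAL:
            return "TERMINATED"
        raise ValueError(f"cannot terminate from {current!r}")
    try:
        return TRANSITIONS[(current, trigger)]
    except KeyError:
        raise ValueError(f"no transition from {current!r} on {trigger!r}") from None
```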
Operations
Every operation atomically replaces the agent record with a new timestamp.
Create
Creates the initial agent record with status SLEEPING, empty inbox, empty timeline, and optional initial state.
The initial state allows the creator to seed transition-function-specific configuration (e.g. LLM provider, model, system prompt) that the transition function reads and preserves across runs. Framework config is kept separate from transition function concerns.
Idempotent: If the agent record already exists, create is a no-op.
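Here is a sketch of idempotent creation over a hypothetical in-memory store, a Python dict from agent ID to record. The clock is an assumption: it reads wall-clock milliseconds. The field names follow the agent record table.

```python
import time

def create_agent(store, agent_id, initial_state=None, config=None):
    """Create the agent if absent; return the (possibly pre-existing) record."""
    if agent_id in store:
        return store[agent_id]            # idempotent: create is a no-op
    record = {
        "ts": int(time.time() * 1000),    # assumption: wall-clock milliseconds
        "status": "SLEEPING",
        "config": dict(config or {}),     # framework-level configuration
        "state": initial_state,           # seeded transition-function config
        "inbox": [],
        "timeline": [],
        "caps": {},
        "error": None,
    }
    store[agent_id] = record
    return record
```

A second `create_agent` call with different arguments returns the existing record unchanged.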
Message Delivery
Reads the current agent record, appends the message to the inbox, and writes the updated record. The agent is not automatically woken — the message sits in the inbox until the next run.
Messages MUST be rejected if the agent does not exist or if the agent status is TERMINATED.
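Message delivery is a read-append-write against the same kind of hypothetical in-memory store. Raising `ValueError` for rejected messages is an assumption made for illustration, as is advancing ts by one. Status is left untouched, so the agent is not woken.

```python
def deliver(store, agent_id, message):
    """Append a message to the agent's inbox by replacing the whole record."""
    record = store.get(agent_id)
    if record is None:
        raise ValueError(f"unknown agent {agent_id!r}")
    if record["status"] == "TERMINATED":
        raise ValueError(f"agent {agent_id!r} is terminated")
    # Build a complete replacement record; the stored one is never mutated.
    updated = dict(record)
    updated["inbox"] = record["inbox"] + [message]
    updated["ts"] = record["ts"] + 1      # assumption: simple monotonic bump
    store[agent_id] = updated             # single atomic replacement
    return updated                        # status unchanged: no wake-up
```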
Run Loop
The run loop is the core mechanism for processing messages. When triggered:
1. Read the current agent record
2. If the inbox is empty, no-op
3. Set status to RUNNING, write the agent record
4. Invoke the transition function with the agent ID, current state, and inbox
5. On success:
   - Update state from the returned value
   - Append a timeline entry (transition operation, starting state, messages processed, returned result, start/end timestamps)
   - Clear the inbox
   - Set status to SLEEPING
   - Write the agent record
6. On error:
   - Leave state and inbox unchanged
   - Set status to SUSPENDED, set error
   - Write the agent record
When the inbox is non-empty, the run loop writes twice: once to mark the agent running (step 3) and once to record the outcome (step 5 or 6). Each write is a complete, atomic replacement of the agent record.
On error, the inbox is preserved — the same messages are available for retry after the error is resolved and the agent is resumed.
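The six steps can be sketched in Python. This is a simplified, single-threaded illustration that rests on several assumptions. Records are plain dicts in an in-memory store. `transition` is a local callable standing in for the level-2 Grid operation. `op_ref` is a hypothetical operation reference that gets recorded in the timeline. Only SLEEPING agents are run, as the status transition table implies.

```python
import time
from typing import Callable

def _next_ts(prev: int) -> int:
    # Assumption: wall-clock milliseconds, bumped so ts strictly increases.
    return max(int(time.time() * 1000), prev + 1)

def run_agent(store: dict, agent_id: str, op_ref: str,
              transition: Callable[[dict], dict]) -> dict:
    record = store[agent_id]                                   # 1. read
    # 2. Empty inbox: no-op. (Skipping non-SLEEPING agents is an assumption
    #    drawn from the status transition table.)
    if not record["inbox"] or record["status"] != "SLEEPING":
        return record
    start = _next_ts(record["ts"])
    running = {**record, "status": "RUNNING", "ts": start}    # 3. first write
    store[agent_id] = running
    messages = list(running["inbox"])
    try:                                                       # 4. invoke
        out = transition({"agent-id": agent_id,
                          "state": running["state"],
                          "messages": messages})
    except Exception as exc:                                   # 6. on error
        failed = {**running, "status": "SUSPENDED", "error": str(exc),
                  "ts": _next_ts(start)}   # state and inbox left unchanged
        store[agent_id] = failed
        return failed
    end = _next_ts(start)                                      # 5. on success
    entry = {"start": start, "end": end, "op": op_ref,
             "state": running["state"], "messages": messages,
             "result": out.get("result")}
    done = {**running, "state": out.get("state"), "inbox": [],
            "timeline": running["timeline"] + [entry],
            "status": "SLEEPING", "ts": end}
    store[agent_id] = done                                     # second write
    return done
```

The error branch builds its record from the RUNNING snapshot. As a result, the state and inbox written with SUSPENDED are exactly what the transition function saw, which makes a later retry see the same input.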
Timeline Entry
Each entry in the timeline records one successful run:
| Field | Type | Description |
|---|---|---|
| start | long | Timestamp when the run started |
| end | long | Timestamp when the run completed |
| op | string | The operation reference used for the transition function |
| state | any | The starting state passed to the transition function |
| messages | vector | The inbox messages passed to the transition function |
| result | any | The result returned by the transition function |
The output state is not stored in the timeline entry — it is the state field in the agent record (for the latest run) or the state field in the next timeline entry (for earlier runs). This avoids redundant storage.
Timeline entries are only written on success. On error, no timeline entry is created.
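A run's output state is the next run's starting state, so any historical output can be recovered from the record alone. Here is a minimal sketch, assuming that nothing other than the run loop modifies state between runs. The function name `output_state` is hypothetical.

```python
def output_state(record: dict, i: int):
    """Return the state produced by run i (0-based) of the agent's timeline."""
    timeline = record["timeline"]
    if not 0 <= i < len(timeline):
        raise IndexError(f"no run {i}")
    if i == len(timeline) - 1:
        return record["state"]          # latest run: current record state
    return timeline[i + 1]["state"]     # earlier run: next run's input
```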
Transition Function Contract
The transition function is a standard Grid Operation with the following contract:
Input:
| Field | Type | Description |
|---|---|---|
| agent-id | string | The agent's identifier |
| state | any | Current user-defined state from the agent record. Null on first run unless an initial state was supplied at creation. |
| messages | vector | The inbox messages to process |
Output:
| Field | Type | Description |
|---|---|---|
| state | any | Updated user-defined state. Written back to the agent record. |
| result | any | Summary of the transition outcome. Recorded in the timeline. |
The transition function does not manage timestamps, status, timeline, or inbox — it is a pure function from (state, messages) to (state, result).
The transition function MUST handle its own errors internally. If it throws an unhandled exception, the framework treats this as a severe failure: the agent is suspended, the inbox is preserved, and the error is recorded. No timeline entry is written.
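Here is a hypothetical level-2 transition function that keeps a running total of numeric messages. It follows the contract above: it takes a map with agent-id, state, and messages, and returns a map with state and result. It handles malformed messages itself instead of raising, so a bad message does not suspend the agent. The name `tally_transition` and the shape of its state are invented for this example.

```python
def tally_transition(inp: dict) -> dict:
    """Sum numeric messages into state["total"]; count and skip anything else."""
    state = inp.get("state") or {}          # null on first run (unless seeded)
    total = state.get("total", 0)
    messages = inp.get("messages", [])
    rejected = 0
    for msg in messages:
        if isinstance(msg, (int, float)) and not isinstance(msg, bool):
            total += msg
        else:
            rejected += 1                   # handled internally, not thrown
    return {
        "state": {**state, "total": total}, # preserves any seeded keys
        "result": {"processed": len(messages) - rejected, "rejected": rejected},
    }
```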
Three-Level Architecture
The agent system separates concerns into three levels. Each level is a Grid operation, invokable locally, remotely, or via orchestration:
```text
Level 1: Agent Update (framework: manages run loop)
   │  reads inbox, invokes level 2, writes timeline and status
   │
   ▼
Level 2: Agent Transition (domain logic: manages state)
   │  processes messages, maintains conversation/workflow state
   │  invokes level 3 for external calls
   │
   ▼
Level 3: External Call (single step: stateless)
      makes one external request (LLM API, HTTP, etc.)
      returns structured response
```
Level 1 is the same for every agent — it is the run loop defined above. It owns the agent record and invokes level 2 as a Grid operation.
Level 2 is the pluggable part. Different agents use different transition functions: an LLM-backed conversation agent, a rule engine, a workflow coordinator, or custom logic. Level 2 receives current state and messages, returns updated state and a result summary.
Level 3 is a standard Grid operation that makes a single external call. For LLM agents, this is an LLM inference operation. Level 3 knows about API serialisation, authentication, and provider-specific details. It does not know about agents, conversation history, or the run loop.
The level 2 operation is specified by the caller when triggering the run loop. The level 3 operation is specified by the agent creator in the agent's initial state configuration.
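The layering can be illustrated with plain callables standing in for Grid operations. Everything here is hypothetical. `OPS` is a toy operation registry. `mock:llm` and `example:chat` are invented operation references. The `llm-op` state key is one possible way for a creator to record the level-3 operation. Level 1 is reduced to a single call so the routing stays visible: the caller chooses the level-2 operation, and level 2 reads its level-3 operation from state.

```python
from typing import Any, Callable, Dict

Op = Callable[[Dict[str, Any]], Dict[str, Any]]
OPS: Dict[str, Op] = {}                  # hypothetical operation registry

def invoke(op_ref: str, inp: Dict[str, Any]) -> Dict[str, Any]:
    return OPS[op_ref](inp)              # local here; could be remote on a Grid

def mock_llm(inp):                       # level 3: one stateless call (mocked)
    return {"reply": f"echo: {inp['prompt']}"}

def chat_transition(inp):                # level 2: owns conversation state
    state = dict(inp["state"])
    history = list(state.get("history", []))
    for msg in inp["messages"]:
        reply = invoke(state["llm-op"], {"prompt": msg})["reply"]
        history += [msg, reply]
    state["history"] = history
    return {"state": state, "result": {"replies": len(inp["messages"])}}

OPS["mock:llm"] = mock_llm
OPS["example:chat"] = chat_transition

def level1(agent_id: str, record: dict, transition_op: str) -> dict:
    # Level 1 (greatly simplified): knows only the level-2 op reference.
    out = invoke(transition_op, {"agent-id": agent_id,
                                 "state": record["state"],
                                 "messages": record["inbox"]})
    return {**record, "state": out["state"], "inbox": []}
```

Replacing `mock:llm` with a real inference operation changes only the registry entry named in the agent's state. Levels 1 and 2 are unaffected.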
Credential Access
Operations that need API keys or other secrets resolve them from two sources, in priority order:
- User's secret store — an encrypted per-user credential store, using the secret name declared in the operation's metadata
- Input parameter — an optional plaintext field for testing only
The agent's configuration does not contain API keys. The operation metadata declares which secret it needs, and the runtime resolves it from the caller's secret store. This keeps agent configuration clean and credentials in encrypted storage.
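The priority order can be written as a short lookup. This sketch assumes the secret store has already been decrypted into a mapping. The metadata key `secret` and the input field `apiKey` are hypothetical names and are not defined by this specification.

```python
from typing import Mapping, Optional

def resolve_credential(op_metadata: Mapping, secret_store: Mapping[str, str],
                       op_input: Mapping) -> Optional[str]:
    name = op_metadata.get("secret")     # secret name declared by the operation
    # 1. The caller's secret store takes priority.
    if name is not None and name in secret_store:
        return secret_store[name]
    # 2. Fallback: plaintext input parameter, intended for testing only.
    return op_input.get("apiKey")
```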
Relationship to Jobs
The agent lifecycle defined here operates within the job lifecycle defined in COG-8. Specifically:
- An agent is created as part of a job invocation
- The agent's run loop is triggered by job operations
- Agent status (SLEEPING, RUNNING, SUSPENDED, TERMINATED) describes the agent's internal state, while job status (PENDING, STARTED, INPUT_REQUIRED, COMPLETE, etc.) describes the job's external state as seen by clients
- Messages delivered via COG-9 endpoints reach the agent's inbox through the job message queue
The mapping between agent status and job status is implementation-defined. A typical mapping would be:
| Agent Status | Job Status | Meaning |
|---|---|---|
| SLEEPING | INPUT_REQUIRED | Agent is idle, awaiting next message |
| RUNNING | STARTED | Agent is processing messages |
| SUSPENDED | FAILED or PAUSED | Agent encountered an error |
| TERMINATED | COMPLETE | Agent has finished |
Merge Semantics
Agent records use LWW merge semantics at the lattice level. The record with the later ts wins unconditionally.
This is correct because:
- Monotonic timestamps: All writes to an agent are serialised on a single venue, so ts is monotonically increasing
- Atomic snapshots: The winner is always a complete, consistent state that genuinely existed, never a mixture of fields from different writes
- Simple replication: Cross-venue sync is replication. The hosting venue always has the latest ts. Replicas receive the complete state.
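The winner is always a whole record, so the merge reduces to a comparison on ts. Here is a minimal sketch over plain dicts. Keeping the local record on a tie is an assumption, because this specification does not define tie-breaking.

```python
from functools import reduce
from typing import Iterable

def lww_merge(local: dict, incoming: dict) -> dict:
    # The later ts wins; the losing record is discarded whole, never blended.
    # Equal ts keeps the local record (assumption; ties are not specified).
    return incoming if incoming["ts"] > local["ts"] else local

def converge(replicas: Iterable[dict]) -> dict:
    # With distinct ts values, folding LWW over any arrival order yields
    # the record with the highest ts.
    return reduce(lww_merge, replicas)
```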
Security Considerations
Resource Limits
Venues SHOULD impose limits on:
- Agent count per user — to prevent resource exhaustion
- Inbox size — to prevent memory exhaustion from rapid message delivery
- Timeline size — to bound the storage cost of long-running agents
- Agent lifetime — to prevent indefinitely dormant agents from consuming resources
Credential Security
Agent state and timeline entries may indirectly reference sensitive information. Venues SHOULD:
- Never store plaintext credentials in the agent record
- Use an encrypted per-user secret store for API keys and other credentials
- Redact sensitive fields in timeline entries where possible
Message Validation
The transition function receives arbitrary messages from the inbox. Implementations SHOULD:
- Validate message format before processing
- Handle malformed or malicious messages gracefully (return an error state rather than crashing)
- Apply rate limits on message delivery per agent
Transition Function Safety
A failing transition function suspends the agent, preserving the inbox for retry. This design prevents message loss but means a pathological transition function could be repeatedly retried. Venues SHOULD:
- Track consecutive failures and escalate (e.g. terminate after N failures)
- Apply timeouts to individual transition function invocations
- Log transition function errors for operator review
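One possible escalation policy counts consecutive failures and terminates once a limit is reached. The limit of 3 and keeping the counter outside the agent record are illustrative choices. They are not requirements of this specification.

```python
MAX_CONSECUTIVE_FAILURES = 3   # illustrative value for "N"

def record_outcome(failures: dict, agent_id: str, succeeded: bool) -> str:
    """Update the per-agent failure count and return the status to apply."""
    if succeeded:
        failures[agent_id] = 0               # any success resets the streak
        return "SLEEPING"
    failures[agent_id] = failures.get(agent_id, 0) + 1
    if failures[agent_id] >= MAX_CONSECUTIVE_FAILURES:
        return "TERMINATED"                  # escalate after N failures
    return "SUSPENDED"
```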
Agent Workspace
Agents have access to persistent, user-scoped storage namespaces via the covia adapter operations. These are standard default tools available to all LLM-backed agents.
Namespaces
- /w/ (workspace): general-purpose data storage for agent knowledge, state, logs, and working data
- /o/ (operations): user-defined operation definitions
- /h/ (HITL): reserved for human-in-the-loop requests (Phase D)
All namespaces are readable via covia:read, covia:list, and covia:slice. The /w/ and /o/ namespaces are writable via covia:write, covia:delete, and covia:append. Other namespaces (/g/, /s/, /j/) are framework-managed and reject direct writes.
Operations
| Operation | Description |
|---|---|
| covia:read | Read a value at any lattice path. Supports maxSize guard. |
| covia:write | Write a value to /w/ or /o/ at any depth. Creates intermediate maps. |
| covia:delete | Delete a key from /w/ or /o/ at any depth. |
| covia:append | Append an element to a vector in /w/ or /o/. Creates vector if absent. |
| covia:slice | Read a paginated slice from a collection (vector, map, or set). |
| covia:list | Describe structure at a path: type, count, keys for maps. |
Paths support mixed map/vector navigation — e.g. w/records/1/name navigates into a map, then a vector by index, then a map field.
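Mixed navigation can be sketched over nested Python dicts and lists. Each path segment is interpreted according to the type of the node it is applied to. This sketch illustrates the path semantics only and is not the covia adapter's implementation. It mutates in place for brevity, whereas a lattice write would produce a new value. The function names are hypothetical.

```python
from typing import Any, List

def split_path(path: str) -> List[str]:
    return [seg for seg in path.split("/") if seg]   # tolerate leading "/"

def read_path(root: Any, path: str) -> Any:
    node = root
    for seg in split_path(path):
        if isinstance(node, list):
            node = node[int(seg)]          # vector: index by position
        elif isinstance(node, dict):
            node = node[seg]               # map: look up by key
        else:
            raise KeyError(f"cannot descend into {type(node).__name__} at {seg!r}")
    return node

def write_path(root: dict, path: str, value: Any) -> None:
    """Write value at path, creating intermediate maps where keys are absent."""
    *parents, last = split_path(path)
    node = root
    for seg in parents:
        if isinstance(node, list):
            node = node[int(seg)]
        else:
            node = node.setdefault(seg, {})
    if isinstance(node, list):
        node[int(last)] = value
    else:
        node[last] = value
```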
See COG-4: Grid Lattice for the per-user namespace structure and merge semantics.
Future Directions
The following capabilities are anticipated but not yet specified:
- Cross-user workspace reads: agents reading another user's /w/ namespace, gated by capability delegation (see COG-13: Agent Capabilities)
- HITL requests: the /h/ namespace for human-in-the-loop interaction patterns
- Cross-user messaging: agents sending messages to agents owned by different users
- Agent forking: creating a copy of an agent's state for branching conversations or experiments
- Cross-venue migration: transferring an agent's state from one venue to another
- Wake triggers: automatic run loop invocation on message delivery, timer events, or external signals
- Recovery: handling agents left in RUNNING status after a venue restart
Related Specifications
- COG-1: Architecture — Overall Grid architecture and terminology
- COG-7: Operations — Operation definitions, adapter types, and invocation model
- COG-8: Jobs — Job lifecycle, state chains, and the job-as-agent model
- COG-9: Agent Messaging — Message delivery to jobs and agents, protocol compatibility
- COG-13: Agent Capabilities — Capability model, UCAN delegation, and enforcement