COG-11: Agent Lifecycle

Status:      Exploratory Draft
Version:     0.1
Created:     2026-02-27
Authors:     Mike Anderson

Exploratory Draft

This specification is an exploratory draft describing agent lifecycle patterns that are under active development. The concepts, interfaces, and behaviours described here are subject to significant change as the design evolves through implementation experience and community feedback.

For the stable foundations that this specification builds on, see COG-8: Jobs (job lifecycle and state chains) and COG-9: Agent Messaging (message delivery and processing).

Implementation has moved on

The single-inbox run-loop model described below predates the current implementation, which organises intake around sessions (per-conversation threads) rather than one agent-level inbox. For how agents actually work today — sessions, the one-session-per-cycle run loop, and task/chat Jobs — see the user-guide Sessions and Agent Operations pages. This COG will be revised to match.

This standard specifies the agent lifecycle on the Covia Grid — how stateful, persistent agents are created, how they process messages through a run loop, how they transition between states, and how they terminate. It builds on the job-as-agent model introduced in COG-8 and the messaging protocol defined in COG-9.

Purpose

COG-8 establishes that an agent is simply a job whose transition function never reaches a terminal state — a persistent, stateful process that accepts repeated interactions. COG-9 defines how messages are delivered to such processes.

This specification completes the agent model by defining:

Agent state: The data structure that persists between interactions
Agent lifecycle: The states an agent moves through from creation to termination
Run loop: How agents process messages and produce new state
Transition functions: The pluggable operations that define agent behaviour
Three-level architecture: How framework, domain logic, and external calls are separated

Terminology

See COG-1: Architecture for core Grid terminology and COG-8: Jobs for job-related terms. Additional terms used in this specification:

Term	Definition
Agent	A persistent, stateful process on the Grid that accepts repeated interactions via message delivery
Agent Record	The complete state of an agent at a point in time — a single atomic value
Transition Function	An Operation that receives the agent's current state and new messages, and returns updated state
Run Loop	The cycle of reading the inbox, invoking the transition function, and recording the result
Timeline	An append-only log of successful transition records, providing an audit trail of the agent's history

Core Principles

Single Atomic Value

An agent's entire state is one map. Every operation (create, message delivery, run, configuration update) atomically replaces the whole map. There is no per-field merge, and there are no child lattices within the agent record. The map is the unit of state.

This simplifies reasoning about agent state: at any point, the agent record is a consistent snapshot that genuinely existed, never a mixture of fields from different points in time.

Last Writer Wins

Agent state uses last-writer-wins (LWW) merge semantics based on a timestamp field. The record with the later timestamp wins unconditionally. Since all writes to an agent are serialised on the hosting venue, timestamps are monotonic and state only advances.
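
A minimal sketch of the merge rule, assuming agent records are plain dictionaries with an integer `ts` field. The name `lww_merge` is illustrative and not part of any Covia API. Tie-breaking on equal timestamps is not specified here, so keeping the local record on a tie is an assumption of the sketch.

```python
from typing import Any, Optional

AgentRecord = dict[str, Any]


def lww_merge(local: Optional[AgentRecord],
              incoming: Optional[AgentRecord]) -> Optional[AgentRecord]:
    """Return whichever complete record has the later `ts`.

    No fields are combined: the result is always one of the two inputs.
    Assumption: on equal `ts` the local record is kept.
    """
    if local is None:
        return incoming
    if incoming is None:
        return local
    return incoming if incoming["ts"] > local["ts"] else local


older = {"ts": 100, "status": "SLEEPING", "inbox": ["hello"]}
newer = {"ts": 200, "status": "RUNNING", "inbox": ["hello"]}
merged = lww_merge(older, newer)  # the whole `newer` record wins
```

Because the function only ever returns one of its inputs, the merged value is always a snapshot that actually existed.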

Separation of Concerns

The agent system separates responsibilities into three levels, each of which is a pluggable Grid operation:

Agent Update (framework) — manages the run loop, inbox, timeline, and status transitions
Agent Transition (domain logic) — processes messages and produces updated state
External Call (single step) — makes a single stateless call to an external service (e.g. an LLM API)

Each level invokes the next as a standard Grid operation. This means any level can be a local operation, a remote venue operation via federation, or a test mock.

Specification

Agent Record

An agent's value is a plain map with the following fields:

Field	Type	Description
`ts`	long	Timestamp of the last write. The merge discriminator — later `ts` always wins.
`status`	string	Current lifecycle status (see below)
`config`	map	Framework-level configuration. Opaque to the transition function.
`state`	any	User-defined state. Opaque to the framework. Passed to and returned from the transition function.
`inbox`	vector	Messages awaiting processing. Drained on successful run.
`timeline`	vector	Append-only log of transition records. Grows with each successful agent run.
`caps`	map	Capability sets (reserved for future capability enforcement)
`error`	string?	Last error message, or null

The framework manages all fields except state, which is owned by the transition function. The transition function never manages framework fields; the framework never inspects state.
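
As an illustration of the record shape, the fields in the table could be represented as a plain dictionary. This is a sketch only: the helper names `now_millis` and `new_agent_record` are hypothetical, and the use of millisecond wall-clock time for `ts` is an assumption.

```python
import time
from typing import Any, Optional


def now_millis() -> int:
    """Wall-clock milliseconds; the unit of `ts` is an assumption of this sketch."""
    return int(time.time() * 1000)


def new_agent_record(config: Optional[dict] = None, state: Any = None) -> dict:
    """Build a record containing every field from the Agent Record table."""
    return {
        "ts": now_millis(),          # merge discriminator
        "status": "SLEEPING",        # lifecycle status
        "config": dict(config or {}),  # framework-level, opaque to the transition function
        "state": state,              # user-defined, opaque to the framework
        "inbox": [],                 # messages awaiting processing
        "timeline": [],              # append-only log of successful runs
        "caps": {},                  # reserved for capability enforcement
        "error": None,               # last error message, if any
    }


record = new_agent_record(state={"model": "example-model"})
```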

Agent Lifecycle

Agents have a lifecycle that is distinct from, but related to, the job lifecycle defined in COG-8:

             create
               │
               ▼
         ┌───────────┐
         │ SLEEPING  │◀─────────── successful run
         └───────────┘
               │
          run (inbox non-empty)
               │
               ▼
         ┌───────────┐
         │  RUNNING  │
         └───────────┘
               │
          ┌────┴────┐
          │         │
     success    failure
          │         │
          ▼         ▼
    SLEEPING   ┌───────────┐
               │ SUSPENDED │
               └───────────┘
                    │
               clear error
                    │
                    ▼
              SLEEPING

    From any non-terminal state:
               │
          terminate
               │
               ▼
         ┌────────────┐
         │ TERMINATED │
         └────────────┘

Status Values

Status	Description
`SLEEPING`	Agent is idle, ready to process messages when the run loop is triggered
`RUNNING`	Agent is actively executing its transition function
`SUSPENDED`	Agent is paused due to an error in the transition function. Inbox is preserved for retry.
`TERMINATED`	Agent has been permanently stopped. No further messages are accepted.

Status Transitions

From	To	Trigger
(none)	`SLEEPING`	Agent created
`SLEEPING`	`RUNNING`	Run loop triggered with non-empty inbox
`RUNNING`	`SLEEPING`	Transition function succeeds
`RUNNING`	`SUSPENDED`	Transition function fails
`SUSPENDED`	`SLEEPING`	Error cleared (manual recovery)
Any non-terminal	`TERMINATED`	Explicit termination
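
The transition table can be encoded directly as data. The following sketch assumes nothing beyond the table itself; the names `Status`, `ALLOWED`, and `can_transition` are illustrative.

```python
from enum import Enum


class Status(str, Enum):
    SLEEPING = "SLEEPING"
    RUNNING = "RUNNING"
    SUSPENDED = "SUSPENDED"
    TERMINATED = "TERMINATED"


# Allowed (from, to) pairs, copied from the Status Transitions table.
ALLOWED = {
    (Status.SLEEPING, Status.RUNNING),    # run loop triggered, inbox non-empty
    (Status.RUNNING, Status.SLEEPING),    # transition function succeeds
    (Status.RUNNING, Status.SUSPENDED),   # transition function fails
    (Status.SUSPENDED, Status.SLEEPING),  # error cleared
}


def can_transition(current: Status, target: Status) -> bool:
    """True if the table permits moving from `current` to `target`."""
    if current is Status.TERMINATED:
        return False  # terminal: nothing leaves TERMINATED
    if target is Status.TERMINATED:
        return True   # any non-terminal state may be terminated
    return (current, target) in ALLOWED
```

For example, `can_transition(Status.SUSPENDED, Status.RUNNING)` is false: a suspended agent has to return to SLEEPING before it can run again.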

Operations

Every operation that modifies an agent atomically replaces the whole agent record and assigns it a new timestamp.

Create

Creates the initial agent record with status SLEEPING, empty inbox, empty timeline, and optional initial state.

The initial state allows the creator to seed transition-function-specific configuration (e.g. LLM provider, model, system prompt) that the transition function reads and preserves across runs. Framework config is kept separate from transition function concerns.

Idempotent: If the agent record already exists, create is a no-op.
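
A sketch of idempotent creation, using a module-level dictionary as a stand-in for the venue's agent storage. The store, the function name `create_agent`, and the explicit `ts` argument are all assumptions made for illustration.

```python
from typing import Any, Optional

# Hypothetical in-memory store standing in for the venue's agent storage.
agents: dict = {}


def create_agent(agent_id: str, ts: int,
                 config: Optional[dict] = None, state: Any = None) -> dict:
    """Create the record if absent; otherwise leave the existing one untouched."""
    existing = agents.get(agent_id)
    if existing is not None:
        return existing  # idempotent: no write and no new timestamp
    record = {
        "ts": ts,
        "status": "SLEEPING",
        "config": dict(config or {}),
        "state": state,  # e.g. provider, model, system prompt for an LLM agent
        "inbox": [],
        "timeline": [],
        "caps": {},
        "error": None,
    }
    agents[agent_id] = record
    return record


first = create_agent("agent-1", ts=1, state={"model": "example-model"})
second = create_agent("agent-1", ts=2, state={"model": "other-model"})
```

The second call changes nothing: the stored record keeps its original `ts` and initial state.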

Message Delivery

Reads the current agent record, appends the message to the inbox, and writes the updated record. The agent is not automatically woken — the message sits in the inbox until the next run.

Messages MUST be rejected if the agent does not exist or if the agent status is TERMINATED.
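
Delivery could look roughly like the following, assuming records are dictionaries held in a dictionary-based store. The exception types and the function name `deliver_message` are choices of this sketch, not part of the specification.

```python
from typing import Any


def deliver_message(agents: dict, agent_id: str, message: Any, ts: int) -> dict:
    """Append `message` to the inbox by writing a whole new record."""
    record = agents.get(agent_id)
    if record is None:
        raise KeyError(f"agent {agent_id!r} does not exist")
    if record["status"] == "TERMINATED":
        raise ValueError(f"agent {agent_id!r} is terminated")
    # Build a complete replacement record; status is left untouched,
    # so the agent is not woken by delivery.
    updated = {**record, "inbox": [*record["inbox"], message], "ts": ts}
    agents[agent_id] = updated
    return updated
```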

Run Loop

The run loop is the core mechanism for processing messages. When triggered:

1. Read the current agent record
2. If the inbox is empty, no-op
3. Set status to RUNNING, write the agent record
4. Invoke the transition function with the agent ID, current state, and inbox
5. On success:
   - Update state from the returned value
   - Append a timeline entry (transition operation, starting state, messages processed, returned result, start/end timestamps)
   - Clear the inbox
   - Set status to SLEEPING
   - Write the agent record
6. On error:
   - Leave state and inbox unchanged
   - Set status to SUSPENDED, set error
   - Write the agent record

The run loop writes twice: once to mark running (step 3), once to record the outcome (step 5 or 6). Each write is a complete, atomic agent record replacement.

On error, the inbox is preserved — the same messages are available for retry after the error is resolved and the agent is resumed.
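
The run loop can be sketched as a single function over a dictionary-based store. This is an illustration under stated assumptions, not the venue implementation: `run_agent` and `next_ts` are hypothetical names, the transition function is modelled as a Python callable taking and returning a dictionary, and status checks other than the empty-inbox test are omitted because the steps above do not list them.

```python
import time
from typing import Callable

Transition = Callable[[dict], dict]


def next_ts(previous: int) -> int:
    """Wall-clock milliseconds, forced to be strictly greater than `previous`."""
    return max(int(time.time() * 1000), previous + 1)


def run_agent(agents: dict, agent_id: str, transition: Transition, op: str) -> dict:
    record = agents[agent_id]                       # read the current record
    if not record["inbox"]:                         # empty inbox: no-op
        return record
    start = next_ts(record["ts"])
    running = {**record, "status": "RUNNING", "ts": start}
    agents[agent_id] = running                      # first write: mark running
    try:
        output = transition({"agent-id": agent_id,
                             "state": running["state"],
                             "messages": running["inbox"]})
        new_state, result = output["state"], output.get("result")
    except Exception as exc:                        # error path
        failed = {**running, "status": "SUSPENDED", "error": str(exc),
                  "ts": next_ts(start)}             # state and inbox unchanged
        agents[agent_id] = failed                   # second write: record failure
        return failed
    end = next_ts(start)                            # success path
    entry = {"start": start, "end": end, "op": op,
             "state": running["state"],             # starting state, not output
             "messages": running["inbox"], "result": result}
    done = {**running, "state": new_state, "inbox": [], "status": "SLEEPING",
            "timeline": [*running["timeline"], entry], "ts": end}
    agents[agent_id] = done                         # second write: record success
    return done
```

Each assignment to `agents[agent_id]` replaces the whole record, so a reader never observes a half-updated agent.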

Timeline Entry

Each entry in the timeline records one successful run:

Field	Type	Description
`start`	long	Timestamp when the run started
`end`	long	Timestamp when the run completed
`op`	string	The operation reference used for the transition function
`state`	any	The starting state passed to the transition function
`messages`	vector	The inbox messages passed to the transition function
`result`	any	The result returned by the transition function

The output state is not stored in the timeline entry — it is the state field in the agent record (for the latest run) or the state field in the next timeline entry (for earlier runs). This avoids redundant storage.

Timeline entries are only written on success. On error, no timeline entry is created.
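
Because each entry stores only the starting state, the output of every run can be recovered by shifting the sequence by one. A small sketch, assuming the record is a dictionary shaped as described above; `output_states` is a hypothetical helper name.

```python
def output_states(record: dict) -> list:
    """Return the state produced by each successful run, oldest first."""
    timeline = record["timeline"]
    if not timeline:
        return []
    # The output of run i is the starting state of run i+1;
    # the output of the latest run is the record's current `state`.
    return [entry["state"] for entry in timeline[1:]] + [record["state"]]


example = {
    "state": {"count": 3},
    "timeline": [
        {"state": None},          # first run started from null state
        {"state": {"count": 1}},
        {"state": {"count": 2}},
    ],
}
states = output_states(example)
```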

Transition Function Contract

The transition function is a standard Grid Operation with the following contract:

Input:

Field	Type	Description
`agent-id`	string	The agent's identifier
`state`	any	Current user-defined state from the agent record. Null on first run.
`messages`	vector	The inbox messages to process

Output:

Field	Type	Description
`state`	any	Updated user-defined state. Written back to the agent record.
`result`	any	Summary of the transition outcome. Recorded in the timeline.

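The transition function does not manage timestamps, status, timeline, or inbox. To the framework it is simply a function from (state, messages) to (state, result), although it may invoke other operations, such as level 3 external calls, while computing that result.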

The transition function MUST handle its own errors internally. If it throws an unhandled exception, the framework treats this as a severe failure: the agent is suspended, the inbox is preserved, and the error is recorded. No timeline entry is written.
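
A toy transition function that satisfies this contract is sketched below. It assumes the input and output maps are passed as Python dictionaries, and that a well-formed message is a map with a string `text` field; that message shape is an assumption of the example, not something this specification defines.

```python
def counting_transition(inp: dict) -> dict:
    """Count well-formed messages; report malformed ones instead of raising."""
    # `state` is null (None) on the first run, so start from a default.
    state = inp["state"] if isinstance(inp["state"], dict) else {"count": 0}
    accepted = 0
    rejected = []
    for message in inp["messages"]:
        if isinstance(message, dict) and isinstance(message.get("text"), str):
            accepted += 1
        else:
            rejected.append(repr(message))  # handled internally, not raised
    new_state = {**state, "count": state.get("count", 0) + accepted}
    return {"state": new_state,
            "result": {"accepted": accepted, "rejected": rejected}}


output = counting_transition({
    "agent-id": "agent-1",
    "state": None,
    "messages": [{"text": "hello"}, 42],
})
```

The malformed message `42` ends up in the result summary rather than causing an exception, so the run succeeds and the agent is not suspended.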

Three-Level Architecture

The agent system separates concerns into three levels. Each level is a Grid operation, invokable locally, remotely, or via orchestration:

Level 1: Agent Update          (framework — manages run loop)
  │  reads inbox, invokes level 2, writes timeline and status
  │
  ▼
Level 2: Agent Transition      (domain logic — manages state)
  │  processes messages, maintains conversation/workflow state
  │  invokes level 3 for external calls
  │
  ▼
Level 3: External Call         (single step — stateless)
     makes one external request (LLM API, HTTP, etc.)
     returns structured response

Level 1 is the same for every agent — it is the run loop defined above. It owns the agent record and invokes level 2 as a Grid operation.

Level 2 is the pluggable part. Different agents use different transition functions: an LLM-backed conversation agent, a rule engine, a workflow coordinator, or custom logic. Level 2 receives current state and messages, returns updated state and a result summary.

Level 3 is a standard Grid operation that makes a single external call. For LLM agents, this is an LLM inference operation. Level 3 knows about API serialisation, authentication, and provider-specific details. It does not know about agents, conversation history, or the run loop.

The level 2 operation is specified by the caller when triggering the run loop. The level 3 operation is specified by the agent creator in the agent's initial state configuration.
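
The layering can be illustrated with three plain Python callables, each standing in for a Grid operation. Everything named here is hypothetical: the `OPERATIONS` registry, the `test:echo` operation name, the `external-op` state key, and the simplified `agent_update`, which omits status, timeline, and error handling.

```python
from typing import Callable

Operation = Callable[[dict], dict]


def echo_call(request: dict) -> dict:
    """Level 3 stand-in: one stateless request, one structured response."""
    return {"text": "echo: " + request["prompt"]}


# Hypothetical registry resolving operation names to callables.
OPERATIONS = {"test:echo": echo_call}


def chat_transition(inp: dict) -> dict:
    """Level 2: keeps conversation history and delegates each call to level 3."""
    state = inp["state"]
    call = OPERATIONS[state["external-op"]]  # level 3 is named in the agent state
    history = list(state.get("history", []))
    for message in inp["messages"]:
        reply = call({"prompt": message})
        history.append({"user": message, "assistant": reply["text"]})
    return {"state": {**state, "history": history},
            "result": {"replies": len(inp["messages"])}}


def agent_update(record: dict, agent_id: str, transition: Operation) -> dict:
    """Level 1, heavily simplified: invoke level 2 and store the new state."""
    output = transition({"agent-id": agent_id,
                         "state": record["state"],
                         "messages": record["inbox"]})
    return {**record, "state": output["state"], "inbox": []}


record = {"state": {"external-op": "test:echo"}, "inbox": ["hi"]}
updated = agent_update(record, "agent-1", chat_transition)
```

Swapping `echo_call` for a real inference operation would not require any change to `agent_update`, which is the point of the separation.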

Credential Access

Operations that need API keys or other secrets resolve them from two sources, in priority order:

1. User's secret store: an encrypted per-user credential store, using the secret name declared in the operation's metadata
2. Input parameter: an optional plaintext field for testing only

The agent's configuration does not contain API keys. The operation metadata declares which secret it needs, and the runtime resolves it from the caller's secret store. This keeps agent configuration clean and credentials in encrypted storage.
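
The lookup order might be sketched as follows. The secret store is modelled as an already-decrypted dictionary, and the names `resolve_secret`, `EXAMPLE_API_KEY`, and the `apiKey` input field are all hypothetical.

```python
from typing import Optional


def resolve_secret(secret_name: str, secret_store: dict, input_params: dict,
                   input_field: str = "apiKey") -> Optional[str]:
    """Prefer the user's secret store; fall back to a plaintext input field."""
    value = secret_store.get(secret_name)
    if value is not None:
        return value
    # Testing-only fallback: a plaintext value supplied with the invocation.
    return input_params.get(input_field)


key = resolve_secret(
    "EXAMPLE_API_KEY",
    secret_store={"EXAMPLE_API_KEY": "from-store"},
    input_params={"apiKey": "from-input"},
)
```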

Relationship to Jobs

The agent lifecycle defined here operates within the job lifecycle defined in COG-8. Specifically:

An agent is created as part of a job invocation
The agent's run loop is triggered by job operations
Agent status (SLEEPING, RUNNING, SUSPENDED, TERMINATED) describes the agent's internal state, while job status (PENDING, STARTED, INPUT_REQUIRED, COMPLETE, etc.) describes the job's external state as seen by clients
Messages delivered via COG-9 endpoints reach the agent's inbox through the job message queue

The mapping between agent status and job status is implementation-defined. A typical mapping would be:

Agent Status	Job Status	Meaning
`SLEEPING`	`INPUT_REQUIRED`	Agent is idle, awaiting next message
`RUNNING`	`STARTED`	Agent is processing messages
`SUSPENDED`	`FAILED` or `PAUSED`	Agent encountered an error
`TERMINATED`	`COMPLETE`	Agent has finished

Merge Semantics

Agent records use LWW merge semantics at the lattice level. The record with the later ts wins unconditionally.

This is correct because:

Monotonic timestamps: All writes to an agent are serialised on a single venue, so ts is monotonically increasing
Atomic snapshots: The winner is always a complete, consistent state that genuinely existed — never a mixture of fields from different writes
Simple replication: Cross-venue sync is plain replication. The hosting venue always has the latest ts, so replicas simply receive the complete state.

Security Considerations

Resource Limits

Venues SHOULD impose limits on:

Agent count per user — to prevent resource exhaustion
Inbox size — to prevent memory exhaustion from rapid message delivery
Timeline size — to bound the storage cost of long-running agents
Agent lifetime — to prevent indefinitely dormant agents from consuming resources

Credential Security

Agent state and timeline entries may indirectly reference sensitive information. Venues SHOULD:

Never store plaintext credentials in the agent record
Use an encrypted per-user secret store for API keys and other credentials
Redact sensitive fields in timeline entries where possible

Message Validation

The transition function receives arbitrary messages from the inbox. Implementations SHOULD:

Validate message format before processing
Handle malformed or malicious messages gracefully (return an error state rather than crashing)
Apply rate limits on message delivery per agent

Transition Function Safety

A failing transition function suspends the agent, preserving the inbox for retry. This design prevents message loss but means a pathological transition function could be repeatedly retried. Venues SHOULD:

Track consecutive failures and escalate (e.g. terminate after N failures)
Apply timeouts to individual transition function invocations
Log transition function errors for operator review
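
One way a venue might track consecutive failures is sketched below. The threshold of three, the returned action strings, and the function name `record_run_outcome` are assumptions chosen for illustration.

```python
def record_run_outcome(failures: dict, agent_id: str, succeeded: bool,
                       max_failures: int = 3) -> str:
    """Update the consecutive-failure count and return a suggested action."""
    if succeeded:
        failures.pop(agent_id, None)  # any success resets the streak
        return "continue"
    failures[agent_id] = failures.get(agent_id, 0) + 1
    if failures[agent_id] >= max_failures:
        return "terminate"            # escalate after N consecutive failures
    return "retry"


streak: dict = {}
actions = [record_run_outcome(streak, "agent-1", ok)
           for ok in (False, False, False)]
```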

Agent Workspace

Agents have access to persistent, user-scoped storage namespaces via the covia adapter operations. These operations are default tools available to all LLM-backed agents.

Namespaces

/w/ (workspace) — general-purpose data storage for agent knowledge, state, logs, and working data
/o/ (operations) — user-defined operation definitions
/h/ (HITL) — reserved for human-in-the-loop requests (Phase D)

All namespaces are readable via covia:read, covia:list, and covia:slice. The /w/ and /o/ namespaces are writable via covia:write, covia:delete, and covia:append. Other namespaces (/g/, /s/, /j/) are framework-managed and reject direct writes.

Operations

Operation	Description
`covia:read`	Read a value at any lattice path. Supports `maxSize` guard.
`covia:write`	Write a value to `/w/` or `/o/` at any depth. Creates intermediate maps.
`covia:delete`	Delete a key from `/w/` or `/o/` at any depth.
`covia:append`	Append an element to a vector in `/w/` or `/o/`. Creates vector if absent.
`covia:slice`	Read a paginated slice from a collection (vector, map, or set).
`covia:list`	Describe structure at a path: type, count, keys for maps.

Paths support mixed map/vector navigation — e.g. w/records/1/name navigates into a map, then a vector by index, then a map field.
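
Mixed navigation of this kind can be modelled over nested Python dictionaries and lists. The sketch below is not the `covia:read` implementation; `read_path` is a hypothetical helper, and zero-based vector indexing is an assumption.

```python
from typing import Any


def read_path(root: Any, path: str) -> Any:
    """Walk a slash-separated path through nested maps and vectors."""
    node = root
    for segment in path.strip("/").split("/"):
        if isinstance(node, list):
            node = node[int(segment)]  # vector: segment is an index
        elif isinstance(node, dict):
            node = node[segment]       # map: segment is a key
        else:
            raise KeyError(
                f"cannot navigate into {type(node).__name__} at {segment!r}")
    return node


user_data = {"w": {"records": [{"name": "first"}, {"name": "second"}]}}
value = read_path(user_data, "w/records/1/name")
```

With zero-based indexing, `value` is `"second"`: the path enters the `w` map, selects `records`, takes element 1 of the vector, then reads its `name` field.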

See COG-4: Grid Lattice for the per-user namespace structure and merge semantics.

Future Directions

The following capabilities are anticipated but not yet specified:

Cross-user workspace reads — agents reading another user's /w/ namespace, gated by capability delegation (see COG-13: Agent Capabilities)
HITL requests — the /h/ namespace for human-in-the-loop interaction patterns
Cross-user messaging — agents sending messages to agents owned by different users
Agent forking — creating a copy of an agent's state for branching conversations or experiments
Cross-venue migration — transferring an agent's state from one venue to another
Wake triggers — automatic run loop invocation on message delivery, timer events, or external signals
Recovery — handling agents left in RUNNING status after a venue restart

Related Specifications

COG-1: Architecture — Overall Grid architecture and terminology
COG-7: Operations — Operation definitions, adapter types, and invocation model
COG-8: Jobs — Job lifecycle, state chains, and the job-as-agent model
COG-9: Agent Messaging — Message delivery to jobs and agents, protocol compatibility
COG-13: Agent Capabilities — Capability model, UCAN delegation, and enforcement
