Skip to main content

COG-11: Agent Lifecycle

Status:      Exploratory Draft
Version: 0.1
Created: 2026-02-27
Authors: Mike Anderson
Exploratory Draft

This specification is an exploratory draft describing agent lifecycle patterns that are under active development. The concepts, interfaces, and behaviours described here are subject to significant change as the design evolves through implementation experience and community feedback.

For the stable foundations that this specification builds on, see COG-8: Jobs (job lifecycle and state chains) and COG-9: Agent Messaging (message delivery and processing).

This standard specifies the agent lifecycle on the Covia Grid — how stateful, persistent agents are created, how they process messages through a run loop, how they transition between states, and how they terminate. It builds on the job-as-agent model introduced in COG-8 and the messaging protocol defined in COG-9.

Purpose

COG-8 establishes that an agent is simply a job whose transition function never reaches a terminal state — a persistent, stateful process that accepts repeated interactions. COG-9 defines how messages are delivered to such processes.

This specification completes the agent model by defining:

  • Agent state: The data structure that persists between interactions
  • Agent lifecycle: The states an agent moves through from creation to termination
  • Run loop: How agents process messages and produce new state
  • Transition functions: The pluggable operations that define agent behaviour
  • Three-level architecture: How framework, domain logic, and external calls are separated

Terminology

See COG-1: Architecture for core Grid terminology and COG-8: Jobs for job-related terms. Additional terms used in this specification:

TermDefinition
AgentA persistent, stateful process on the Grid that accepts repeated interactions via message delivery
Agent RecordThe complete state of an agent at a point in time — a single atomic value
Transition FunctionAn Operation that receives the agent's current state and new messages, and returns updated state
Run LoopThe cycle of reading the inbox, invoking the transition function, and recording the result
TimelineAn append-only log of successful transition records, providing an audit trail of the agent's history

Core Principles

Single Atomic Value

An agent's entire state is one map. Every operation — create, message delivery, run, configuration update — atomically replaces the whole map. There is no per-field merge and no child lattices within the agent record. The map is the unit of state.

This simplifies reasoning about agent state: at any point, the agent record is a consistent snapshot that genuinely existed, never a mixture of fields from different points in time.

Last Writer Wins

Agent state uses last-writer-wins (LWW) merge semantics based on a timestamp field. The record with the later timestamp wins unconditionally. Since all writes to an agent are serialised on the hosting venue, timestamps are monotonic and state only advances.

Separation of Concerns

The agent system separates responsibilities into three levels, each of which is a pluggable Grid operation:

  1. Agent Update (framework) — manages the run loop, inbox, timeline, and status transitions
  2. Agent Transition (domain logic) — processes messages and produces updated state
  3. External Call (single step) — makes a single stateless call to an external service (e.g. an LLM API)

Each level invokes the next as a standard Grid operation. This means any level can be a local operation, a remote venue operation via federation, or a test mock.

Specification

Agent Record

An agent's value is a plain map with the following fields:

FieldTypeDescription
tslongTimestamp of the last write. The merge discriminator — later ts always wins.
statusstringCurrent lifecycle status (see below)
configmapFramework-level configuration. Opaque to the transition function.
stateanyUser-defined state. Opaque to the framework. Passed to and returned from the transition function.
inboxvectorMessages awaiting processing. Drained on successful run.
timelinevectorAppend-only log of transition records. Grows with each successful agent run.
capsmapCapability sets (reserved for future capability enforcement)
errorstring?Last error message, or null

The framework manages all fields except state, which is owned by the transition function. The transition function never manages framework fields; the framework never inspects state.

Agent Lifecycle

Agents have a lifecycle that is distinct from, but related to, the job lifecycle defined in COG-8:

             create


┌───────────┐
│ SLEEPING │◀─────────── successful run
└───────────┘

run (inbox non-empty)


┌───────────┐
│ RUNNING │
└───────────┘

┌────┴────┐
│ │
success failure
│ │
▼ ▼
SLEEPING ┌───────────┐
│ SUSPENDED │
└───────────┘

clear error


SLEEPING

From any non-terminal state:

terminate


┌────────────┐
│ TERMINATED │
└────────────┘

Status Values

StatusDescription
SLEEPINGAgent is idle, ready to process messages when the run loop is triggered
RUNNINGAgent is actively executing its transition function
SUSPENDEDAgent is paused due to an error in the transition function. Inbox is preserved for retry.
TERMINATEDAgent has been permanently stopped. No further messages are accepted.

Status Transitions

FromToTrigger
(none)SLEEPINGAgent created
SLEEPINGRUNNINGRun loop triggered with non-empty inbox
RUNNINGSLEEPINGTransition function succeeds
RUNNINGSUSPENDEDTransition function fails
SUSPENDEDSLEEPINGError cleared (manual recovery)
Any non-terminalTERMINATEDExplicit termination

Operations

Every operation atomically replaces the agent record with a new timestamp.

Create

Creates the initial agent record with status SLEEPING, empty inbox, empty timeline, and optional initial state.

The initial state allows the creator to seed transition-function-specific configuration (e.g. LLM provider, model, system prompt) that the transition function reads and preserves across runs. Framework config is kept separate from transition function concerns.

Idempotent: If the agent record already exists, create is a no-op.

Message Delivery

Reads the current agent record, appends the message to the inbox, and writes the updated record. The agent is not automatically woken — the message sits in the inbox until the next run.

Messages MUST be rejected if the agent does not exist or if the agent status is TERMINATED.

Run Loop

The run loop is the core mechanism for processing messages. When triggered:

  1. Read the current agent record
  2. If the inbox is empty, no-op
  3. Set status to RUNNING, write the agent record
  4. Invoke the transition function with the agent ID, current state, and inbox
  5. On success:
    • Update state from the returned value
    • Append a timeline entry (transition operation, starting state, messages processed, returned result, start/end timestamps)
    • Clear the inbox
    • Set status to SLEEPING
    • Write the agent record
  6. On error:
    • Leave state and inbox unchanged
    • Set status to SUSPENDED, set error
    • Write the agent record

The run loop writes twice: once to mark running (step 3), once to record the outcome (step 5 or 6). Each write is a complete, atomic agent record replacement.

On error, the inbox is preserved — the same messages are available for retry after the error is resolved and the agent is resumed.

Timeline Entry

Each entry in the timeline records one successful run:

FieldTypeDescription
startlongTimestamp when the run started
endlongTimestamp when the run completed
opstringThe operation reference used for the transition function
stateanyThe starting state passed to the transition function
messagesvectorThe inbox messages passed to the transition function
resultanyThe result returned by the transition function

The output state is not stored in the timeline entry — it is the state field in the agent record (for the latest run) or the state field in the next timeline entry (for earlier runs). This avoids redundant storage.

Timeline entries are only written on success. On error, no timeline entry is created.

Transition Function Contract

The transition function is a standard Grid Operation with the following contract:

Input:

FieldTypeDescription
agent-idstringThe agent's identifier
stateanyCurrent user-defined state from the agent record. Null on first run.
messagesvectorThe inbox messages to process

Output:

FieldTypeDescription
stateanyUpdated user-defined state. Written back to the agent record.
resultanySummary of the transition outcome. Recorded in the timeline.

The transition function does not manage timestamps, status, timeline, or inbox — it is a pure function from (state, messages) to (state, result).

The transition function MUST handle its own errors internally. If it throws an unhandled exception, the framework treats this as a severe failure: the agent is suspended, the inbox is preserved, and the error is recorded. No timeline entry is written.

Three-Level Architecture

The agent system separates concerns into three levels. Each level is a Grid operation, invokable locally, remotely, or via orchestration:

Level 1: Agent Update          (framework — manages run loop)
│ reads inbox, invokes level 2, writes timeline and status


Level 2: Agent Transition (domain logic — manages state)
│ processes messages, maintains conversation/workflow state
│ invokes level 3 for external calls


Level 3: External Call (single step — stateless)
makes one external request (LLM API, HTTP, etc.)
returns structured response

Level 1 is the same for every agent — it is the run loop defined above. It owns the agent record and invokes level 2 as a Grid operation.

Level 2 is the pluggable part. Different agents use different transition functions: an LLM-backed conversation agent, a rule engine, a workflow coordinator, or custom logic. Level 2 receives current state and messages, returns updated state and a result summary.

Level 3 is a standard Grid operation that makes a single external call. For LLM agents, this is an LLM inference operation. Level 3 knows about API serialisation, authentication, and provider-specific details. It does not know about agents, conversation history, or the run loop.

The level 2 operation is specified by the caller when triggering the run loop. The level 3 operation is specified by the agent creator in the agent's initial state configuration.

Credential Access

Operations that need API keys or other secrets resolve them from two sources, in priority order:

  1. User's secret store — an encrypted per-user credential store, using the secret name declared in the operation's metadata
  2. Input parameter — an optional plaintext field for testing only

The agent's configuration does not contain API keys. The operation metadata declares which secret it needs, and the runtime resolves it from the caller's secret store. This keeps agent configuration clean and credentials in encrypted storage.

Relationship to Jobs

The agent lifecycle defined here operates within the job lifecycle defined in COG-8. Specifically:

  • An agent is created as part of a job invocation
  • The agent's run loop is triggered by job operations
  • Agent status (SLEEPING, RUNNING, SUSPENDED, TERMINATED) describes the agent's internal state, while job status (PENDING, STARTED, INPUT_REQUIRED, COMPLETE, etc.) describes the job's external state as seen by clients
  • Messages delivered via COG-9 endpoints reach the agent's inbox through the job message queue

The mapping between agent status and job status is implementation-defined. A typical mapping would be:

Agent StatusJob StatusMeaning
SLEEPINGINPUT_REQUIREDAgent is idle, awaiting next message
RUNNINGSTARTEDAgent is processing messages
SUSPENDEDFAILED or PAUSEDAgent encountered an error
TERMINATEDCOMPLETEAgent has finished

Merge Semantics

Agent records use LWW merge semantics at the lattice level. The record with the later ts wins unconditionally.

This is correct because:

  • Monotonic timestamps: All writes to an agent are serialised on a single venue, so ts is monotonically increasing
  • Atomic snapshots: The winner is always a complete, consistent state that genuinely existed — never a mixture of fields from different writes
  • Simple replication: Cross-venue sync is replication. The hosting venue always has the latest ts. Replicas receive the complete state.

Security Considerations

Resource Limits

Venues SHOULD impose limits on:

  • Agent count per user — to prevent resource exhaustion
  • Inbox size — to prevent memory exhaustion from rapid message delivery
  • Timeline size — to bound the storage cost of long-running agents
  • Agent lifetime — to prevent indefinitely dormant agents from consuming resources

Credential Security

Agent state and timeline entries may indirectly reference sensitive information. Venues SHOULD:

  • Never store plaintext credentials in the agent record
  • Use an encrypted per-user secret store for API keys and other credentials
  • Redact sensitive fields in timeline entries where possible

Message Validation

The transition function receives arbitrary messages from the inbox. Implementations SHOULD:

  • Validate message format before processing
  • Handle malformed or malicious messages gracefully (return an error state rather than crashing)
  • Apply rate limits on message delivery per agent

Transition Function Safety

A failing transition function suspends the agent, preserving the inbox for retry. This design prevents message loss but means a pathological transition function could be repeatedly retried. Venues SHOULD:

  • Track consecutive failures and escalate (e.g. terminate after N failures)
  • Apply timeouts to individual transition function invocations
  • Log transition function errors for operator review

Agent Workspace

Agents have access to persistent, user-scoped storage namespaces via the covia adapter operations. These are standard default tools available to all LLM-backed agents.

Namespaces

  • /w/ (workspace) — general-purpose data storage for agent knowledge, state, logs, and working data
  • /o/ (operations) — user-defined operation definitions
  • /h/ (HITL) — reserved for human-in-the-loop requests (Phase D)

All namespaces are readable via covia:read, covia:list, and covia:slice. The /w/ and /o/ namespaces are writable via covia:write, covia:delete, and covia:append. Other namespaces (/g/, /s/, /j/) are framework-managed and reject direct writes.

Operations

OperationDescription
covia:readRead a value at any lattice path. Supports maxSize guard.
covia:writeWrite a value to /w/ or /o/ at any depth. Creates intermediate maps.
covia:deleteDelete a key from /w/ or /o/ at any depth.
covia:appendAppend an element to a vector in /w/ or /o/. Creates vector if absent.
covia:sliceRead a paginated slice from a collection (vector, map, or set).
covia:listDescribe structure at a path: type, count, keys for maps.

Paths support mixed map/vector navigation — e.g. w/records/1/name navigates into a map, then a vector by index, then a map field.

See COG-4: Grid Lattice for the per-user namespace structure and merge semantics.

Future Directions

The following capabilities are anticipated but not yet specified:

  • Cross-user workspace reads — agents reading another user's /w/ namespace, gated by capability delegation (see COG-13: Agent Capabilities)
  • HITL requests — the /h/ namespace for human-in-the-loop interaction patterns
  • Cross-user messaging — agents sending messages to agents owned by different users
  • Agent forking — creating a copy of an agent's state for branching conversations or experiments
  • Cross-venue migration — transferring an agent's state from one venue to another
  • Wake triggers — automatic run loop invocation on message delivery, timer events, or external signals
  • Recovery — handling agents left in RUNNING status after a venue restart