COG-1: Architecture

Status:      Draft 
Version:     1.0 
Created:     2024-08-06 
Authors:     Mike Anderson  

This standard specifies the overall design and architecture of the Covia Open Grid.

Purpose

The Grid enables secure and efficient collaboration between AI agents, data providers, and compute resources across organisational boundaries.

The Grid has been designed with several core principles:

Decentralisation : Nodes in the Grid are independent and operate on a peer-to-peer basis, free from any centralised or external control.
Open protocol : There is a formally specified protocol, available as an open standard that can be freely implemented, with an open source reference implementation.
Security : All interactions are secure by design, ensuring integrity, maintaining high availability and preventing unauthorised use of resources on the Grid.
Universality : The Grid supports any type of resource: any data type, any sort of compute operation, and any sort of interaction of software systems with the real world. It does this in a way that is fully pluggable and composable.
Scalability : The Grid architecture imposes no inherent limit on scale; the only constraints are those of the underlying resources.
Interoperability : The Grid connects to existing systems and resources of any type, so that existing infrastructure can be utilised.
Technology neutrality : The Grid architecture does not make any assumptions about the technology choices made by specific implementations or underlying systems. This enables independence from vendor lock-in and the ability to continuously incorporate new advances without rework.

This specification defines the architectural principles and core components that make this possible.

Terminology

Term	Definition
Grid	The global network of venues that enables federated AI orchestration and access to data and compute resources of all forms
Venue	A node in the Grid that is independently governed, addressable and responds to requests from other nodes
Client	A software application or component that makes requests to the Grid.
Asset	A resource accessible and addressable on the Grid. An asset represents either compute or data resources
Operation	A type of asset which represents an executable function on the Grid, which takes inputs and produces results in response to a request
Artifact	A type of asset which represents immutable data or content. This may be a file containing content in a specific format, a snapshot of a database, etc.
Agent	An autonomous AI system connected to the Grid. It may respond to Grid requests or run Grid requests itself
Job	A specific task submitted to the Grid via a request to run an Operation. Jobs can be considered asynchronous processes that may fail or succeed.
DID	A decentralised identifier used to identify a venue, asset, job or client
Transport	A mechanism for conveying messages between nodes, e.g. REST API calls over HTTPS
Orchestration	An operation which is responsible for executing one or more child operations (which may in turn be lower level orchestrations)

Specification

Universal addressability

Entities on the Grid are designed to be universally addressable via global namespaces so that clients, venues, assets, and jobs can be uniquely identified and accessed across the network, regardless of their physical or organisational location. This is achieved through the use of Decentralised IDentifiers (DIDs), which provide a standardised, interoperable mechanism for referencing entities in a way that is independent of any specific venue or transport mechanism.

Universal addressability ensures that resources and operations on the Grid can be seamlessly discovered, requested, and utilised by authorised clients, supporting the principles of decentralisation and interoperability. The architecture allows for flexible resolution of DIDs to specific endpoints, enabling dynamic routing of requests while maintaining the integrity and autonomy of each node in the Grid.
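As one concrete illustration of resolving a DID to an endpoint, the sketch below maps a `did:web` identifier to the HTTPS URL of its DID document, following the mapping rules of the W3C did:web method. The function name `did_web_to_url` is hypothetical, and this section does not state which DID methods or resolver behaviour a Grid implementation must support, so this should be read as an example of the general idea only.

```python
from urllib.parse import unquote


def did_web_to_url(did: str) -> str:
    """Map a did:web identifier to the HTTPS URL of its DID document."""
    prefix = "did:web:"
    if not did.startswith(prefix):
        raise ValueError(f"not a did:web identifier: {did!r}")

    # The method-specific ID is a colon-separated list: host first, then
    # optional path segments.
    segments = did[len(prefix):].split(":")
    if not all(segments):
        raise ValueError(f"empty segment in DID: {did!r}")

    # A port in the host is percent-encoded (e.g. example.com%3A8443).
    host = unquote(segments[0])
    path = [unquote(segment) for segment in segments[1:]]

    if path:
        return f"https://{host}/{'/'.join(path)}/did.json"
    # With no path, the DID document lives under /.well-known/.
    return f"https://{host}/.well-known/did.json"
```

For the venue used in the example in this section, `did_web_to_url("did:web:open.covia.ai")` yields `https://open.covia.ai/.well-known/did.json`.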

Grid addresses in general consist of two parts:

Venue DID - The decentralised identifier of the venue hosting the resource. The Grid supports DID resolvers that can connect to the specified venue given an appropriate DID.
Entity ID - The ID of the specific resource within the venue

This design means that multiple venues may host equivalent copies of the same asset or operation. This capability is critical to allow for P2P replication of assets and other resources, since venues may need to obtain verifiable exact copies of artifacts in order to perform appropriate computation.

Example:

did:web:open.covia.ai/a:c147dae085baa0124bd1678698ea91645811075fbf2f051ab4a330bec1b7f742

This specifies an artifact with the metadata hash c147dae085baa0124bd1678698ea91645811075fbf2f051ab4a330bec1b7f742 located at the venue identified by the hostname open.covia.ai.
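The two-part structure of a Grid address can be sketched as follows. This is a minimal illustration that assumes, based only on the single example above, that the venue DID and the entity ID are separated by the first `/`. The names `GridAddress` and `parse_grid_address` are hypothetical, and the `a:` prefix is kept as part of the entity ID because its interpretation is not defined in this section.

```python
from typing import NamedTuple


class GridAddress(NamedTuple):
    venue_did: str   # DID of the venue hosting the resource
    entity_id: str   # ID of the resource within that venue


def parse_grid_address(address: str) -> GridAddress:
    """Split a Grid address into its venue DID and entity ID."""
    # ASSUMPTION: the first "/" separates the venue DID from the entity ID.
    venue_did, separator, entity_id = address.partition("/")
    if not venue_did.startswith("did:") or not separator or not entity_id:
        raise ValueError(f"not a two-part Grid address: {address!r}")
    return GridAddress(venue_did, entity_id)


example = parse_grid_address(
    "did:web:open.covia.ai/"
    "a:c147dae085baa0124bd1678698ea91645811075fbf2f051ab4a330bec1b7f742"
)
```

Here `example.venue_did` is `did:web:open.covia.ai`, which a DID resolver can use to reach the venue, and `example.entity_id` identifies the artifact within it.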

The use of the metadata hash as an asset ID is a particularly important feature because it ensures that:

Metadata can be fully verified (only the exact metadata string of the artifact is expected to produce the same cryptographic hash)
If the metadata contains hashes of content, the content can also be verified (forming a Merkle tree)
Assets on the Grid are content-addressable
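The verification properties listed above can be sketched as follows. This section does not name the hash function or the metadata schema, so the sketch makes two loudly labelled assumptions: SHA-256 stands in for the Grid's actual hash function, and the metadata is JSON with a hypothetical `content_hash` field.

```python
import hashlib
import json
from typing import Optional


def hex_hash(data: bytes) -> str:
    # ASSUMPTION: SHA-256 is a stand-in; the Grid's hash function is not
    # specified in this section.
    return hashlib.sha256(data).hexdigest()


def verify_asset(asset_id: str, metadata: str,
                 content: Optional[bytes] = None) -> bool:
    """Check metadata against the asset ID, then content against metadata."""
    # Step 1: only the exact metadata string reproduces the asset ID.
    if hex_hash(metadata.encode("utf-8")) != asset_id:
        return False
    # Step 2: if content is supplied, check it against the hash recorded in
    # the (now trusted) metadata. "content_hash" is a hypothetical field name.
    if content is not None:
        return json.loads(metadata).get("content_hash") == hex_hash(content)
    return True
```

Because the asset ID commits to the metadata, and the metadata commits to the content, a client that trusts only the asset ID can verify a copy obtained from any venue.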

Grid network roles

The Grid is a network of nodes that exchange messages (requests and responses) through one or more transport mechanisms.

Venues

A venue is a Grid node that responds to requests, typically implemented as a server process accessible from the Internet. The role of venues is to provide controlled access to resources to clients elsewhere on the Grid, and to ensure proper governance of protocol activity.

A venue SHOULD respond to any request it receives.

A venue MUST reject requests to access or utilise assets that the requestor is not authorised to use, e.g. private artifacts created by a different user.

A venue SHOULD obtain a DID to ensure that it is easily addressable by clients on the Grid. Without such addressability, clients will have to configure an appropriate connection and transport mechanism manually.

A venue MAY be a client of other venues on the Grid, e.g. when acting as an orchestrator of operations on behalf of one of its own clients.

A venue MUST provide metadata on any asset it serves to any authorised client.

A venue SHOULD create a job in response to any authorised request to run an operation. Exceptions might include rate limiting or dealing with resource constraints.
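The venue requirements above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the reference implementation: the `Venue` and `Response` classes, the status strings, the access-control map and the job limit are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Any, Dict, Iterator, Optional, Set


@dataclass
class Response:
    status: str                   # "rejected", "busy" or "accepted"
    job_id: Optional[str] = None


@dataclass
class Venue:
    # Hypothetical access control: operation ID -> authorised requestor IDs.
    acl: Dict[str, Set[str]]
    max_pending_jobs: int = 2     # hypothetical resource constraint
    jobs: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    _ids: Iterator[int] = field(default_factory=count)

    def run_operation(self, requestor: str, operation_id: str,
                      inputs: Dict[str, Any]) -> Response:
        # Every branch returns a Response: a venue SHOULD respond to any request.
        # MUST reject requests the requestor is not authorised to make.
        if requestor not in self.acl.get(operation_id, set()):
            return Response("rejected")
        # Exception to job creation: the venue is at its resource limit.
        pending = sum(1 for job in self.jobs.values()
                      if job["status"] == "pending")
        if pending >= self.max_pending_jobs:
            return Response("busy")
        # SHOULD create a job for any authorised request to run an operation.
        job_id = f"job-{next(self._ids)}"
        self.jobs[job_id] = {"operation": operation_id, "inputs": inputs,
                             "status": "pending"}
        return Response("accepted", job_id)
```

In this model the job is only recorded, not executed; a real venue would run it asynchronously and update its status on success or failure.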

Clients

Any system or software able to send requests to the Grid is considered a client of the Grid.

Examples of clients include:

A web application used by a human such as https://app.covia.ai
An AI agent working on a set of long-running project tasks
Software written in a general purpose programming language such as Python, and using the Covia SDK

A client MUST connect to at least one venue with an appropriate transport mechanism in order to utilise the Grid.

A client MAY make a local connection to itself. This is frequently useful for testing purposes, or in cases where a venue needs to orchestrate multiple operations of which a subset can be run locally.

A client MAY simultaneously connect to multiple venues.

A client MUST provide appropriate credentials when necessary to access specific Grid resources.
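The client requirements above can be sketched with a toy client that holds several venue connections at once. This is not the Covia SDK API: the `GridClient` class, the modelling of a transport as a plain callable, the `credential` message field and the `did:example:local` identifier are all assumptions made for illustration.

```python
from typing import Callable, Dict, Optional

# ASSUMPTION: a transport is modelled as a callable that takes a request
# message and returns a response message.
Transport = Callable[[dict], dict]


class GridClient:
    def __init__(self) -> None:
        self._venues: Dict[str, Transport] = {}
        self._credentials: Dict[str, str] = {}

    def connect(self, venue_did: str, transport: Transport,
                credential: Optional[str] = None) -> None:
        # A client MAY hold connections to multiple venues simultaneously.
        self._venues[venue_did] = transport
        if credential is not None:
            self._credentials[venue_did] = credential

    def request(self, venue_did: str, payload: dict) -> dict:
        # A client MUST be connected to at least one venue to use the Grid.
        if not self._venues:
            raise RuntimeError("client is not connected to any venue")
        if venue_did not in self._venues:
            raise KeyError(f"no connection to venue {venue_did!r}")
        message = dict(payload)
        # Credentials are attached only where one was configured.
        if venue_did in self._credentials:
            message["credential"] = self._credentials[venue_did]
        return self._venues[venue_did](message)


def local_venue(message: dict) -> dict:
    # An in-process "venue", standing in for a local connection used in tests.
    return {"echo": message}


client = GridClient()
client.connect("did:example:local", local_venue)
reply = client.request("did:example:local", {"op": "ping"})
```

A remote venue would be connected in the same way, with a transport that sends the message over, for example, HTTPS instead of calling a local function.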
