COG-5: Asset Metadata

Status:      Draft (Work in Progress)
Version:     0.2
Created:     2025-01-23
Updated:     2026-02-27
Authors:     Mike Anderson

Work in Progress

This specification is under active development. Structure and details may change significantly based on implementation experience and community feedback.

This standard specifies the metadata format for assets on the Covia Grid, enabling interoperability, discoverability, and verification of resources across the distributed network.

Purpose

Assets are the fundamental resources of the Covia Grid, representing data, operations, and other computational resources that can be shared and utilised across venues. A well-defined metadata format is essential for:

Discovery: Enabling clients to find relevant assets based on descriptive information
Verification: Ensuring content integrity through cryptographic hashes
Interoperability: Allowing assets to be understood and processed by any Grid participant
Provenance: Tracking the origin, authorship, and licensing of resources

This specification defines the JSON-based metadata format that describes assets on the Grid.

Terminology

See COG-1: Architecture for definitions of Grid terminology including Asset, Artifact, Operation, and Job.

Specification

Metadata Format

Asset metadata MUST be a valid JSON object.

Asset ID Computation

Asset IDs use lattice Value IDs — the SHA3-256 hash of the canonical binary encoding defined by the Convex lattice (CAD003). The computation is:

Parse the metadata JSON string into a lattice data structure (an ordered map of string keys to values)
Compute the Value ID of that data structure: the SHA3-256 hash of its canonical CAD003 binary encoding
Encode the hash as a lowercase hexadecimal string (64 characters); this string is the Asset ID

This scheme provides:

Semantic hashing: The ID is derived from the data structure's content, not a particular JSON serialisation. Two semantically identical metadata objects with different formatting (key order, whitespace) produce the same ID.
Native lattice integration: The lattice automatically computes Value IDs for all stored data, so asset identification is a zero-cost property of storage.
Structural sharing: The lattice's Merkle tree structure enables automatic deduplication of shared sub-structures across assets.
Unified data model: Assets, job state records, and all lattice data use the same identification scheme.
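The three steps above can be sketched as follows. This is a minimal illustration, not a conforming implementation: `encode_canonical_standin` is a hypothetical placeholder that uses sorted-key, whitespace-free JSON in place of the real CAD003 binary encoding, so the IDs it produces will not match IDs computed by the lattice. It is only meant to show the shape of the computation and the semantic-hashing property.

```python
import hashlib
import json


def encode_canonical_standin(value) -> bytes:
    """Stand-in for the CAD003 canonical encoding (NOT the real encoding).

    Sorted-key, whitespace-free JSON is used only so this example runs
    without a lattice library. Real Asset IDs will differ.
    """
    text = json.dumps(value, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return text.encode("utf-8")


def asset_id(metadata_json: str, encode=encode_canonical_standin) -> str:
    # Step 1: parse the metadata string into a data structure.
    value = json.loads(metadata_json)
    if not isinstance(value, dict):
        raise ValueError("asset metadata must be a JSON object")
    # Step 2: SHA3-256 over the canonical encoding of that structure.
    digest = hashlib.sha3_256(encode(value)).digest()
    # Step 3: lowercase hexadecimal string, 64 characters.
    return digest.hex()


# Same content, different key order and whitespace.
a = asset_id('{"name": "Iris Dataset", "keywords": ["dataset"]}')
b = asset_id('{ "keywords": ["dataset"],\n  "name": "Iris Dataset" }')
print(a == b)
```

Because the hash is taken over the parsed structure rather than the raw string, the two differently formatted inputs yield the same ID. Passing a real CAD003 encoder as `encode` would be the natural extension point.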

Client Verification

Clients that verify an Asset ID locally need a CAD003 encoding implementation; a general-purpose hash function alone is not enough, because the hash is computed over the canonical encoding rather than the JSON text. Clients that do not require local verification can rely on a trusted venue (or several venues, for cross-checking) as a verification endpoint: the venue's GET /api/v1/assets/{id} response implicitly confirms the ID-to-content binding.
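A client that relies on venues rather than local verification might cross-check several of them along these lines. The sketch assumes that the response body of GET /api/v1/assets/{id} is the metadata JSON object itself; that response shape, the function names, and the venue URLs in the usage note are assumptions made for illustration.

```python
import json
from typing import Callable, Iterable
from urllib.parse import quote
from urllib.request import urlopen


def asset_url(venue_base: str, asset_id: str) -> str:
    # Path taken from the specification: GET /api/v1/assets/{id}
    base = venue_base.rstrip("/")
    return f"{base}/api/v1/assets/{quote(asset_id, safe='')}"


def http_get(url: str) -> str:
    # Plain HTTP GET using the standard library.
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def cross_check(asset_id: str, venues: Iterable[str],
                fetch: Callable[[str], str] = http_get) -> dict:
    """Return the metadata if every venue serves the same object for the ID.

    ASSUMPTION: each venue returns the metadata JSON object as the body.
    """
    results = [json.loads(fetch(asset_url(v, asset_id))) for v in venues]
    if not results:
        raise ValueError("at least one venue is required")
    first = results[0]
    # Compare parsed objects so formatting differences do not matter.
    if any(other != first for other in results[1:]):
        raise ValueError(f"venues disagree about asset {asset_id}")
    return first
```

Calling `cross_check(some_id, ["https://venue-a.example", "https://venue-b.example"])` would raise if the two hypothetical venues returned different metadata. The `fetch` parameter exists so the HTTP client can be replaced, for example in tests.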

Common Fields

The following fields are RECOMMENDED for all assets:

`name` (string)

A human-readable name for the asset.

{
  "name": "Iris Dataset"
}

`description` (string)

A detailed description of the asset, its purpose, and how it can be used.

{
  "description": "The famous Iris flower dataset for machine learning classification tasks"
}

`creator` (string)

The creator or author of the asset.

{
  "creator": "UCI Machine Learning Repository"
}

`dateCreated` (string)

The creation date in ISO 8601 format.

{
  "dateCreated": "2025-06-05T06:53:59Z"
}

`dateModified` (string)

The last modification date in ISO 8601 format.

{
  "dateModified": "2025-06-05T07:22:59Z"
}

`keywords` (array of strings)

Keywords for discovery and categorisation.

{
  "keywords": ["machine learning", "dataset", "classification"]
}

`license` (object)

Licensing information for the asset.

{
  "license": {
    "name": "CC BY 4.0",
    "url": "https://creativecommons.org/licenses/by/4.0/"
  }
}
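Since these fields are RECOMMENDED rather than required, a tool might report them as warnings instead of errors. The following sketch checks presence and JSON type for the fields listed above. The function names are hypothetical, and the date check only accepts the subset of ISO 8601 that Python's `datetime.fromisoformat` understands.

```python
from datetime import datetime

# Expected JSON type for each RECOMMENDED field, as listed in this section.
RECOMMENDED_FIELDS = {
    "name": str,
    "description": str,
    "creator": str,
    "dateCreated": str,
    "dateModified": str,
    "keywords": list,
    "license": dict,
}


def parse_iso8601(text: str) -> datetime:
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def check_common_fields(metadata: dict) -> list:
    """Return warning strings; an empty list means nothing was flagged."""
    warnings = []
    for field, expected in RECOMMENDED_FIELDS.items():
        if field not in metadata:
            warnings.append(f"missing recommended field: {field}")
        elif not isinstance(metadata[field], expected):
            warnings.append(f"{field} should be of type {expected.__name__}")
    for field in ("dateCreated", "dateModified"):
        if isinstance(metadata.get(field), str):
            try:
                parse_iso8601(metadata[field])
            except ValueError:
                warnings.append(f"{field} is not an ISO 8601 date")
    keywords = metadata.get("keywords")
    if isinstance(keywords, list) and not all(isinstance(k, str) for k in keywords):
        warnings.append("keywords should contain only strings")
    return warnings


iris = {
    "name": "Iris Dataset",
    "description": "The famous Iris flower dataset for machine learning classification tasks",
    "creator": "UCI Machine Learning Repository",
    "dateCreated": "2025-06-05T06:53:59Z",
    "dateModified": "2025-06-05T07:22:59Z",
    "keywords": ["machine learning", "dataset", "classification"],
    "license": {"name": "CC BY 4.0",
                "url": "https://creativecommons.org/licenses/by/4.0/"},
}
print(check_common_fields(iris))
```

The `iris` object combines the individual field examples from this section into one metadata document.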

Asset Type Fields

Assets are categorised by the presence of specific top-level objects:

Artifacts

Assets with a `content` object represent Artifacts: immutable data assets.

See COG-6: Artifacts for the complete specification including:

Content hash verification
Replication and federation
Content storage requirements

Operations

Assets with an `operation` object represent Operations: executable assets.

See COG-7: Operations for the complete specification including:

Adapter configuration
Input/output schemas
Orchestration workflows
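Categorisation by top-level object can be expressed in a few lines. The sketch below returns a set because this section does not state whether `content` and `operation` are mutually exclusive; the function name and the kind labels are illustrative.

```python
def asset_kinds(metadata: dict) -> set:
    """Classify an asset by which type-defining top-level objects it carries."""
    kinds = set()
    if isinstance(metadata.get("content"), dict):
        kinds.add("artifact")    # see COG-6: Artifacts
    if isinstance(metadata.get("operation"), dict):
        kinds.add("operation")   # see COG-7: Operations
    return kinds


print(asset_kinds({"name": "Iris Dataset", "content": {}}))
print(asset_kinds({"name": "Plain asset"}))
```

An empty set means the metadata carries neither object, as in the second call.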

Additional Information

The `additionalInformation` object MAY contain any implementation-specific or domain-specific metadata:

{
  "additionalInformation": {
    "notes": ["Uploaded for testing purposes"],
    "sourceUrl": "https://example.com/original"
  }
}

Validation

Implementations SHOULD validate metadata against this specification before accepting assets.

Implementations MUST reject metadata that:

Is not valid JSON
Contains content hashes that do not match the actual content (for artifacts)
References non-existent adapters (for operations)
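The three MUST-reject rules could be enforced roughly as shown below. The field names `content.sha256` and `operation.adapter`, and the use of SHA-256 for content hashes, are assumptions made for this sketch; the authoritative layout is defined in COG-6 and COG-7. `MetadataRejected`, `validate`, and `known_adapters` are hypothetical names.

```python
import hashlib
import json


class MetadataRejected(ValueError):
    """Raised when metadata fails one of the MUST-reject checks."""


def validate(metadata_json, content=None, known_adapters=frozenset()):
    # Rule 1: the metadata must be valid JSON (and, per this spec, an object).
    try:
        metadata = json.loads(metadata_json)
    except json.JSONDecodeError as exc:
        raise MetadataRejected(f"not valid JSON: {exc}") from exc
    if not isinstance(metadata, dict):
        raise MetadataRejected("metadata must be a JSON object")

    # Rule 2: a declared content hash must match the actual content.
    # ASSUMPTION: hex SHA-256 stored under content.sha256 (see COG-6).
    artifact = metadata.get("content")
    if isinstance(artifact, dict) and content is not None and "sha256" in artifact:
        actual = hashlib.sha256(content).hexdigest()
        if actual != str(artifact["sha256"]).lower():
            raise MetadataRejected("content hash does not match content")

    # Rule 3: an operation must reference an adapter that exists.
    # ASSUMPTION: adapter named under operation.adapter (see COG-7).
    operation = metadata.get("operation")
    if isinstance(operation, dict):
        adapter = operation.get("adapter")
        if isinstance(adapter, str) and adapter not in known_adapters:
            raise MetadataRejected(f"unknown adapter: {adapter}")
    return metadata
```

A venue would typically call `validate(metadata_json, content=payload_bytes, known_adapters=installed)` before storing an asset, where `payload_bytes` and `installed` come from the venue's own state.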

Security Considerations

Metadata Immutability

Once an asset is created, its metadata string and Asset ID are immutable. Any modification to metadata results in a new Asset ID. Implementations MUST NOT allow modification of existing metadata.
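One way to honour this rule is a write-once store keyed by Asset ID that offers no update or delete operation. The class below is an in-memory sketch. `standin_id` hashes the raw string with SHA3-256 purely so the example runs; it is not the CAD003-based Asset ID computation, and all names are hypothetical.

```python
import hashlib


class WriteOnceMetadataStore:
    """In-memory store keyed by Asset ID; entries are never overwritten."""

    def __init__(self, id_fn):
        self._id_fn = id_fn      # computes the Asset ID for a metadata string
        self._entries = {}

    def add(self, metadata_json: str) -> str:
        asset_id = self._id_fn(metadata_json)
        # setdefault keeps the first stored string; re-adding is a no-op.
        self._entries.setdefault(asset_id, metadata_json)
        return asset_id

    def get(self, asset_id: str) -> str:
        return self._entries[asset_id]


def standin_id(metadata_json: str) -> str:
    # Stand-in only: hashes the raw string, NOT the CAD003 encoding.
    return hashlib.sha3_256(metadata_json.encode("utf-8")).hexdigest()


store = WriteOnceMetadataStore(standin_id)
v1 = store.add('{"name": "Iris Dataset"}')
v2 = store.add('{"name": "Iris Dataset v2"}')  # a change creates a new asset
print(v1 != v2)
```

Editing metadata therefore means adding a new entry under a new ID, while the original entry stays retrievable under its original ID.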

See COG-6: Artifacts and COG-7: Operations for asset-type-specific security considerations.

Related Specifications

COG-1: Architecture - Overall Grid architecture
COG-2: Decentralised ID - Asset identification
COG-4: Grid Lattice - Asset storage in the lattice
COG-6: Artifacts - Immutable data assets
COG-7: Operations - Executable assets
