COG-6: Artifacts

Status:      Draft (Work in Progress)
Version:     0.1
Created:     2025-01-23
Authors:     Mike Anderson

Work in Progress

This specification is under active development. Structure and details may change significantly based on implementation experience and community feedback.

This standard specifies Artifacts: immutable data assets on the Covia Grid that can be verified, replicated, and shared across venues with cryptographic guarantees.

Purpose

Artifacts are the data foundation of the Grid, representing immutable content that can be:

Verified: Content integrity is guaranteed through cryptographic hashes
Replicated: Any venue can host an exact copy with provable authenticity
Trusted: Consumers can verify they have the exact data the creator intended

Unlike mutable data stores, artifacts provide strong guarantees about content: if you hold an artifact with a given ID, you can verify, to the strength of the underlying SHA256 hash, that it contains exactly the data specified by the creator.

Terminology

See COG-1: Architecture for definitions of Grid terminology.

Core Principles

Immutability

Artifacts are immutable. Once an artifact is created, neither its content nor its metadata can be changed.

This immutability is enforced cryptographically:

The Asset ID is the Value ID of the metadata (see COG-5)
The metadata contains the Content Hash (SHA256 of the content)
Any change to content or metadata produces a different Asset ID
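
As a rough illustration of the three points above, the sketch below shows how a change to either the content or the metadata yields a different identifier. The function `stand_in_id` is an assumption made purely for this example: it hashes a sorted-key JSON encoding with SHA256 and is NOT the Value ID algorithm, which is defined in COG-5. The names `content_hash` and `make_metadata` are likewise hypothetical helpers.

```python
import hashlib
import json


def content_hash(content: bytes) -> str:
    """SHA256 of the content bytes as a hexadecimal string."""
    return hashlib.sha256(content).hexdigest()


def stand_in_id(metadata: dict) -> str:
    # ASSUMPTION: placeholder only. The real Asset ID is the Value ID of the
    # metadata as defined in COG-5; this stand-in just hashes sorted-key JSON.
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_metadata(name: str, content: bytes) -> dict:
    """Build minimal metadata that embeds the content hash."""
    return {"name": name, "content": {"sha256": content_hash(content)}}


original = make_metadata("example", b"a,b\n1,2\n")
edited_content = make_metadata("example", b"a,b\n1,3\n")   # one byte differs
edited_metadata = make_metadata("renamed", b"a,b\n1,2\n")  # same content, new name

# Each variant gets its own identifier under the stand-in scheme.
ids = {stand_in_id(m) for m in (original, edited_content, edited_metadata)}
```

Here `ids` ends up holding three distinct values, one per variant, which is the property the Asset ID relies on.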

Immutability enables:

Caching: Artifacts can be cached indefinitely without invalidation concerns
Replication: Copies are guaranteed identical across venues
Auditing: Historical references remain valid forever
Trust: Content cannot be tampered with after creation

Verifiability

Every artifact can be independently verified by any party:

Metadata Verification: Compute the Value ID of the metadata and compare to the Asset ID (see COG-5)
Content Verification: Compute SHA256 of the content bytes and compare to content.sha256

This verification chain forms a Merkle structure: trusting the Asset ID means trusting both the metadata and the content it references.

Asset ID (trusted)
    │
    └── Value ID of Metadata
            │
            └── content.sha256 ──► SHA256 of Content Bytes
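
The chain in the diagram can be sketched as two checks. This is a minimal sketch, not a reference implementation: the Value ID computation is defined in COG-5, so it is passed in here as a caller-supplied `value_id` function (a hypothetical parameter). Comparing hex strings case-insensitively is also an assumption, made because the examples in this document show hashes in both upper and lower case.

```python
import hashlib
from typing import Callable


def verify_content(metadata: dict, content: bytes) -> bool:
    """Content Verification: SHA256 of the bytes must equal content.sha256."""
    expected = metadata["content"]["sha256"].lower()  # assumption: case-insensitive hex
    return hashlib.sha256(content).hexdigest() == expected


def verify_artifact(
    asset_id: str,
    metadata: dict,
    content: bytes,
    value_id: Callable[[dict], str],  # hypothetical: COG-5 Value ID function
) -> bool:
    """Verify metadata against the Asset ID, then content against the metadata."""
    if value_id(metadata).lower() != asset_id.lower():
        return False  # metadata is not what the Asset ID refers to
    return verify_content(metadata, content)
```

Only the Asset ID needs to come from a trusted source; the metadata and content can then be checked regardless of where they were obtained.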

Federated Trust

Artifacts can be safely replicated across venues because:

Content-addressing ensures the same Asset ID always refers to identical data
Independent verification means venues don't need to trust each other
No coordination is required: venues can replicate artifacts without permission from the original source

This enables a trustless federation model where:

Users can obtain artifacts from any venue hosting them
Verification ensures authenticity regardless of source
Load can be distributed across multiple venues
Offline or unavailable venues don't prevent access to replicated artifacts
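
One way to picture this model is a client that tries several venues in turn and accepts the first response that verifies. The sketch below assumes each venue is represented by a hypothetical `Fetcher` callable that takes a content hash and returns bytes (or `None`); how a venue is actually contacted is outside the scope of this example.

```python
import hashlib
from typing import Callable, Iterable, Optional

# Hypothetical venue interface: content hash (hex) -> bytes, or None if not hosted.
Fetcher = Callable[[str], Optional[bytes]]


def fetch_verified(sha256_hex: str, fetchers: Iterable[Fetcher]) -> Optional[bytes]:
    """Return content from the first venue whose response matches the hash."""
    expected = sha256_hex.lower()
    for fetch in fetchers:
        try:
            data = fetch(expected)
        except Exception:
            continue  # venue offline or failing: try the next one
        if data is not None and hashlib.sha256(data).hexdigest() == expected:
            return data  # verified, so the source venue does not matter
        # Missing or incorrect content is ignored; move on to the next venue.
    return None
```

Because every response is checked against the hash, an unavailable or dishonest venue only costs a retry; it cannot cause incorrect content to be accepted.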

Specification

Artifact Identification

Artifacts are identified following COG-2: Decentralised ID:

did:web:venue.example.com/a/119e30db8a4ea8b33723603743591a5f8229684e6236d89ef1966a72d7293607

The path component /a/{asset-id} identifies the artifact within the venue.
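
A minimal sketch of splitting such an identifier into its venue and asset ID follows. It handles only the form shown above, and it assumes the asset ID is always 64 hexadecimal characters, as in the example; `ArtifactRef` and `parse_artifact_did` are hypothetical names, and COG-2 remains the authority on the identifier syntax.

```python
import re
from typing import NamedTuple


class ArtifactRef(NamedTuple):
    venue: str
    asset_id: str


# ASSUMPTION: did:web:<venue>/a/<64 hex chars>, matching the example in this section.
_ARTIFACT_DID = re.compile(r"^did:web:(?P<venue>[^/]+)/a/(?P<asset_id>[0-9a-fA-F]{64})$")


def parse_artifact_did(did: str) -> ArtifactRef:
    """Split an artifact DID into venue and asset ID, or raise ValueError."""
    match = _ARTIFACT_DID.match(did)
    if match is None:
        raise ValueError(f"not an artifact DID of the expected form: {did!r}")
    return ArtifactRef(match["venue"], match["asset_id"].lower())
```

For the identifier above, this yields the venue `venue.example.com` and the 64-character asset ID.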

Required Metadata Fields

Artifacts MUST include a content object in their metadata. See COG-5: Asset Metadata for the complete metadata specification.

`content.sha256` (REQUIRED)

The SHA256 hash of the content bytes, encoded as a hexadecimal string.

{
  "content": {
    "sha256": "119E30DB8A4EA8B33723603743591A5F8229684E6236D89EF1966A72D7293607"
  }
}

Implementations MUST verify that content matches this hash before serving or using it.
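
For large content it may be impractical to hold all the bytes in memory, so the hash can be computed incrementally. The sketch below is one possible approach using Python's standard `hashlib`; the helper names `sha256_stream` and `content_object` and the 64 KiB chunk size are choices made for this example only.

```python
import hashlib
from typing import BinaryIO, Optional


def sha256_stream(stream: BinaryIO, chunk_size: int = 65536) -> str:
    """Compute the SHA256 hex digest of a binary stream, chunk by chunk."""
    digest = hashlib.sha256()
    # read() returns b"" at end of stream, which stops the iteration.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def content_object(stream: BinaryIO, content_type: Optional[str] = None) -> dict:
    """Build a `content` object holding the hash and, optionally, the MIME type."""
    content = {"sha256": sha256_stream(stream)}
    if content_type is not None:
        content["contentType"] = content_type
    return content
```

A caller might use it as `content_object(open("iris.csv", "rb"), "text/csv")`, where the file name is purely illustrative.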

`content.contentType` (RECOMMENDED)

The MIME type of the content, enabling correct interpretation.

{
  "content": {
    "contentType": "text/csv"
  }
}

Content Storage

Artifact content is stored separately from metadata. Venues MUST:

Store content addressable by its SHA256 hash
Return content only after its hash has been verified
Support retrieval of content given an Asset ID

Content MAY be:

Stored locally on the venue
Retrieved on-demand from other venues
Cached based on access patterns
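
The storage requirements above could be met in many ways. As a hedged illustration, here is a small in-memory store that keys content by its SHA256 hash, keeps a separate index from Asset ID to content hash, and re-verifies on every read. `ContentStore` and its method names are hypothetical, and the sketch deliberately omits the check of the metadata against the Asset ID, which depends on the Value ID defined in COG-5.

```python
import hashlib
from typing import Dict


class ContentStore:
    """Illustrative in-memory content-addressed store (not a venue implementation)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}         # content hash -> content bytes
        self._hash_by_asset: Dict[str, str] = {}   # Asset ID -> content hash

    def put(self, asset_id: str, metadata: dict, content: bytes) -> None:
        """Store content after checking it against content.sha256."""
        expected = metadata["content"]["sha256"].lower()
        if hashlib.sha256(content).hexdigest() != expected:
            raise ValueError("content does not match content.sha256")
        # NOTE: verifying metadata against asset_id (COG-5 Value ID) is omitted here.
        self._blobs[expected] = content
        self._hash_by_asset[asset_id] = expected

    def get_by_hash(self, sha256_hex: str) -> bytes:
        """Return content only if the stored bytes still match the hash."""
        key = sha256_hex.lower()
        content = self._blobs[key]  # raises KeyError if the content is not held
        if hashlib.sha256(content).hexdigest() != key:
            raise ValueError("stored content failed verification")
        return content

    def get_by_asset(self, asset_id: str) -> bytes:
        """Retrieve content given an Asset ID."""
        return self.get_by_hash(self._hash_by_asset[asset_id])
```

Keying by hash means identical content shared by several artifacts is stored once, and a corrupted entry is detected at read time rather than served.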

Replication

Venues MAY replicate artifacts from other venues. When replicating:

1. Obtain the metadata from the source venue
2. Verify that the Value ID of the metadata matches the Asset ID
3. Obtain the content
4. Verify that the SHA256 hash of the content matches content.sha256
5. Store both metadata and content locally

Implementations MUST NOT serve artifacts that fail verification.
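
The replication procedure can be sketched as a single function that stores nothing unless both checks pass. Everything venue-specific is injected as a hypothetical parameter: `fetch_metadata` and `fetch_content` stand in for whatever transport a venue uses, `value_id` stands in for the COG-5 Value ID computation, and `local_store` is a plain dictionary used only for illustration.

```python
import hashlib
from typing import Callable, Dict, Tuple


class VerificationError(Exception):
    """Raised when metadata or content fails verification."""


def replicate(
    asset_id: str,
    fetch_metadata: Callable[[str], dict],    # hypothetical: Asset ID -> metadata
    fetch_content: Callable[[str], bytes],    # hypothetical: Asset ID -> content bytes
    value_id: Callable[[dict], str],          # hypothetical: COG-5 Value ID function
    local_store: Dict[str, Tuple[dict, bytes]],
) -> None:
    # Obtain the metadata from the source venue.
    metadata = fetch_metadata(asset_id)
    # Verify the metadata against the Asset ID.
    if value_id(metadata).lower() != asset_id.lower():
        raise VerificationError("metadata does not match Asset ID")
    # Obtain the content.
    content = fetch_content(asset_id)
    # Verify the content against content.sha256.
    expected = metadata["content"]["sha256"].lower()
    if hashlib.sha256(content).hexdigest() != expected:
        raise VerificationError("content does not match content.sha256")
    # Store both metadata and content locally, only after both checks pass.
    local_store[asset_id] = (metadata, content)
```

Raising before anything is written keeps unverified artifacts out of the local store, so they can never be served later.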

Versioning

Since artifacts are immutable, versioning is achieved through:

Creating new artifacts with updated content
Using metadata fields to link versions (e.g., previousVersion, replaces)
Maintaining collections or indices that track version history

Each version is a distinct artifact with its own Asset ID.
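
As an illustration of linking versions through metadata, the sketch below walks a chain of `previousVersion` links from the newest artifact back to the oldest. It assumes the link is stored under `additionalInformation.previousVersion`, as in the Binary Artifact example in this document, and that metadata can be looked up in a hypothetical `index` dictionary keyed by artifact DID; neither is mandated by this specification.

```python
from typing import Dict, List, Optional


def version_history(did: str, index: Dict[str, dict]) -> List[str]:
    """Return artifact DIDs from `did` back through its known previous versions."""
    history: List[str] = []
    seen = set()
    current: Optional[str] = did
    # Stop when a link is missing, points outside the index, or repeats.
    while current is not None and current in index and current not in seen:
        seen.add(current)
        history.append(current)
        info = index[current].get("additionalInformation", {})
        current = info.get("previousVersion")
    return history
```

The walk ends at the first artifact with no recorded predecessor, or at the first link whose metadata is not available locally.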

Examples

Dataset Artifact

A machine learning dataset with full provenance:

{
  "name": "Iris Dataset",
  "description": "The classic Iris flower dataset containing 150 samples of iris flowers with 4 features each (sepal length, sepal width, petal length, petal width) and species classification.",
  "creator": "UCI Machine Learning Repository",
  "dateCreated": "2025-06-05T06:53:59Z",
  "license": {
    "name": "CC BY 4.0",
    "url": "https://creativecommons.org/licenses/by/4.0/"
  },
  "keywords": ["machine learning", "iris", "dataset", "classification"],
  "content": {
    "contentType": "text/csv",
    "sha256": "119E30DB8A4EA8B33723603743591A5F8229684E6236D89EF1966A72D7293607",
    "encoding": "UTF-8",
    "inLanguage": "en"
  },
  "additionalInformation": {
    "rows": 150,
    "columns": 5,
    "source": "https://archive.ics.uci.edu/ml/datasets/iris"
  }
}

Document Artifact

A text document with licensing information:

{
  "name": "Hamlet",
  "description": "The complete text of Shakespeare's tragedy Hamlet, Prince of Denmark.",
  "creator": "William Shakespeare",
  "dateCreated": "2025-06-05T06:53:59Z",
  "license": {
    "name": "Public Domain"
  },
  "keywords": ["text", "drama", "shakespeare", "classic"],
  "content": {
    "contentType": "text/plain",
    "sha256": "74f16013e2b7ce83d5f5c8d4b3c42f279242f6ddfa7bab0f31320301e60c81d6",
    "encoding": "UTF-8",
    "inLanguage": "en-GB"
  }
}

Binary Artifact

A compiled model or binary file:

{
  "name": "Sentiment Analysis Model v2.1",
  "description": "Pre-trained transformer model for sentiment analysis, fine-tuned on product reviews.",
  "creator": "ML Research Team",
  "dateCreated": "2025-07-15T14:30:00Z",
  "keywords": ["model", "nlp", "sentiment", "transformer"],
  "content": {
    "contentType": "application/octet-stream",
    "sha256": "a1b2c3d4e5f6789012345678901234567890abcdef1234567890abcdef123456"
  },
  "additionalInformation": {
    "framework": "pytorch",
    "version": "2.1",
    "accuracy": 0.94,
    "previousVersion": "did:web:models.example.com/a/0987654321fedcba..."
  }
}

Security Considerations

Hash Algorithm

SHA256 is currently required for all hashes. Future specifications may add support for additional algorithms with explicit algorithm identifiers.

Content Injection

Implementations MUST verify content hashes before processing. A malicious venue could attempt to serve incorrect content; verification detects the substitution so that the content can be rejected.

Metadata Authenticity

The Asset ID verifies metadata integrity but not authenticity: it shows that the metadata is unchanged, not who created it. To verify the creator's identity, additional mechanisms such as digital signatures or DID-based attestations may be used.

Related Specifications

COG-1: Architecture - Overall Grid architecture and terminology
COG-2: Decentralised ID - Asset identification
COG-5: Asset Metadata - Common metadata format
COG-7: Operations - Executable assets (contrast with artifacts)
