
COG-6: Artifacts

Status: Draft (Work in Progress)
Version: 0.1
Created: 2025-01-23
Authors: Mike Anderson
Work in Progress

This specification is under active development. Structure and details may change significantly based on implementation experience and community feedback.

This standard specifies Artifacts - immutable data assets on the Covia Grid that can be verified, replicated, and shared across venues with cryptographic guarantees.

Purpose

Artifacts are the data foundation of the Grid, representing immutable content that can be:

  • Verified: Content integrity is guaranteed through cryptographic hashes
  • Replicated: Any venue can host an exact copy with provable authenticity
  • Trusted: Consumers can verify they have the exact data the creator intended

Unlike mutable data stores, artifacts provide strong guarantees about content: if you have an artifact with a given ID and it passes verification, you know, to the extent that SHA256 is collision-resistant, that it contains exactly the data specified by the creator.

Terminology

See COG-1: Architecture for definitions of Grid terminology.

Core Principles

Immutability

Artifacts are immutable. Once an artifact is created, its content and metadata cannot be changed; any modification yields a different artifact.

This immutability is enforced cryptographically:

  • The Asset ID is the SHA256 hash of the metadata
  • The metadata contains the Content Hash (SHA256 of the content)
  • Any change to content or metadata produces a different Asset ID
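The relationship between metadata and Asset ID can be illustrated with a minimal Python sketch. It assumes the metadata string is hashed as UTF-8 bytes exactly as written, with no canonicalisation; the function name asset_id and the sample strings are hypothetical and not part of this specification.

```python
import hashlib


def asset_id(metadata: str) -> str:
    """Return the lowercase hex SHA256 of the exact metadata string.

    Assumption: the string is encoded as UTF-8 before hashing.
    """
    return hashlib.sha256(metadata.encode("utf-8")).hexdigest()


original = '{"name": "Iris Dataset"}'
modified = '{"name": "Iris dataset"}'  # a single character differs

id_original = asset_id(original)
id_modified = asset_id(modified)

# Any change to the metadata string yields a different Asset ID.
ids_differ = id_original != id_modified
print(id_original)
print(id_modified)
print("IDs differ:", ids_differ)
```

Because the hash covers the exact string, even a whitespace-only change to the metadata would produce a different Asset ID under this assumption.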

Immutability enables:

  • Caching: Artifacts can be cached indefinitely without invalidation concerns
  • Replication: Copies are guaranteed identical across venues
  • Auditing: Historical references remain valid forever
  • Trust: Content cannot be tampered with after creation

Verifiability

Every artifact can be independently verified by any party:

  1. Metadata Verification: Compute SHA256 of the metadata string and compare to the Asset ID
  2. Content Verification: Compute SHA256 of the content bytes and compare to content.sha256

This verification chain forms a Merkle structure - trusting the Asset ID means trusting both the metadata and the content it references.

Asset ID (trusted)
└── SHA256 of Metadata
    └── content.sha256 ──► SHA256 of Content Bytes
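The two verification steps can be sketched in Python as follows. This is an illustrative sketch only: it assumes the metadata is a JSON string hashed as UTF-8 bytes, and that hexadecimal hashes may be compared case-insensitively. The names sha256_hex and verify_artifact are hypothetical.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Lowercase hexadecimal SHA256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(asset_id: str, metadata: str, content: bytes) -> bool:
    """Check the chain Asset ID -> metadata -> content.sha256 -> content."""
    # Step 1: the metadata string must hash to the Asset ID.
    if sha256_hex(metadata.encode("utf-8")) != asset_id.lower():
        return False
    # Step 2: the content bytes must hash to the declared content.sha256.
    try:
        declared = json.loads(metadata)["content"]["sha256"].lower()
    except (ValueError, KeyError, TypeError, AttributeError):
        return False  # metadata is not JSON or lacks a usable content.sha256
    return sha256_hex(content) == declared


# Build a small self-consistent artifact for demonstration.
content = b"sepal_length,sepal_width\n5.1,3.5\n"
metadata = json.dumps({"name": "demo", "content": {"sha256": sha256_hex(content)}})
aid = sha256_hex(metadata.encode("utf-8"))

print(verify_artifact(aid, metadata, content))
print(verify_artifact(aid, metadata, content + b"tampered"))
```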

Federated Trust

Artifacts can be safely replicated across venues because:

  • Content-addressing ensures the same Asset ID always refers to identical data
  • Independent verification means venues don't need to trust each other
  • No coordination required - venues can replicate artifacts without permission from the original source

This enables a trustless federation model where:

  • Users can obtain artifacts from any venue hosting them
  • Verification ensures authenticity regardless of source
  • Load can be distributed across multiple venues
  • Offline or unavailable venues don't prevent access to replicated artifacts

Specification

Artifact Identification

Artifacts are identified following COG-2: Decentralised ID:

did:web:venue.example.com/a/119e30db8a4ea8b33723603743591a5f8229684e6236d89ef1966a72d7293607

The path component /a/{asset-id} identifies the artifact within the venue.
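A client might split such an identifier into its venue and asset ID along these lines. The sketch only handles the form shown above (did:web:{venue}/a/{asset-id} with a 64-character hexadecimal asset ID); the names ArtifactRef and parse_artifact_did are hypothetical, and COG-2 remains the authority on identifier syntax.

```python
import re
from typing import NamedTuple


class ArtifactRef(NamedTuple):
    venue: str
    asset_id: str


# Assumed shape: did:web:{venue}/a/{64 hex characters}
_ARTIFACT_DID = re.compile(r"did:web:([^/]+)/a/([0-9a-fA-F]{64})")


def parse_artifact_did(did: str) -> ArtifactRef:
    """Split an artifact DID into venue and lowercase asset ID."""
    match = _ARTIFACT_DID.fullmatch(did)
    if match is None:
        raise ValueError(f"not an artifact DID: {did!r}")
    return ArtifactRef(venue=match.group(1), asset_id=match.group(2).lower())


ref = parse_artifact_did(
    "did:web:venue.example.com/a/"
    "119e30db8a4ea8b33723603743591a5f8229684e6236d89ef1966a72d7293607"
)
print(ref.venue)
print(ref.asset_id)
```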

Required Metadata Fields

Artifacts MUST include a content object in their metadata. See COG-5: Asset Metadata for the complete metadata specification.

content.sha256 (REQUIRED)

The SHA256 hash of the content bytes, encoded as a hexadecimal string.

{
  "content": {
    "sha256": "119E30DB8A4EA8B33723603743591A5F8229684E6236D89EF1966A72D7293607"
  }
}

Implementations MUST verify that content matches this hash before serving or using it.

content.contentType

The MIME type of the content, enabling correct interpretation.

{
  "content": {
    "contentType": "text/csv"
  }
}
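A structural check of the content object, run before any hashing, could look like the sketch below. It assumes that both uppercase and lowercase hexadecimal are acceptable (the examples in this document use both) and that contentType, when present, contains a "/" as MIME types do. The function name metadata_problems is hypothetical, and COG-5 remains the authority on the full schema.

```python
import re

_HEX64 = re.compile(r"[0-9a-fA-F]{64}")


def metadata_problems(metadata: dict) -> list[str]:
    """Return a list of problems found in the artifact's content object."""
    content = metadata.get("content")
    if not isinstance(content, dict):
        return ["missing 'content' object"]

    problems = []
    sha = content.get("sha256")
    if not isinstance(sha, str) or not _HEX64.fullmatch(sha):
        problems.append("content.sha256 must be 64 hexadecimal characters")

    ctype = content.get("contentType")
    if ctype is not None and (not isinstance(ctype, str) or "/" not in ctype):
        problems.append("content.contentType should look like a MIME type")
    return problems


good = {
    "content": {
        "sha256": "119E30DB8A4EA8B33723603743591A5F8229684E6236D89EF1966A72D7293607",
        "contentType": "text/csv",
    }
}
print(metadata_problems(good))
print(metadata_problems({"name": "no content object"}))
```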

Content Storage

Artifact content is stored separately from metadata. Venues MUST:

  • Store content addressable by its SHA256 hash
  • Return content only when the hash can be verified
  • Support retrieval of content given an Asset ID

Content MAY be:

  • Stored locally on the venue
  • Retrieved on-demand from other venues
  • Cached based on access patterns
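As a rough illustration of the storage requirements, the following in-memory sketch keeps metadata keyed by Asset ID and content keyed by its SHA256 hash, and re-verifies content on every read. The class ArtifactStore and its methods are hypothetical; a real venue would use persistent storage and may fetch or cache content as described above.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class ArtifactStore:
    """Toy store: metadata by Asset ID, content by content hash."""

    def __init__(self) -> None:
        self._metadata: dict[str, str] = {}
        self._content: dict[str, bytes] = {}

    def add(self, metadata: str, content: bytes) -> str:
        """Store an artifact and return its Asset ID."""
        declared = json.loads(metadata)["content"]["sha256"].lower()
        if sha256_hex(content) != declared:
            raise ValueError("content does not match content.sha256")
        asset_id = sha256_hex(metadata.encode("utf-8"))
        self._metadata[asset_id] = metadata
        self._content[declared] = content  # addressable by its SHA256 hash
        return asset_id

    def content_for(self, asset_id: str) -> bytes:
        """Return content for an Asset ID, only if its hash still verifies."""
        metadata = self._metadata[asset_id.lower()]
        declared = json.loads(metadata)["content"]["sha256"].lower()
        content = self._content[declared]
        if sha256_hex(content) != declared:
            raise ValueError("stored content failed verification")
        return content


store = ArtifactStore()
content = b"a,b\n1,2\n"
metadata = json.dumps({"content": {"sha256": sha256_hex(content), "contentType": "text/csv"}})
aid = store.add(metadata, content)
print(store.content_for(aid) == content)
```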

Replication

Venues MAY replicate artifacts from other venues. When replicating:

  1. Obtain the metadata from the source venue
  2. Verify metadata hash matches the Asset ID
  3. Obtain the content
  4. Verify content hash matches content.sha256
  5. Store both metadata and content locally

Implementations MUST NOT serve artifacts that fail verification.
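The replication steps above could be sketched as follows. The callables fetch_metadata and fetch_content stand in for whatever transport a venue uses to reach the source; they, the VerificationError class, and the dict used as local storage are all hypothetical and chosen only for illustration.

```python
import hashlib
import json
from typing import Callable


class VerificationError(Exception):
    """Raised when fetched metadata or content fails its hash check."""


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def replicate(
    asset_id: str,
    fetch_metadata: Callable[[str], str],
    fetch_content: Callable[[str], bytes],
    local: dict,
) -> None:
    """Copy one artifact into local storage, verifying each step."""
    asset_id = asset_id.lower()
    metadata = fetch_metadata(asset_id)                      # 1. obtain metadata
    if sha256_hex(metadata.encode("utf-8")) != asset_id:     # 2. verify metadata
        raise VerificationError("metadata does not match Asset ID")
    content = fetch_content(asset_id)                        # 3. obtain content
    declared = json.loads(metadata)["content"]["sha256"].lower()
    if sha256_hex(content) != declared:                      # 4. verify content
        raise VerificationError("content does not match content.sha256")
    local[asset_id] = (metadata, content)                    # 5. store both


# Demonstration with an in-memory "source venue".
content = b"hello grid"
metadata = json.dumps({"content": {"sha256": sha256_hex(content)}})
aid = sha256_hex(metadata.encode("utf-8"))
source = {aid: (metadata, content)}

local_store: dict = {}
replicate(aid, lambda a: source[a][0], lambda a: source[a][1], local_store)
print(aid in local_store)
```

Nothing is written to local storage unless both checks pass, so a source that returns altered data leaves the replica unchanged.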

Versioning

Since artifacts are immutable, versioning is achieved through:

  • Creating new artifacts with updated content
  • Using metadata fields to link versions (e.g., previousVersion, replaces)
  • Maintaining collections or indices that track version history

Each version is a distinct artifact with its own Asset ID.
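A version history can then be reconstructed by following the links between artifacts. The sketch below assumes each version records its predecessor under additionalInformation.previousVersion, as in the Binary Artifact example; the lookup callable, the function name version_history, and the placeholder identifiers are hypothetical.

```python
from typing import Callable, Iterator, Optional


def version_history(
    start: str, lookup: Callable[[str], Optional[dict]]
) -> Iterator[str]:
    """Yield artifact identifiers from newest to oldest.

    `lookup` maps an identifier to its parsed metadata, or None if unknown.
    """
    seen = set()
    current: Optional[str] = start
    while current is not None and current not in seen:
        seen.add(current)  # guard against accidental cycles in an index
        yield current
        metadata = lookup(current)
        if metadata is None:
            break
        current = metadata.get("additionalInformation", {}).get("previousVersion")


# Placeholder identifiers stand in for full artifact DIDs.
catalog = {
    "model-v3": {"additionalInformation": {"previousVersion": "model-v2"}},
    "model-v2": {"additionalInformation": {"previousVersion": "model-v1"}},
    "model-v1": {"name": "first release"},
}

history = list(version_history("model-v3", catalog.get))
print(history)
```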

Examples

Dataset Artifact

A machine learning dataset with full provenance:

{
  "name": "Iris Dataset",
  "description": "The classic Iris flower dataset containing 150 samples of iris flowers with 4 features each (sepal length, sepal width, petal length, petal width) and species classification.",
  "creator": "UCI Machine Learning Repository",
  "dateCreated": "2025-06-05T06:53:59Z",
  "license": {
    "name": "CC BY 4.0",
    "url": "https://creativecommons.org/licenses/by/4.0/"
  },
  "keywords": ["machine learning", "iris", "dataset", "classification"],
  "content": {
    "contentType": "text/csv",
    "sha256": "119E30DB8A4EA8B33723603743591A5F8229684E6236D89EF1966A72D7293607",
    "encoding": "UTF-8",
    "inLanguage": "en"
  },
  "additionalInformation": {
    "rows": 150,
    "columns": 5,
    "source": "https://archive.ics.uci.edu/ml/datasets/iris"
  }
}

Document Artifact

A text document with licensing information:

{
  "name": "Hamlet",
  "description": "The complete text of Shakespeare's tragedy Hamlet, Prince of Denmark.",
  "creator": "William Shakespeare",
  "dateCreated": "2025-06-05T06:53:59Z",
  "license": {
    "name": "Public Domain"
  },
  "keywords": ["text", "drama", "shakespeare", "classic"],
  "content": {
    "contentType": "text/plain",
    "sha256": "74f16013e2b7ce83d5f5c8d4b3c42f279242f6ddfa7bab0f31320301e60c81d6",
    "encoding": "UTF-8",
    "inLanguage": "en-GB"
  }
}

Binary Artifact

A compiled model or binary file:

{
  "name": "Sentiment Analysis Model v2.1",
  "description": "Pre-trained transformer model for sentiment analysis, fine-tuned on product reviews.",
  "creator": "ML Research Team",
  "dateCreated": "2025-07-15T14:30:00Z",
  "keywords": ["model", "nlp", "sentiment", "transformer"],
  "content": {
    "contentType": "application/octet-stream",
    "sha256": "a1b2c3d4e5f6789012345678901234567890abcdef1234567890abcdef123456"
  },
  "additionalInformation": {
    "framework": "pytorch",
    "version": "2.1",
    "accuracy": 0.94,
    "previousVersion": "did:web:models.example.com/a/0987654321fedcba..."
  }
}

Security Considerations

Hash Algorithm

SHA256 is currently required for all hashes. Future specifications may add support for additional algorithms with explicit algorithm identifiers.

Content Injection

Implementations MUST verify content hashes before processing. A malicious or faulty venue could serve incorrect content; verification detects the mismatch so that the content can be rejected before use.
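For large content it may be impractical to hold everything in memory before checking the hash. One possible approach, sketched below, hashes the content in chunks as it is read and reports a match only once the whole stream has been consumed. The function name stream_matches and the chunk size are hypothetical choices, and a consumer would still need to withhold the data from further processing until the check succeeds.

```python
import hashlib
import io
from typing import BinaryIO


def stream_matches(
    stream: BinaryIO, expected_hex: str, chunk_size: int = 64 * 1024
) -> bool:
    """Hash a binary stream incrementally and compare to the expected SHA256."""
    digest = hashlib.sha256()
    # read() returns b"" at end of stream, which stops the iteration.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()


payload = b"example content " * 10_000
expected = hashlib.sha256(payload).hexdigest()

print(stream_matches(io.BytesIO(payload), expected))
print(stream_matches(io.BytesIO(payload + b"!"), expected))
```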

Metadata Authenticity

The Asset ID verifies metadata integrity but not authenticity. To verify the creator's identity, additional mechanisms such as digital signatures or DID-based attestations may be used.