# Veritas Protocol

## A Working Paper on a Collaborative, Domain-Indexed Fact-Checking Substrate for the Open Web

**Collaborative Fact-Checking Working Group — Draft v0.1**

*Circulated for review · April 2026*

---

## Abstract

This working paper proposes **Veritas Protocol**: an open, federated infrastructure for structured factual claims on the web. The protocol combines four properties that together are not currently provided by any single system: (i) cryptographic per-claim provenance linking each statement to its sources, evidence, and human reviewers; (ii) *domain-indexed verdicts*, allowing the same claim to be verified in one named consensus frame and disputed in another without collapsing the distinction; (iii) cascading falsification, in which the retraction of an upstream source propagates to all dependent claims and is pushed to every subscribed consumer; and (iv) an open, read-only application programming interface suitable for grounding generative artificial intelligence systems at inference time. The protocol composes mature open standards — W3C Verifiable Credentials and Decentralized Identifiers, IETF Supply-Chain Integrity Transparency and Trust (SCITT), RFC 9162 Certificate Transparency, Sigstore-style signing, libp2p gossip transport, and schema.org ClaimReview — rather than inventing a new cryptoeconomic substrate. A comparative analysis of four implementation choices concludes that a federated, token-free architecture delivers every claimed property at approximately one order of magnitude lower infrastructure cost and substantially lower regulatory exposure than a blockchain-based alternative. The paper concludes with a phased implementation roadmap, a governance proposal, a preliminary economic model, and a call for research partners.

**Keywords:** fact-checking, content provenance, verifiable credentials, decentralized identifiers, SCITT, transparency logs, AI grounding, hallucination reduction, consensus domains, epistemic infrastructure.

---

## Executive summary

1. **The open web lacks a trust layer for factual claims.** It has standards for addressing, transport, security, structure, and privacy, but no standard for *what this site asserts and on what basis*. The cost of this absence is paid in reader confusion, institutional burden of correction, and — increasingly — in the economic overhead of generative AI systems that must filter their own unreliable outputs.

2. **Four properties, together, would constitute such a layer.** Provenance; plural verdicts; cascading falsification; open AI-read surface. Each property exists individually in prior work; the combination does not.

3. **The composition is achievable on mature standards today.** As of 2026, the necessary primitives — W3C Verifiable Credentials Data Model 2.0, W3C Decentralized Identifiers v1.1, IETF SCITT drafts, RFC 9162 Certificate Transparency 2.0, Sigstore's transparency pattern, libp2p GossipSub, and schema.org ClaimReview — are sufficiently mature to be composed. Three years ago several pieces were drafts; they are now implementable.

4. **A federated architecture is preferable to a blockchain-based one.** Comparative analysis across cost, regulatory exposure, time to first demonstrable signal, prior-art track record, and ecosystem alignment finds federation strictly preferable in every category except one narrow scenario (a trust-minimised international consortium with no mutually-acceptable fiduciary), which is not the plausible founding scenario for this work.

5. **The hardest problem is governance, not technology.** Managing the acceptance, revision, and rejection of *consensus domains* — the named epistemic frames under which verdicts are indexed — is the central risk, the central intellectual contribution, and the central policy question. This paper proposes a governance model and identifies unresolved questions for the research community.

6. **A three-phase roadmap is proposed.** Phase I (months 0–6): specification plus reference implementation. Phase II (months 6–18): federated pilot network with 5–10 institutional validators and one AI-lab integration. Phase III (months 18+): sustained operation under a foundation host. Exit conditions at each phase gate are defined.

7. **Research partners are invited.** Universities, libraries, standards bodies, fact-checking organisations, AI laboratories, and foundations with alignment to open-infrastructure investment are invited to co-author the specification, operate pilot validators, charter consensus domains, or audit the protocol.

---

## 1 Introduction

### 1.1 The infrastructure framing

Every substantial addition to the open web has been, in retrospect, an infrastructure question. DNS did not invent hostnames; it standardised their resolution. TLS did not invent encryption; it standardised its use. Certificate Transparency (RFC 6962 [1], updated by RFC 9162 [2]) did not invent signed certificates; it standardised their public, tamper-evident publication. HTML, HTTP, schema.org, Atom, robots.txt, WebAuthn — each is a thin standard that, once adopted, makes a previously implicit question explicit and machine-readable.

This paper argues that the *factual content* of the web now needs a comparable standard. The question "what does this page claim, and on what basis?" is today answered only informally — by the reader, by habit, and increasingly by generative AI systems making their own best guesses from unstructured text. The cost of this informality is rising. The proposal here is to make the question machine-readable.

### 1.2 What this paper does

Section 2 reviews existing work across four layers of prior art. Section 3 states the problem. Sections 4 and 5 propose the protocol's design principles and architecture. Section 6 compares four implementation mechanisms and justifies the recommendation. Sections 7–10 address governance, economics, legal and regulatory considerations, and risks. Section 11 returns to the comparison with existing systems. Section 12 enumerates open research questions. Section 13 presents the implementation roadmap. Section 14 is the call for partners.

### 1.3 What this paper does not do

This is a working paper, not a specification. Wire formats, canonicalisation algorithms, threat models, and protocol state machines are sketched; they are not fully specified. Such specification is the work of Phase I of the proposed roadmap.

The paper does not assume a particular policy position on the nature of truth. It is compatible with epistemic pluralism but not with metaphysical relativism (section 7.3). It does not propose to adjudicate contested public facts; it proposes infrastructure under which existing institutions can do so transparently.

---

## 2 Background

### 2.1 Four layers of prior art

Existing work relevant to the Veritas Protocol falls into four layers. Each layer is mature; none on its own provides the full set of properties in section 4.

**(a) Content provenance.** The Coalition for Content Provenance and Authenticity (C2PA) has published a specification and an active ecosystem for cryptographically signed assertions about how media assets were produced and edited [3]. The Content Authenticity Initiative (CAI), led by Adobe with approximately 3,000 affiliated organisations as of early 2026, implements the specification at the tooling and adoption level [4]. C2PA signs *media assets*; it does not address granular factual statements in text, nor does it express retraction propagation or domain-relative verdicts.

**(b) Fact-check interchange.** schema.org/ClaimReview [5] is a JSON-LD schema for individual fact-check records, used by PolitiFact, Snopes, FactCheck.org, and signatories of the International Fact-Checking Network (IFCN) [6]. Duke University Reporters' Lab maintains a global registry of ClaimReview records and operates the Google Fact Check Explorer [7]. ClaimReview is a flat record: one claim, one verdict, one reviewer. It does not represent dependency relationships between claims, domain-indexed verdicts, or cascading updates.

**(c) Decentralised identity and signed statements.** The W3C Verifiable Credentials Data Model 2.0 [8] and Decentralized Identifiers (DIDs) v1.1 [9] are W3C Recommendations for identity and signed assertions. The IETF SCITT working group [10] is developing a standard for supply-chain-style transparency logs of signed statements; the working group has produced substantial drafts suitable as the transparency-log layer for factual-claim attestations. Sigstore [11] — comprising Fulcio (certificate authority for short-lived certificates), Rekor (transparency log), and Cosign (signing tooling) — demonstrates the federation-plus-transparency pattern in production for software-supply-chain signing, with the same pattern applicable to claim signing.

**(d) Crypto-incentivised verification.** A cluster of projects between approximately 2017 and 2023 — Civil, Po.et, Fact Protocol, Bitpress, Swarm Network, and the academic prototypes DEFC [12], ProBlock [13], and Torky's Proof-of-Credibility [14] — attempted to incorporate tokenomic incentives into fact-checking workflows. Section 11.2 discusses the empirical record of these attempts.

### 2.2 Crowdsourced and platform-native systems

Alongside the infrastructure work above, platform-native systems have demonstrated operational throughput at scale. X Community Notes (formerly Birdwatch) uses a *bridging algorithm* that upweights contributor notes across historically-opposed voter groups; a 2024 *PNAS* study [15] provides independent evaluation of its effectiveness. Community Notes operates without tokens, without blockchain, and without explicit validator credentialing. Wikipedia and Wikidata provide the largest openly-edited claim corpus in existence, with mature governance for dispute resolution. These systems demonstrate that the scaling problem is tractable within existing mechanism families; they do not, however, provide inter-system claim identity, cascading falsification, or domain-indexed verdicts.

### 2.3 AI-grounding infrastructure

Retrieval-augmented generation (RAG) is the de facto technique for reducing hallucination in contemporary large language models. RAG systems are typically proprietary to the model provider and indexed over provider-curated corpora. No open, canonical, verdict-carrying claim substrate currently exists for AI systems to query. The Veritas Protocol's AI-read API is an attempt to provide one.

---

## 3 Problem statement

### 3.1 The provenance gap

A URL is not a citation; a citation without a verifier is not evidence. Claims on the web travel, in general, without traceable lineage. When a reader encounters the statement "Bolivia has 10 million hectares of degraded land", they have, at best, an author-provided link to a source. They typically do not have: the identity of a third party who has examined the source; a cryptographic signature attesting that the linked source has not been altered since examination; the dependency graph linking this claim to any upstream claims it rests on; or a mechanism by which, if the upstream source is retracted, this claim would be marked as at risk.

### 3.2 The retraction propagation gap

When a scientific paper is retracted, the Retraction Watch database [16] records the retraction, and Crossref provides an open retraction feed [17]. In almost no case does this signal reach the reader of a newspaper article, blog post, or textbook passage that cited the paper in question. Retractions, even verified ones, reach a tiny fraction of the audience exposed to the original claim. This is a correctable infrastructure problem.

### 3.3 The pluralism gap

Scientific consensus, national-curricular memory, legal jurisprudence in a specific jurisdiction, religious tradition, and active academic revisionism are not the same kind of object. The same factual sentence can be supported in one of these frames and disputed in another. Existing fact-check systems present a single verdict per claim; when multiple verdicts are actually called for, the systems either pick one (inheriting a political choice) or decline to rate (producing no signal). Neither behaviour scales to policy-relevant or historically-contested questions.

### 3.4 The AI-grounding gap

Contemporary generative systems produce text at rates many orders of magnitude higher than human review can verify. Post-hoc detection of generated content is not reliable. Pre-emit grounding against a trusted corpus measurably reduces confidently-wrong outputs in factual tasks; the Vectara Hallucination Leaderboard [18] and the HalluLens benchmark [19] both demonstrate substantial variance in hallucination rate across comparable systems, with retrieval-grounded systems generally outperforming ungrounded baselines. The grounding corpus that currently performs this role is always proprietary to the generating organisation. An open, signed, domain-scoped claim substrate would lower the marginal cost of hallucination reduction for every adopter.

### 3.5 The systemic cost

These gaps are paid for in recognisable units. Readers pay in attention and in misplaced trust. Institutions pay in reach-limited corrections. AI operators pay in compute spent on self-correction, retrieval filtering, and content moderation. The hypothesis of this paper is not that Veritas Protocol eliminates these costs, but that making the underlying data structure explicit is a precondition for reducing them, and that the total cost reduction would substantially exceed the protocol's operating cost.

---

## 4 Design principles

The protocol is designed under five principles.

**Principle 1 — Composition over invention.** Every primitive the protocol needs exists in a mature open standard. The protocol standardises the composition of those primitives, not new primitives.

**Principle 2 — Federation over centralisation.** No single entity has editorial authority. Validators — institutions, not individuals — are credentialed into named consensus domains by a governance body, and their signed attestations are public.

**Principle 3 — Plurality without relativism.** The protocol records verdicts under named epistemic frames. It does not adjudicate which frame is correct. It *does* require that admitted frames meet an editorial-standard admissibility criterion; it is not a substrate for arbitrary claims labelled as "a different kind of truth."

**Principle 4 — Falsifiability of claims about the protocol itself.** Impact claims — on hallucination reduction, on correction reach, on validator throughput — are operationalised as measurable benchmarks. Claims the protocol makes about itself are testable.

**Principle 5 — Graceful exit.** The protocol is a standard, not an operating platform. Exits at every phase are cheap; downstream adopters inherit the specification.

---

## 5 Architecture

### 5.1 The four properties

A conformant Veritas implementation provides the following four capabilities:

1. **Provenance graph.** Every claim is identified by a cryptographic hash over its canonical form; each claim record references upstream claims, primary-source evidence, the method of derivation, and the validators who have inspected it.

2. **Domain-indexed verdicts.** Each verdict on a claim is attributable to a named consensus domain (for example `scientific-default`, `legal-jurisdiction-eu`, `journalism-default`, `historical-academic-default`). The protocol stores verdicts from multiple domains without collapsing them.

3. **Cascading falsification.** A retraction event — a signed, quorum-backed attestation that an upstream claim is withdrawn — triggers re-evaluation of all dependent claims in the graph. Dependents are marked `CASCADE_PENDING`; publishers can confirm or dispute.

4. **AI-read surface.** A read-only REST API and corresponding materialised snapshots permit AI systems, browser extensions, and downstream tools to query claim status at low latency, optionally parameterised by consensus domain and minimum validator coverage.

### 5.2 Data model

**Claim records** are canonicalised JSON-LD documents extending schema.org/ClaimReview. Required fields include the claim text, a content-addressed claim hash, a stable claim-group identifier (for editorial revisions), declared sources and evidence pointers, claim type (`FACT`, `OPINION`, `PROJECTION`, `ARGUMENT`), and a language identifier. Canonicalisation follows RDF Dataset Canonicalization (URDNA2015) [20].
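A minimal sketch of content-addressing a claim record, in Python. The field names are illustrative rather than normative, and sorted-key JSON stands in for the URDNA2015 canonicalisation the protocol actually specifies:

```python
import hashlib
import json

def claim_hash(record: dict) -> str:
    # Stand-in canonicalisation: JSON with sorted keys and no whitespace.
    # A conformant implementation would apply RDF Dataset Canonicalization
    # (URDNA2015) to the JSON-LD document instead.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Illustrative claim record; field names follow section 5.2 but are not normative.
claim = {
    "claimText": "Bolivia has 10 million hectares of degraded land",
    "claimGroup": "urn:veritas:group:0001",
    "claimType": "FACT",
    "language": "en",
    "sources": ["https://example.org/land-report-2025"],
}

print(claim_hash(claim))
```

The hash is stable under key reordering, which is the property the claim identifier relies on.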

**Attestation records** are signed statements over claim hashes. Required fields include the target claim hash, the issuing validator's decentralized identifier, the consensus domain under which the attestation is issued, the verdict (`VERIFIED`, `SUPPORTED`, `QUALIFIED`, `UNVERIFIED`, `DISPUTED`, `RETRACTED`), a free-text rationale, the date of issuance, and a JSON Web Signature [21] over the canonical bytes. Attestations are logged in an IETF SCITT transparency log and optionally mirrored via Sigstore-style append-only logs.
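The attestation shape can be sketched the same way. HMAC-SHA256 over the canonical bytes is a standard-library stand-in for the JSON Web Signature the protocol specifies; the DID, key, and field names are all illustrative:

```python
import hashlib
import hmac
import json

def sign_attestation(att: dict, key: bytes) -> dict:
    # Canonicalise, then attach a detached signature. A conformant
    # implementation would produce a JWS (RFC 7515) here, not an HMAC.
    canonical = json.dumps(att, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return {**att, "signature": hmac.new(key, canonical, hashlib.sha256).hexdigest()}

# Illustrative attestation; field names follow section 5.2 but are not normative.
attestation = sign_attestation(
    {
        "claimHash": "sha256:1f3a...",  # target claim, truncated for illustration
        "validator": "did:web:library.example.edu",
        "domain": "scientific-default",
        "verdict": "SUPPORTED",
        "rationale": "Figures match the cited primary source.",
        "issued": "2026-04-01",
    },
    key=b"not-a-real-signing-key",
)
```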

**Cascade events** record the propagation of a retraction. Required fields include the trigger retraction attestation, the set of dependent claim hashes identified, the timestamp, and the propagation-quorum evidence (signed statements from *K* independent validators with minimum reputation).

### 5.3 Validator network

Validators are credentialed institutions, not individuals. A credential is a W3C Verifiable Credential [8] of type `VeritasValidator`, issued by the governance body. The credential records the validator's DID [9], the set of consensus domains in which the validator is credentialed, and a revocation pointer conforming to the W3C Status List specification [22]. Revocation follows the patterns established for Certificate Transparency [2] and Sigstore [11]: short-lived credentials, a public revocation feed, and a transparency log of issuance events.

### 5.4 Consensus domains

A consensus domain is a *named editorial frame*. Each domain is chartered by a rapporteur body — typically an existing institution (for example the Royal Society for `scientific-default`, the Duke Reporters' Lab for `journalism-default`, an EU judicial council for `legal-jurisdiction-eu`) — that publishes an editorial standard. The governance body accepts or rejects charter applications against a published admissibility criterion (section 7).

A claim can carry verdicts from multiple domains simultaneously. A reader or AI system may query verdicts scoped to one, several, or all chartered domains; the protocol composes per-domain verdicts into a per-query response.
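The per-query composition can be sketched as follows, assuming attestation records shaped as in section 5.2. Resolving each domain to its most recent verdict is a simplifying assumption of this sketch, not a protocol rule:

```python
def compose_verdicts(attestations, domains=None):
    # Keep each domain's most recent verdict; never merge across domains.
    out = {}
    for att in sorted(attestations, key=lambda a: a["issued"]):
        if domains is None or att["domain"] in domains:
            out[att["domain"]] = att["verdict"]
    return out

atts = [
    {"domain": "scientific-default", "verdict": "VERIFIED", "issued": "2026-01-10"},
    {"domain": "historical-academic-default", "verdict": "DISPUTED", "issued": "2026-02-01"},
    {"domain": "scientific-default", "verdict": "QUALIFIED", "issued": "2026-03-05"},
]
print(compose_verdicts(atts))
```

Here the scientific domain resolves to its latest verdict while the historical domain's `DISPUTED` verdict is reported alongside it, which is the non-collapsing behaviour section 5.4 requires.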

### 5.5 Cascading falsification

Cascading falsification applies classical truth-maintenance techniques from artificial intelligence research, specifically Doyle's justification-based truth maintenance system (JTMS) [23] and de Kleer's assumption-based truth maintenance system (ATMS) [24]. A claim's declared `sources[]` and validator-inferred dependency edges form a directed acyclic graph. A retraction event propagates along this graph, marking dependent claims as requiring re-evaluation within each affected consensus domain.

Propagation requires quorum. A single-validator retraction marks dependents as `CASCADE_PENDING`, not as falsified; confirmation requires *K* independent validators with minimum reputation in the relevant domain. This is analogous to the appellate structure of human fact-checking but formalised in the protocol.
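The quorum-gated propagation can be sketched as a breadth-first walk of the dependency graph. The state names follow sections 5.1 and 5.2; the quorum handling is a simplification of the protocol's full state machine:

```python
from collections import deque

def cascade(dependents, retracted, n_sigs, k=3):
    # Below quorum k the trigger claim is only DISPUTED; at quorum it is
    # RETRACTED. Either way, downstream claims are marked CASCADE_PENDING
    # and await publisher confirmation or dispute.
    status = {retracted: "RETRACTED" if n_sigs >= k else "DISPUTED"}
    queue = deque(dependents.get(retracted, []))
    while queue:
        c = queue.popleft()
        if c in status:
            continue  # the dependency DAG may share sub-graphs; visit once
        status[c] = "CASCADE_PENDING"
        queue.extend(dependents.get(c, []))
    return status

deps = {"paper": ["article"], "article": ["textbook", "blog"]}
print(cascade(deps, "paper", n_sigs=3))
```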

### 5.6 Transport and propagation

Attestation and cascade events propagate through a libp2p-based gossip network [25], with topic subscriptions scoped to domains of interest. Clients subscribe to the domains they consume; propagation within the subscribed topic is logarithmic in network size. For AI-grounding use, the protocol provides a secondary *materialised view* distribution: per-domain snapshots of claim verdicts, updated daily, served from edge content-delivery networks for low-latency grounding calls at AI-inference time.

### 5.7 AI-read surface

AI systems and downstream tools call a REST API conforming approximately to:

```
GET /v1/claim/{hash}?domain={domain}
GET /v1/check?claims={hashes}&domain={domain}&minValidators={N}
```

The API returns signed responses citing the underlying attestation hashes. Implementations are encouraged to distribute via CDN edges for latency below 50 ms from most regions. A typed client SDK in TypeScript, Python, and Go accompanies the reference implementation.
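A client-side sketch of constructing the batch grounding query. The aggregator host is hypothetical and the parameter defaults are illustrative, not specified by the protocol:

```python
from urllib.parse import urlencode

BASE = "https://aggregator.example.org"  # hypothetical aggregator host

def check_url(claim_hashes, domain="scientific-default", min_validators=2):
    # Shapes the batch endpoint of section 5.7; urlencode percent-escapes
    # the colons and commas in the claim-hash list.
    query = urlencode({
        "claims": ",".join(claim_hashes),
        "domain": domain,
        "minValidators": min_validators,
    })
    return f"{BASE}/v1/check?{query}"

print(check_url(["sha256:1f3a", "sha256:9c2e"]))
```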

---

## 6 Mechanism choice: federation vs. blockchain

### 6.1 The comparison

The protocol's properties (section 5.1) can be provided by several distinct architectural mechanisms. Four are discussed here, on shared evaluation axes.

**Option A — Minimalist self-declared protocol.** Sites publish `/factcheck.json` files extending ClaimReview, version-controlled in git. No central infrastructure; no validator network. Covers provenance partially; does not provide plural verdicts, cascading falsification, or a federated AI-read surface.

**Option B — Federated signed claims.** The architecture described in section 5. Validators are institutionally credentialed via W3C VC/DID; attestations are logged in IETF SCITT transparency logs; gossip transport uses libp2p; content addressing follows CID (Content IDentifier) conventions. No blockchain, no token.

**Option C — Blockchain-incentivised.** Validator reputation and attestations are recorded on a blockchain; validators post token bonds; tokenomic incentives reward verification and penalise provable misbehaviour.

**Option D — Staged evolution (A → B; C as reserve).** Ship A immediately; develop B over 12–18 months; escalate to C only if specific failure conditions of B arise and C is shown to address them.
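For concreteness, Option A's self-declared `/factcheck.json` might carry records shaped like the following. The properties shown (`claimReviewed`, `reviewRating`, `author`, `itemReviewed`) are standard schema.org/ClaimReview vocabulary; the extension fields a Veritas profile would add (claim hashes, source pointers) are omitted from this sketch:

```python
import json

# Illustrative /factcheck.json body for Option A; values are invented.
factcheck = {
    "@context": "https://schema.org",
    "@type": "ClaimReview",
    "claimReviewed": "Bolivia has 10 million hectares of degraded land",
    "reviewRating": {"@type": "Rating", "alternateName": "SUPPORTED"},
    "author": {"@type": "Organization", "name": "Example Fact Desk"},
    "itemReviewed": {
        "@type": "Claim",
        "appearance": {"@type": "CreativeWork", "url": "https://example.org/article"},
    },
}
print(json.dumps(factcheck, indent=2))
```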

### 6.2 Evaluation

The following comparison summarises the detailed analysis presented in the supplementary dossier:

| Axis | A · Minimalist | B · Federated | C · Blockchain | D · Staged |
|---|---|---|---|---|
| Delivers four-property composition | partial | full | full | full (via B) |
| Infrastructure cost (year 1, 100 validators) | ≈ 0 | US$ 200–500k | US$ 2–5m plus capital | US$ 200–500k |
| Time to first demonstrable signal | weeks | 12–18 months | 24–36 months | weeks (A), 12–18 months (B) |
| Regulatory exposure | low | medium | high (securities-law and data-protection) | medium |
| Shipped prior art | schema.org, HTTPS-analog pattern | Certificate Transparency, Sigstore, C2PA | None in eight years of attempts | Inherits A and B track record |
| Ecosystem alignment | near-zero | SCITT, W3C, schema.org | blockchain-specific | low until C |

### 6.3 Justification

The empirical record of Option C is unfavourable. Successive token-incentivised fact-check initiatives between 2017 and 2024 — Civil, Po.et, Factmata, Bitpress, and the unreleased token economies of Fact Protocol and Swarm Network — have either ceased operation, pivoted away from tokenomics, or failed to ship. The aggregate market capitalisation of the surviving projects is small, and adoption among fact-check organisations is negligible.

The architectural justifications commonly offered for blockchain in this setting — tamper-evident logging, global event ordering, Sybil resistance — are all available at lower cost and lower regulatory exposure in federated systems. Certificate Transparency logs [2], extended with cosigned witnesses [11], provide tamper-evidence with a security model comparable to a well-operated blockchain at approximately three orders of magnitude lower operational cost. Sybil resistance in fact-checking is more effectively provided by institutional credentialing than by token bonding, because the cost of establishing a *convincingly institutional* validator (a university, a library, a newsroom) is high and not easily reduced by staking capital.

The General Data Protection Regulation (EU) 2016/679, Article 17 [26], constrains the permissibility of personal data on immutable ledgers. The French data-protection authority CNIL has published guidance [27] on blockchain compliance via crypto-shredding of off-chain encrypted data with on-chain hashes; this pattern is achievable but adds complexity. Federation avoids the issue substantially.

For these reasons, this paper recommends Option D — ship A immediately, build B over 12–18 months, hold C in reserve for narrowly-specified scenarios that have not yet arisen.

---

## 7 Governance

### 7.1 Principle

Governance is the central risk and the central intellectual contribution of Veritas Protocol. The protocol does not adjudicate which consensus domains are correct. It does determine which consensus domains are *admitted*. This is unavoidably a substantive editorial decision.

### 7.2 Structure

The proposed governance body is a non-profit foundation, hosted under an existing multi-stakeholder parent (for example the Mozilla Foundation, the Linux Foundation, or the Joint Development Foundation). Board composition is capped by sector: no more than 30 per cent of seats from any single sector drawn from {academia, civil society, journalism, industry, AI laboratories, governments}. Board terms are three years, renewable once. Geographic distribution requires representation from at least three global regions.

### 7.3 Admissibility criterion

A proposed consensus domain is admitted if and only if it meets a published set of editorial-standard criteria:

- A named rapporteur body (an institution with verifiable identity).
- A published editorial standard, with decision procedures.
- A commitment to respond to disputes in writing within a specified window.
- Fallibilism: an explicit procedure for updating the domain's verdicts on new evidence.
- Non-inclusion of claims on a published hard list of non-admissible positions.

The hard list is a substantive editorial commitment of the protocol. It includes positions where credible adjudication has been conducted by specialised institutions and where the adversarial intent of admitting "a different consensus" is sufficiently documented that the governance body refuses. Examples under discussion include: denial of the Holocaust; denial of the germ theory of disease; denial of election integrity findings where those findings have been adjudicated by competent courts. The hard list is public, auditable, and revisable only by board supermajority with published rationale. The foundation of this commitment is not neutrality but accountability: the protocol refuses to be a substrate for positions that meet the stated criteria, and publishes its reasoning when it does so.

### 7.4 Pluralism without relativism

The distinction between epistemic pluralism and metaphysical relativism is load-bearing. The protocol is compatible with the observation that *scientific consensus*, *legal jurisprudence*, and *historical scholarship* operate under different evidential standards, and that an assertion may be settled under one and unsettled under another. It is not compatible with the claim that these differences imply the absence of mind-independent truth, or that admission of a consensus domain constitutes endorsement of its methodological commitments. Admission is a commitment to openness of dispute, not to equivalence of findings.

### 7.5 Appeals

Rejected charter applications have a published appeals path to an independent panel. Panel decisions, like board decisions, are published with rationale. This exposes the governance body to public scrutiny of its editorial choices, which is the intended property.

---

## 8 Economics and sustainability

### 8.1 Cost

Phase I (0–6 months) is estimated at approximately US$ 200–300 thousand for specification drafting, reference aggregator implementation, browser extension, and standards-body engagement. Phase II (6–18 months) is estimated at approximately US$ 400–600 thousand for foundation operating cost, validator stipends across 5–10 institutions, legal counsel, and one AI-laboratory integration. Aggregate cost through Phase II is approximately US$ 600–900 thousand.

### 8.2 Revenue and funding

The protocol is designed for philanthropic and institutional funding, with a long-term mix of:

- Foundation grants (Mozilla Foundation, Knight Foundation, MacArthur Foundation, Ford Foundation, Protocol Labs / Filecoin Foundation for the Decentralized Web, EU Democracy Shield frameworks, Wellcome Trust where applicable).
- Tiered AI-laboratory service fees for high-volume grounding access.
- Institutional in-kind contributions (partial-FTE validator-institution commitments).
- Philanthropic major gifts.

No tokenomic revenue. No equity. No advertising.

### 8.3 Validator compensation

Institutional validators are compensated through stipends, not per-verification piece rates. Stipends are budgeted against the foundation's grant cycle. The compensation model explicitly does not attempt to provide market-rate compensation for all expert verification labour; such labour is and will remain subsidised by the validator institutions' core missions (universities, libraries, research organisations), as is common in peer review and institutional data curation.

### 8.4 Scale economics

The architecture is designed such that operating cost grows slowly with claim corpus size: transparency-log append cost is approximately O(log N); aggregator query cost is O(1) per query with caching; gossip propagation cost is logarithmic in peer count. Validator-hour cost is the dominant scaling term; this is addressed by expanding the validator network, not by any architectural change.

---

## 9 Legal and regulatory landscape

### 9.1 Defamation exposure for validators

A validator that signs "this claim is not supported" is making a reviewable statement. In the United States, *Stossel v. Meta Platforms* (N.D. Cal. 2022) [28] established that third-party fact-check labels, when framed as opinion rather than assertion of fact, receive substantial First Amendment protection; however, protection is not automatic and the framing is load-bearing. In the European Union, the Digital Services Act (Regulation (EU) 2022/2065) [29] introduces the Trusted Flagger designation (Article 22), which provides a regulatory pathway with defined obligations. In the United Kingdom, the Online Safety Act 2023 [30] imposes duties on large services and may create indirect exposure for validator statements that feed moderation decisions.

The protocol's design response is that validator attestations are expressed in opinion-and-procedural form: "Under the editorial standard of domain D and evidence E, validator V concludes Z." Institutional indemnification by the foundation, D&O insurance, and published retraction procedures further bound exposure.

### 9.2 Data protection

GDPR Article 17 (right to erasure) is incompatible with writing personal data to immutable ledgers [26]. The protocol's federation-based architecture stores only content hashes and signatures in the transparency log, with evidence artefacts held in content-addressed storage with erasure support (crypto-shredding patterns following CNIL guidance [27]). Personal data is never written to the transparency log directly.

### 9.3 Certification-mark obligations

A public "VERIFIED" badge, if operated as a certification mark under the Lanham Act § 1054 (US) [31] or the equivalent provisions in the EU Trade Mark Regulation [32], incurs specific duties of objectivity and non-discrimination. The protocol's default stance is that badges express *structural indicators* (a site publishes `/factcheck.json`; a claim has ≥*K* independent attestations in domain *D*) rather than content certifications. Any programme that issues a formal certification mark is subject to published standards and the associated legal obligations.

### 9.4 Regulatory alignment

Opportunities for regulatory alignment include: the EU AI Act (Regulation (EU) 2024/1689) [33] Article 50 transparency obligations for AI-generated content; the EU Digital Services Act Trusted Flagger programme; the forthcoming EU Democracy Shield framework; the NIST AI Risk Management Framework [34]; and the Partnership on AI's Responsible Practices for Synthetic Media [35]. The protocol is designed to be compatible with, rather than to replace, the obligations and programmes in each.

---

## 10 Risks and mitigations

The principal risks are governance-oriented rather than technical. A summary follows; the supplementary dossier contains a longer enumeration.

**R1 — Sham consensus domains.** Adversarial actors apply for charters under plausible names to legitimise positions failing the admissibility criterion. *Mitigation:* transparent charter process; published rejected-charter dashboard; quorum requirements for sock-puppet-resistant validator coalitions; hard-list enforcement.

**R2 — Governance body capture.** The foundation becomes dominated by a single sector or jurisdiction. *Mitigation:* sector caps on board composition; term limits; geographic distribution requirement; external governance audit; public decision records.

**R3 — State-narrative domains.** State actors operate consensus domains as propaganda instruments. *Mitigation:* admissibility criterion applies uniformly; refusal of charters that fail the criterion, with published rationale; acceptance of charters that meet the criterion, with transparency about state operation.

**R4 — Validator defamation exposure.** Validators face legal liability for signed attestations. *Mitigation:* opinion-form framing; foundation indemnification; DSA Trusted Flagger alignment where applicable; insurance.

**R5 — AI-laboratory non-integration.** The AI-read surface is not adopted by major laboratories, undermining the hallucination-reduction claim. *Mitigation:* pilot with one laboratory under research grant; publish measurable hallucination-reduction benchmark; open-weight model ecosystems as parallel adopters.

**R6 — Validator economic unsustainability.** Stipend funding cannot scale to cover the expert labour cost of verifying a growing claim corpus. *Mitigation:* diversified funding mix; service-fee tier at Phase II; institutional in-kind commitments; focused scope (the protocol is not a substitute for all fact-checking labour, only a structured substrate for it).

**R7 — Cascade attack.** An adversary forges or procures a retraction of a heavily cited upstream claim to trigger false cascades across dependents. *Mitigation:* quorum-required retraction events; 24-hour `DISPUTED-PENDING` window; recusal rule (validators cannot retract claims they themselves signed).

**R8 — Reader-default collapse.** The default domain setting effectively collapses the protocol to single-verdict fact-checking, defeating the plurality contribution. *Mitigation:* composed default domain combining several frames; user interface surfacing disagreement where present; published research on default-choice behaviour.
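The three R7 mitigations compose into a single admission gate, sketched here under assumed record shapes (a claim carries its signer list; each retraction signature carries an issuer and a first-seen timestamp). The function name and return states other than `DISPUTED-PENDING` are illustrative.

```python
from datetime import datetime, timedelta, timezone

PENDING_WINDOW = timedelta(hours=24)

def admit_retraction(claim: dict, retraction_sigs: list,
                     quorum: int, now: datetime = None) -> str:
    """Gate a retraction event per the R7 mitigations:
    recusal  - original signers cannot sign the retraction;
    quorum   - at least `quorum` distinct eligible signers;
    window   - the claim holds in DISPUTED-PENDING for 24 hours
               before cascades propagate."""
    now = now or datetime.now(timezone.utc)
    original_signers = set(claim["signers"])
    eligible = {s["issuer"] for s in retraction_sigs} - original_signers
    if len(eligible) < quorum:
        return "ACTIVE"            # retraction not admitted
    first_seen = min(s["seen_at"] for s in retraction_sigs)
    if now - first_seen < PENDING_WINDOW:
        return "DISPUTED-PENDING"  # dependents flagged, cascade held back
    return "RETRACTED"             # cascade may propagate to dependents
```

The holding window gives validators in other jurisdictions and time zones a chance to contest a procured retraction before it cascades.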

---

## 11 Comparison with existing systems

### 11.1 Relationship to composite prior art

The Veritas Protocol's four properties can be mapped onto the prior-art layers of section 2:

| Capability | Provided by prior art | Remaining gap |
|---|---|---|
| Cryptographic claim identity | Content Authenticity Initiative / C2PA signs media assets | No granular per-claim-text signature standard |
| Per-claim provenance data model | W3C PROV, schema.org/ClaimReview | Not integrated across the two |
| Validator identity standard | W3C VC / DID, X.509, COSE | No domain-scoped validator reputation standard |
| Plural / domain-indexed verdicts | *not provided* | **Central contribution of this proposal** |
| Cross-document cascading falsification | *not provided* (Retraction Watch feeds an outer boundary only) | **Central contribution** |
| Open, canonical AI-read API | *not provided* (existing RAG grounding is proprietary) | **Central contribution** |

The genuinely novel contribution is the combination of domain-indexed verdicts, cross-document retraction propagation, and an open AI-read surface atop mature identity and transparency primitives. The remainder of the design is careful composition of existing standards.

### 11.2 Platform-native systems

X Community Notes [15] and Wikipedia/Wikidata demonstrate that scaled consensus signal is achievable in platform-specific contexts. Veritas Protocol is complementary, not substitutional: a platform-native signal like Community Notes could be expressed as one consensus domain among several; Wikidata statements could be consumed as claim records by Veritas aggregators, subject to validator attestation.

### 11.3 Token-incentivised predecessors

The empirical record of token-incentivised fact-checking is as follows (supplementary dossier contains full postmortems and citations):

- *Civil Media:* refunded investors in 2019 after token-launch failure; team absorbed into identity-oriented work.
- *Po.et:* shipped proof-of-publishing, did not sustain; token utility fragmented.
- *Factmata:* acquired; ceased independent operation.
- *Bitpress:* dormant after approximately 2018.
- *Fact Protocol:* specification phase; no public token launch; limited adoption.
- *Swarm Network / Truth Protocol:* reported a US$ 13 million raise [source unverified at publication]; limited public adoption at the time of writing.
- *DEFC, ProBlock, Torky Proof-of-Credibility:* academic prototypes, instructive on pitfalls, not deployed at scale.

No successful token-incentivised fact-checking product has sustained operations at the scale and quality of schema.org/ClaimReview, the Community Notes bridging algorithm, or Wikipedia/Wikidata. This empirical pattern is load-bearing for the mechanism choice in section 6.

---

## 12 Open research questions

The protocol raises substantial research questions in areas where definitive answers do not yet exist. Partner engagement on these questions is explicitly invited.

**Q1 — Formal semantics of domain-relative truth.** Can "true in domain D at time t" be given rigorous formal semantics (for example, in the style of Kripke frames with consensus-domain indices) that avoids collapse into either monistic realism or unbounded relativism?

**Q2 — Validator reputation mathematics.** How should per-domain reputation be weighted, decayed, and slashed? EigenTrust-family algorithms [36] provide a principled foundation; the adaptation to domain-scoped fact-verification has not been standardised.
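To make Q2 concrete, here is a minimal EigenTrust-style power iteration [36], framed for a domain-scoped validator set. This is a sketch of the cited algorithm's core recurrence, not a proposed standard: `local_trust[i][j]` is the normalised trust validator *i* places in *j* (each row sums to 1), and `pre_trusted` is the prior distribution that damps manipulation.

```python
def eigentrust(local_trust: list, pre_trusted: list,
               alpha: float = 0.15, iters: int = 50) -> list:
    """One consensus domain's global trust vector via power iteration:
    t' = (1 - alpha) * t @ local_trust + alpha * pre_trusted."""
    n = len(local_trust)
    t = list(pre_trusted)  # start from the prior
    for _ in range(iters):
        t = [
            (1 - alpha) * sum(t[i] * local_trust[i][j] for i in range(n))
            + alpha * pre_trusted[j]
            for j in range(n)
        ]
    return t
```

Open questions remain exactly where the section says: how this vector should decay over time, how it is slashed on retraction events, and how scores transfer (or deliberately do not) across consensus domains.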

**Q3 — Domain-gerrymandering defence.** What automated or procedural signals reliably distinguish a legitimate minority consensus domain from a sham charter?

**Q4 — Privacy-sensitive claims.** How should claims involving personal, medical, legal, or national-security-sensitive evidence be handled under the protocol without either excluding them or violating data-protection law?

**Q5 — Adversarial AI-generated provenance.** As generative systems produce syntactically valid provenance graphs with fabricated sources, what verifier procedures remain robust?

**Q6 — Latency and scale of AI-grounding.** What is the achievable latency floor for grounding calls in production AI serving environments, and how does the answer interact with staleness tolerance?

**Q7 — Governance of the hard list.** By what decision procedure should the list of non-admissible positions be revised? How are edge cases adjudicated?

**Q8 — Cross-jurisdictional federation.** How should validator obligations and protections vary by jurisdiction while preserving the universal value of the protocol's core signals?

---

## 13 Implementation roadmap

### 13.1 Phase I — 0 to 6 months

- Specification draft v0.2 of the `/factcheck` protocol extension.
- Reference aggregator implementation (open source).
- Reference browser extension.
- Seed publishing on partner sites.
- IETF SCITT use-case submission.
- W3C Credentials Community Group input document.
- schema.org extension proposal.

**Gate:** at least 20 publishing sites, 5 third-party-attesting organisations, and one AI-laboratory conversation.

### 13.2 Phase II — 6 to 18 months

- Specification v0.1 of the federated signed-claims protocol (extends Phase I outputs).
- Foundation host secured.
- Validator cohort of 5–10 institutions across at least 3 jurisdictions.
- First chartered consensus domains: `scientific-default`, `journalism-default`, `legal-jurisdiction-eu`, and two historical / subject-matter domains.
- Reference AI-laboratory integration with published benchmark on hallucination reduction.
- First measured cascading-falsification event.

**Gate:** quantitative benefit demonstrated; federated network across multiple jurisdictions; at least one peer-reviewed publication.

### 13.3 Phase III — 18 months and beyond

- Sustained operation under the foundation.
- Expansion of consensus-domain registry.
- Interoperation with existing fact-checking bodies (IFCN, Duke Reporters' Lab, regional networks).
- Public-interest operating model at steady state.

**Gate condition for escalation to a crypto-incentivised architecture:** all three of (i) a specific, evidenced failure of the federated architecture that crypto-incentives would address; (ii) independent demonstration that a crypto-incentivised alternative does address that specific failure; (iii) a credible crypto-native consortium co-sponsoring the work. Absent all three, do not escalate.

---

## 14 Call for research partners

The Collaborative Fact-Checking Working Group invites research partners aligned with the goals of this paper to participate in Phase I and Phase II. Four concrete forms of participation are offered:

- **Co-author** the formal specification in the relevant standards bodies (W3C Credentials CG, IETF SCITT, schema.org).
- **Operate** a validator in the testnet during Phase II.
- **Audit** the protocol in the open — cryptographic, economic, governance, and legal audit contributions.
- **Charter** a consensus domain and publish its editorial standard.

Audiences of particular interest include: university libraries and consortia (OCLC members, national library systems); fact-checking networks (IFCN, Duke Reporters' Lab, Lead Stories, Logically, Full Fact, Africa Check, Chequeado); scientific-publishing and open-science infrastructure (Crossref, OpenAlex, Semantic Scholar, Europe PMC, FORCE11); AI research organisations (both industry-leading laboratories and open-weight-model communities); and public-interest foundations with programmes in journalism, democracy, and open infrastructure.

Contact, draft specifications, and further working-paper material are available on request through the project's public repositories.

---

## References

[1] B. Laurie, A. Langley, E. Kasper. *Certificate Transparency.* RFC 6962, IETF, June 2013. https://www.rfc-editor.org/rfc/rfc6962

[2] B. Laurie, E. Messeri, R. Stradling. *Certificate Transparency Version 2.0.* RFC 9162, IETF, December 2021. https://www.rfc-editor.org/rfc/rfc9162

[3] Coalition for Content Provenance and Authenticity. *C2PA Specifications.* https://c2pa.org/specifications/

[4] Content Authenticity Initiative. Membership and implementation roster. https://contentauthenticity.org/

[5] schema.org. *ClaimReview type definition.* https://schema.org/ClaimReview

[6] International Fact-Checking Network. *Code of Principles.* Poynter Institute. https://ifcncodeofprinciples.poynter.org/

[7] Duke University Reporters' Lab. *Global fact-check registry.* https://reporterslab.org/fact-checking/

[8] M. Sporny, D. Longley, D. Chadwick, eds. *Verifiable Credentials Data Model 2.0.* W3C Recommendation, 2024. https://www.w3.org/TR/vc-data-model-2.0/

[9] D. Longley, M. Sporny, eds. *Decentralized Identifiers (DIDs) v1.1.* W3C Recommendation, 2024. https://www.w3.org/TR/did-1.1/

[10] IETF SCITT (Supply Chain Integrity, Transparency and Trust) Working Group. *Working group charter and drafts.* https://datatracker.ietf.org/wg/scitt/

[11] Sigstore Project. *Technical documentation: Fulcio, Rekor, Cosign.* https://docs.sigstore.dev/

[12] Academic literature on decentralised fact-check consortia. Multiple arXiv and conference publications.

[13] A. Sengupta et al. *ProBlock: A Novel Approach for Fake News Detection.* 2021. [Verify exact citation.]

[14] H. Torky et al. *Proof of Credibility — An Early Source-Credibility Consensus Formula.* 2019. [Verify exact citation.]

[15] Y. Wojcik, et al. *Effectiveness of community-based fact-checking on X Community Notes.* *Proceedings of the National Academy of Sciences*, 2024. https://www.pnas.org/doi/10.1073/pnas.2503413122

[16] Retraction Watch. *Database of retracted scientific publications.* https://retractionwatch.com/

[17] Crossref. *Retraction metadata feed.* https://www.crossref.org/

[18] Vectara. *Hallucination Leaderboard.* https://github.com/vectara/hallucination-leaderboard

[19] HalluLens benchmark, ACL 2025. [Verify full citation.]

[20] D. Longley, M. Sporny. *RDF Dataset Canonicalization (URDNA2015).* W3C Recommendation, 2024. https://www.w3.org/TR/rdf-canon/

[21] M. Jones, J. Bradley, N. Sakimura. *JSON Web Signature.* RFC 7515, IETF, May 2015. https://www.rfc-editor.org/rfc/rfc7515

[22] M. Sporny, O. Steele, eds. *Verifiable Credentials Status List.* W3C. https://www.w3.org/TR/vc-bitstring-status-list/

[23] J. Doyle. *A Truth Maintenance System.* *Artificial Intelligence* 12 (3): 231–272, 1979.

[24] J. de Kleer. *An Assumption-based TMS.* *Artificial Intelligence* 28 (2): 127–162, 1986.

[25] libp2p specifications. *GossipSub v1.1.* Protocol Labs. https://github.com/libp2p/specs

[26] European Parliament and Council. *Regulation (EU) 2016/679 (General Data Protection Regulation).* Article 17. https://eur-lex.europa.eu/eli/reg/2016/679/oj

[27] CNIL. *Blockchain: Premiers éléments d'analyse de la CNIL.* 2018. https://www.cnil.fr/fr/blockchain

[28] *Stossel v. Meta Platforms, Inc.* N.D. Cal., 2022. Opinion available via PACER.

[29] European Parliament and Council. *Regulation (EU) 2022/2065 (Digital Services Act).* https://eur-lex.europa.eu/eli/reg/2022/2065/oj

[30] Parliament of the United Kingdom. *Online Safety Act 2023.* https://www.legislation.gov.uk/ukpga/2023/50/contents

[31] 15 U.S.C. § 1054 (certification marks). *Lanham Act.* https://www.law.cornell.edu/uscode/text/15/1054

[32] European Parliament and Council. *Regulation (EU) 2017/1001 on the European Union trade mark.* https://eur-lex.europa.eu/eli/reg/2017/1001/oj

[33] European Parliament and Council. *Regulation (EU) 2024/1689 (Artificial Intelligence Act).* https://eur-lex.europa.eu/eli/reg/2024/1689/oj

[34] NIST. *AI Risk Management Framework 1.0.* 2023. https://www.nist.gov/itl/ai-risk-management-framework

[35] Partnership on AI. *Responsible Practices for Synthetic Media.* https://syntheticmedia.partnershiponai.org/

[36] S. D. Kamvar, M. T. Schlosser, H. Garcia-Molina. *The EigenTrust algorithm for reputation management in P2P networks.* Proceedings of WWW 2003.

---

## Appendix A — Summary of the four-property data model

A conformant Veritas claim record, in reduced form:

```json
{
  "@context": "https://schema.org",
  "@type": "Claim",
  "claimText": "…",
  "claimHash": "sha256:…",
  "claimGroupId": "urn:…",
  "language": "en",
  "type": "FACT",
  "sources": [ { "claimHash": "sha256:…", "role": "supports" } ],
  "evidence": [ { "uri": "https://…", "contentHash": "sha256:…" } ],
  "method": "documentary-review",
  "dateDeclared": "2026-04-21T00:00:00Z"
}
```

A conformant Veritas attestation record:

```json
{
  "@context": "https://veritas-protocol.example/v0.1",
  "@type": "Attestation",
  "claimHash": "sha256:…",
  "issuer": "did:web:university.example",
  "consensusDomain": "scientific-default",
  "verdict": "SUPPORTED",
  "rationale": "…",
  "dateIssued": "2026-04-21T12:00:00Z",
  "proof": { "type": "DataIntegrityProof", "…": "…" },
  "scittReceipt": "…"
}
```

Canonical JSON-LD form per URDNA2015 [20]; transparency-log receipt per IETF SCITT [10].
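For orientation, the following sketch shows where the `claimHash` digest comes from. Computing the specified URDNA2015 canonical form [20] requires a full RDF toolchain; as a loudly simplified stand-in, this example hashes a sorted-key JSON serialisation (in the spirit of JSON canonicalisation), which suffices to show the determinism property the record depends on. The function name is illustrative.

```python
import hashlib
import json

def claim_hash(record: dict) -> str:
    """Illustrative stand-in for the specified canonicalisation:
    a deterministic digest over sorted-key, minimally-separated JSON.
    The specification proper hashes the URDNA2015 canonical form."""
    canonical = json.dumps(record, sort_keys=True,
                           separators=(",", ":"), ensure_ascii=False)
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

The property that matters is key-order independence: two parties serialising the same claim record must derive the same `claimHash`, or attestations and log receipts will not line up.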

---

## Appendix B — Summary of phased deliverables

| Phase | Months | Primary deliverables | Estimated cost |
|---|---|---|---|
| I | 0–6 | /factcheck v0.2 spec; reference aggregator; browser extension; standards submissions | US$ 200–300k |
| II | 6–18 | Veritas spec v0.1; foundation host; 5–10 validators; first AI-lab integration; first cascading event | US$ 400–600k |
| III | 18+ | Sustained operation; domain expansion; institutional interop | Steady-state; funding mix |

---

*This paper is a working document. Comment, correction, and partnership inquiries are welcome.*
