
clarity-gate
by frmoretto
Stop LLMs from hallucinating your guesses as facts. Clarity Gate is a verification protocol for your documents that are going to be provided to LLMs or RAG systems. Place automatically the missing uncertainty markers to avoid confident hallucinations. HITL for non-directly verifiable claims.
SKILL.md
agentskills.io compliant frontmatter
name: clarity-gate version: 2.0.0 description: > Pre-ingestion verification for epistemic quality in RAG systems. Ensures documents are properly qualified before entering knowledge bases. Produces CGD (Clarity-Gated Documents) and validates SOT (Source of Truth) files. author: Francesco Marinoni Moretto license: CC-BY-4.0 repository: https://github.com/frmoretto/clarity-gate triggers:
- clarity gate
- check for hallucination risks
- can an LLM read this safely
- review for equivocation
- verify document clarity
- pre-ingestion check
- cgd verify
- sot verify capabilities:
- document-verification
- epistemic-quality
- rag-preparation
- cgd-generation
- sot-validation outputs:
- type: cgd extension: .cgd.md spec: docs/CLARITY_GATE_FORMAT_SPEC.md spec_version: "2.0"
Clarity Gate v2.0
Purpose: Pre-ingestion verification system that enforces epistemic quality before documents enter RAG knowledge bases. Produces Clarity-Gated Documents (CGD) compliant with the Clarity Gate Format Specification v2.0.
Core Question: "If another LLM reads this document, will it mistake assumptions for facts?"
Core Principle: "Detection finds what is; enforcement ensures what should be. In practice: find the missing uncertainty markers before they become confident hallucinations."
What's New in v2.0
| Feature | Description |
|---|---|
| CGD Format Compliance | Outputs compliant CGD files with YAML frontmatter and end markers |
| SOT Validation | Validates Source of Truth files against SOT Format Spec |
| Spec References | Links to formal specifications for validation rules |
| agentskills.io Compliant | Standard frontmatter for skill discovery |
| Validation Rules Section | Maps 9 points to formal rule codes |
| Quine Protection | End marker detection ignores markers inside code fences (§2.3) |
| Redacted Export | Export documents with exclusion content replaced by [REDACTED] (§8.11) |
Specifications
This skill implements and references:
| Specification | Version | Location |
|---|---|---|
| Clarity Gate Format (Unified) | v2.0 | docs/CLARITY_GATE_FORMAT_SPEC.md |
Note: v2.0 unifies CGD and SOT into a single .cgd.md format. SOT is now a CGD with an optional tier: block.
The Key Distinction
Existing tools like UnScientify and HedgeHunter (CoNLL-2010) detect uncertainty markers already present in text ("Is uncertainty expressed?").
Clarity Gate enforces their presence where epistemically required ("Should uncertainty be expressed but isn't?").
| Tool Type | Question | Example |
|---|---|---|
| Detection | "Does this text contain hedges?" | UnScientify/HedgeHunter find "may", "possibly" |
| Enforcement | "Should this claim be hedged but isn't?" | Clarity Gate flags "Revenue will be $50M" |
Critical Limitation
Clarity Gate verifies FORM, not TRUTH.
This skill checks whether claims are properly marked as uncertain—it cannot verify if claims are actually true.
Risk: An LLM can hallucinate facts INTO a document, then "pass" Clarity Gate by adding source markers to false claims.
Solution: HITL (Human-In-The-Loop) verification is MANDATORY before declaring PASS.
When to Use
- Before ingesting documents into RAG systems
- Before sharing documents with other AI systems
- After writing specifications, state docs, or methodology descriptions
- When a document contains projections, estimates, or hypotheses
- Before publishing claims that haven't been validated
- When handing off documentation between LLM sessions
The 9 Verification Points
Relationship to Spec Suite
The 9 Verification Points guide semantic review — content quality checks that require judgment (human or AI). They answer questions like "Should this claim be hedged?" and "Are these numbers consistent?"
When review completes, output a CGD file conforming to CLARITY_GATE_FORMAT_SPEC.md. The C/S rules in CLARITY_GATE_FORMAT_SPEC.md validate file structure, not semantic content.
The connection:
- Semantic findings (9 points) determine what issues exist
- Issues are recorded in CGD state fields (
clarity-status,hitl-status,hitl-pending-count) - State consistency is enforced by structural rules (C7-C10)
Example: If Point 5 (Data Consistency) finds conflicting numbers, you'd mark clarity-status: UNCLEAR until resolved. Rule C7 then ensures you can't claim REVIEWED while still UNCLEAR.
Epistemic Checks (Core Focus: Points 1-4)
1. HYPOTHESIS vs FACT LABELING Every claim must be clearly marked as validated or hypothetical.
| Fails | Passes |
|---|---|
| "Our architecture outperforms competitors" | "Our architecture outperforms competitors [benchmark data in Table 3]" |
| "The model achieves 40% improvement" | "The model achieves 40% improvement [measured on dataset X]" |
Fix: Add markers: "PROJECTED:", "HYPOTHESIS:", "UNTESTED:", "(estimated)", "~", "?"
2. UNCERTAINTY MARKER ENFORCEMENT Forward-looking statements require qualifiers.
| Fails | Passes |
|---|---|
| "Revenue will be $50M by Q4" | "Revenue is projected to be $50M by Q4" |
| "The feature will reduce churn" | "The feature is expected to reduce churn" |
Fix: Add "projected", "estimated", "expected", "designed to", "intended to"
3. ASSUMPTION VISIBILITY Implicit assumptions that affect interpretation must be explicit.
| Fails | Passes |
|---|---|
| "The system scales linearly" | "The system scales linearly [assuming <1000 concurrent users]" |
| "Response time is 50ms" | "Response time is 50ms [under standard load conditions]" |
Fix: Add bracketed conditions: "[assuming X]", "[under conditions Y]", "[when Z]"
4. AUTHORITATIVE-LOOKING UNVALIDATED DATA Tables with specific percentages and checkmarks look like measured data.
Red flag: Tables with specific numbers (89%, 95%, 100%) without sources
Fix: Add "(guess)", "(est.)", "?" to numbers. Add explicit warning: "PROJECTED VALUES - NOT MEASURED"
Data Quality Checks (Complementary: Points 5-7)
5. DATA CONSISTENCY Scan for conflicting numbers, dates, or facts within the document.
Red flag: "500 users" in one section, "750 users" in another
Fix: Reconcile conflicts or explicitly note the discrepancy with explanation.
6. IMPLICIT CAUSATION Claims that imply causation without evidence.
Red flag: "Shorter prompts improve response quality" (plausible but unproven)
Fix: Reframe as hypothesis: "Shorter prompts MAY improve response quality (hypothesis, not validated)"
7. FUTURE STATE AS PRESENT Describing planned/hoped outcomes as if already achieved.
Red flag: "The system processes 10,000 requests per second" (when it hasn't been built)
Fix: Use future/conditional: "The system is DESIGNED TO process..." or "TARGET: 10,000 rps"
Verification Routing (Points 8-9)
8. TEMPORAL COHERENCE Document dates and timestamps must be internally consistent and plausible.
| Fails | Passes |
|---|---|
| "Last Updated: December 2024" (when current is 2026) | "Last Updated: January 2026" |
| v1.0.0 dated 2024-12-23, v1.1.0 dated 2024-12-20 | Versions in chronological order |
Sub-checks:
- Document date vs current date
- Internal chronology (versions, events in order)
- Reference freshness ("current", "now", "today" claims)
Fix: Update dates, add "as of [date]" qualifiers, flag stale claims
9. EXTERNALLY VERIFIABLE CLAIMS Specific numbers that could be fact-checked should be flagged for verification.
| Type | Example | Risk |
|---|---|---|
| Pricing | "Costs ~$0.005 per call" | API pricing changes |
| Statistics | "Papers average 15-30 equations" | May be wildly off |
| Rates/ratios | "40% of researchers use X" | Needs citation |
| Competitor claims | "No competitor offers Y" | May be outdated |
Fix options:
- Add source with date
- Add uncertainty marker
- Route to HITL or external search
- Generalize ("low cost" instead of "$0.005")
The Verification Hierarchy
Claim Extracted --> Does Source of Truth Exist?
|
+---------------+---------------+
YES NO
| |
Tier 1: Automated Tier 2: HITL
Consistency & Verification Two-Round Verification
| |
PASS / BLOCK Round A → Round B → APPROVE / REJECT
Tier 1: Automated Verification
A. Internal Consistency
- Figure vs. Text contradictions
- Abstract vs. Body mismatches
- Table vs. Prose conflicts
- Numerical consistency
B. External Verification (Extension Interface)
- User-provided connectors to structured sources
- Financial systems, Git commits, CRM, etc.
Tier 2: Two-Round HITL Verification — MANDATORY
Round A: Derived Data Confirmation
- Claims from sources found in session
- Human confirms interpretation, not truth
Round B: True HITL Verification
- Claims needing actual verification
- No source found, human's own data, extrapolations
CGD Output Format
When producing a Clarity-Gated Document, use this format per CLARITY_GATE_FORMAT_SPEC.md v2.0:
---
clarity-gate-version: 2.0
processed-date: 2026-01-12
processed-by: Claude + Human Review
clarity-status: CLEAR
hitl-status: REVIEWED
hitl-pending-count: 0
points-passed: 1-9
rag-ingestable: true # computed by validator - do not set manually
document-sha256: 7d865e959b2466918c9863afca942d0fb89d7c9ac0c99bafc3749504ded97730
hitl-claims:
- id: claim-75fb137a
text: "Revenue projection is $50M"
value: "$50M"
source: "Q3 planning doc"
location: "revenue-projections/1"
round: B
confirmed-by: Francesco
confirmed-date: 2026-01-12
---
# Document Title
[Document body with epistemic markers applied]
Claims like "Revenue will be $50M" become "Revenue is **projected** to be $50M *(unverified projection)*"
---
## HITL Verification Record
### Round A: Derived Data Confirmation
- Claim 1 (source) ✓
- Claim 2 (source) ✓
### Round B: True HITL Verification
| # | Claim | Status | Verified By | Date |
|---|-------|--------|-------------|------|
| 1 | [claim] | ✓ Confirmed | [name] | [date] |
<!-- CLARITY_GATE_END -->
Clarity Gate: CLEAR | REVIEWED
Required CGD Elements (per spec):
- YAML frontmatter with all required fields:
clarity-gate-version— Tool version (no "v" prefix)processed-date— YYYY-MM-DD formatprocessed-by— Processor nameclarity-status— CLEAR or UNCLEARhitl-status— PENDING, REVIEWED, or REVIEWED_WITH_EXCEPTIONShitl-pending-count— Integer ≥ 0points-passed— e.g.,1-9or1-4,7,9hitl-claims— List of verified claims (may be empty[])
- End marker (HTML comment + status line):
<!-- CLARITY_GATE_END --> Clarity Gate: <clarity-status> | <hitl-status> - HITL verification record (if status is REVIEWED)
Optional/Computed Fields:
rag-ingestable— Computed by validators, not manually set. Showstrueonly whenCLEAR | REVIEWEDwith no exclusion blocks.document-sha256— Required. 64-char lowercase hex hash for integrity verification. See spec §2 for computation rules.exclusions-coverage— Optional. Fraction of body inside exclusion blocks (0.0–1.0).
Escape Mechanism: To write about markers like *(estimated)* without triggering parsing, wrap in backticks: `*(estimated)*`
Exclusion Blocks
When content cannot be resolved (no SME available, legacy prose, etc.), mark it as excluded rather than leaving it ambiguous:
<!-- CG-EXCLUSION:BEGIN id=auth-legacy-1 -->
Legacy authentication details that require SME review...
<!-- CG-EXCLUSION:END id=auth-legacy-1 -->
Rules:
- IDs must match:
[A-Za-z0-9][A-Za-z0-9._-]{0,63} - No nesting or overlapping blocks
- Each ID used only once
- Requires
hitl-status: REVIEWED_WITH_EXCEPTIONS - Must document
exceptions-reasonandexceptions-idsin frontmatter
Important: Documents with exclusion blocks are not RAG-ingestable. They're rejected entirely (no partial ingestion).
See CLARITY_GATE_FORMAT_SPEC.md §4 for complete rules.
SOT Validation
When validating a Source of Truth file, the skill checks both format compliance (per CLARITY_GATE_FORMAT_SPEC.md) and content quality (the 9 points).
Format Compliance (Structural Rules)
SOT documents are CGDs with a tier: block. They require a ## Verified Claims section with a valid table.
| Code | Check | Severity |
|---|---|---|
| E-TB01 | No ## Verified Claims section | ERROR |
| E-TB02 | Table has no data rows | ERROR |
| E-TB03 | Required columns missing (Claim, Value, Source, Verified) | ERROR |
| E-TB04 | Column order wrong (Claim not first or Verified not last) | ERROR |
| E-TB05 | Empty cell in required column | ERROR |
| E-TB06 | Invalid date format in Verified column | ERROR |
| E-TB07 | Verified date in future (beyond 24h grace) | ERROR |
Content Quality (9 Points)
The 9 Verification Points apply to SOT content:
| Point | SOT Application |
|---|---|
| 1-4 | Check claims in ## Verified Claims are actually verified |
| 5 | Check for conflicting values across tables |
| 6 | Check claims don't imply unsupported causation |
| 7 | Check table doesn't state futures as present |
| 8 | Check dates are chronologically consistent |
| 9 | Flag specific numbers for external check |
SOT-Specific Requirements
- Tier block required: SOT is a CGD with
tier:block containinglevel,owner,version,promoted-date,promoted-by - Structured claims table:
## Verified Claimssection with columns: Claim, Value, Source, Verified - Table outside exclusions: The verified claims table must NOT be inside an exclusion block
- Staleness markers: Use
[STABLE],[CHECK],[VOLATILE],[SNAPSHOT]in content[STABLE]— Safe to cite without rechecking[CHECK]— Verify before citing[VOLATILE]— Changes frequently; always verify[SNAPSHOT]— Point-in-time data; include date when citing
Output Format
After running Clarity Gate, report:
## Clarity Gate Results
**Document:** [filename]
**Issues Found:** [number]
### Critical (will cause hallucination)
- [issue + location + fix]
### Warning (could cause equivocation)
- [issue + location + fix]
### Temporal (date/time issues)
- [issue + location + fix]
### Externally Verifiable Claims
| # | Claim | Type | Suggested Verification |
|---|-------|------|------------------------|
| 1 | [claim] | Pricing | [where to verify] |
---
## Round A: Derived Data Confirmation
- [claim] ([source])
Reply "confirmed" or flag any I misread.
---
## Round B: HITL Verification Required
| # | Claim | Why HITL Needed | Human Confirms |
|---|-------|-----------------|----------------|
| 1 | [claim] | [reason] | [ ] True / [ ] False |
---
**Would you like me to produce an annotated CGD version?**
---
**Verdict:** PENDING CONFIRMATION
Severity Levels
| Level | Definition | Action |
|---|---|---|
| CRITICAL | LLM will likely treat hypothesis as fact | Must fix before use |
| WARNING | LLM might misinterpret | Should fix |
| TEMPORAL | Date/time inconsistency detected | Verify and update |
| VERIFIABLE | Specific claim that could be fact-checked | Route to HITL or external search |
| ROUND A | Derived from witnessed source | Quick confirmation |
| ROUND B | Requires true verification | Cannot pass without confirmation |
| PASS | Clearly marked, no ambiguity, verified | No action needed |
Quick Scan Checklist
| Pattern | Action |
|---|---|
| Specific percentages (89%, 73%) | Add source or mark as estimate |
| Comparison tables | Add "PROJECTED" header |
| "Achieves", "delivers", "provides" | Use "designed to", "intended to" if not validated |
| Checkmarks | Verify these are confirmed |
| "100%" anything | Almost always needs qualification |
| "Last Updated: [date]" | Check against current date |
| Version numbers with dates | Verify chronological order |
| "$X.XX" or "~$X" (pricing) | Flag for external verification |
| "averages", "typically" | Flag for source/citation |
| Competitor capability claims | Flag for external verification |
What This Skill Does NOT Do
- Does not classify document types (use Stream Coding for that)
- Does not restructure documents
- Does not add deep links or references
- Does not evaluate writing quality
- Does not check factual accuracy autonomously (requires HITL)
Related Projects
| Project | Purpose | URL |
|---|---|---|
| Source of Truth Creator | Create epistemically calibrated docs | github.com/frmoretto/source-of-truth-creator |
| Stream Coding | Documentation-first methodology | github.com/frmoretto/stream-coding |
| ArXiParse | Scientific paper verification | arxiparse.org |
Changelog
v2.0.0 (2026-01-13)
- ADDED: agentskills.io compliant YAML frontmatter
- ADDED: Clarity Gate Format Specification v2.0 compliance (unified CGD/SOT)
- ADDED: SOT validation support with E-TB* error codes
- ADDED: Validation rules mapping (9 points → rule codes)
- ADDED: CGD output format template with
<!-- CLARITY_GATE_END -->markers - ADDED: Quine Protection note (§2.3 fence-aware marker detection)
- ADDED: Redacted Export feature (§8.11)
- UPDATED:
hitl-claimsformat to v2.0 schema (id, text, value, source, location, round) - UPDATED: End marker format to HTML comment style
- UPDATED: Unified format spec v2.0 (single
.cgd.mdextension) - RESTRUCTURED: For multi-platform skill discovery
v1.6 (2025-12-31)
- Added Two-Round HITL verification system
- Round A: Derived Data Confirmation
- Round B: True HITL Verification
v1.5 (2025-12-28)
- Added Point 8: Temporal Coherence
- Added Point 9: Externally Verifiable Claims
v1.4 (2025-12-23)
- Added CGD annotation output mode
v1.3 (2025-12-21)
- Restructured points into Epistemic (1-4) and Data Quality (5-7)
v1.2 (2025-12-21)
- Added Source of Truth request step
v1.1 (2025-12-21)
- Added HITL Fact Verification (mandatory)
v1.0 (2025-11)
- Initial release with 6-point verification
Version: 2.0.0 Spec Version: 2.0 Author: Francesco Marinoni Moretto License: CC-BY-4.0
Score
Total Score
Based on repository quality metrics
SKILL.mdファイルが含まれている
ライセンスが設定されている
100文字以上の説明がある
GitHub Stars 100以上
1ヶ月以内に更新
10回以上フォークされている
オープンIssueが50未満
プログラミング言語が設定されている
1つ以上のタグが設定されている
Reviews
Reviews coming soon

