GOSM Verification Procedure
Input: $ARGUMENTS
Interpretations
Before executing, identify which interpretation matches the user’s input:
- Interpretation 1 — Verify specific claims: The user has one or more explicit claims they want verified to the GOSM standard (Observed, Tested, or Derived).
- Interpretation 2 — Verify output from a prior skill: The user has output from /araw, /claim, or another skill and wants the claims in it verified against source evidence.
- Interpretation 3 — Verify before publishing: The user has content they plan to share and wants every factual claim verified before it goes out.
If ambiguous, ask: “Do you want me to verify specific claims, check the claims in previous output, or audit content before publishing?” If clear from context, proceed with the matching interpretation.
Core Principles
- Three categories, no exceptions. Every claim is OBSERVED (seen in source), TESTED (confirmed by execution), or DERIVED (logically follows from verified premises). There is no fourth category. “Probably true” and “widely believed” are not verification statuses.
- Unverified claims are excluded, not downgraded. The temptation is to say “Confidence: LOW” and include the claim anyway. This is not verification — it’s hedged guessing. If a claim can’t be verified, it is excluded from output. Period.
- Derivation chains must be complete. A DERIVED claim is only as strong as its weakest premise. Every premise in a derivation must itself be verified (O, T, or D). A derivation from unverified premises is not a derivation — it’s speculation with extra steps.
- Observation means source, not memory. “I know this is true” is not observation. OBSERVED requires identifying the specific source and pointing to the specific content. If you can’t point to it, you haven’t observed it.
- Testing means execution, not thought experiment. “This would probably work” is not testing. TESTED requires actually running something and recording the result. The test must be reproducible.
- Verification is binary per claim. Each claim is either verified or it isn’t. There’s no “partially verified.” If a claim is too broad to verify fully, decompose it into narrower claims that can be.
Verification Standard
1. OBSERVED [O: source]
Requirements for OBSERVED status:
- Source is identified and accessible
- Observation method is documented (how did you find this?)
- No interpretation is added to what was observed
- If quoting, the quote is verbatim
- Source is authoritative for this type of claim
CLAIM: [the claim]
STATUS: VERIFIED_OBSERVED
EVIDENCE: [what was seen, where]
MARKER: [O: specific source with location]
2. TESTED [T: N=count, result]
Requirements for TESTED status:
- Test conditions are documented
- Test was actually executed (not hypothetical)
- Result is documented exactly
- Test is reproducible by someone else
- Test actually measures what the claim asserts
CLAIM: [the claim]
STATUS: VERIFIED_TESTED
EVIDENCE: [test setup, execution, result]
MARKER: [T: N=X, result=Y]
3. DERIVED [D: premises -> conclusion]
Requirements for DERIVED status:
- ALL premises are themselves verified (O, T, or D)
- Inference is valid (modus ponens, modus tollens, etc.)
- No hidden premises (every step is explicit)
- Derivation chain is documented completely
CLAIM: [the claim]
STATUS: VERIFIED_DERIVED
EVIDENCE: [the derivation]
MARKER: [D: premise1 [O] + premise2 [T] -> conclusion]
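The three marker formats above can be modeled as a small data structure. A minimal sketch in Python; the `Claim` and `Status` names are illustrative, not part of the procedure:

```python
from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    OBSERVED = "O"
    TESTED = "T"
    DERIVED = "D"
    UNVERIFIED = "U"  # carries no marker; excluded from output

@dataclass
class Claim:
    text: str
    status: Status
    evidence: str  # source location, test result, or derivation chain

    def marker(self) -> str:
        # Unverified claims never appear in output, so they have no marker.
        if self.status is Status.UNVERIFIED:
            raise ValueError("unverified claims are excluded, not marked")
        return f"[{self.status.value}: {self.evidence}]"
```

For example, `Claim("config sets retries=3", Status.OBSERVED, "config.yaml, line 12").marker()` renders as `[O: config.yaml, line 12]`.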
Verification Procedure
Step 1: Extract Claims
From the input, extract every factual claim. Number them: V1, V2, V3…
For each claim, note:
- The claim text
- Where it appears in the input
- Its importance (load-bearing / supporting / incidental)
Step 2: Attempt Verification (in order)
For each claim, try methods in this order:
1. Attempt OBSERVED:
- Can this be found in a source?
- Read the relevant file, document, or reference
- Find the specific line/section
- Quote exactly what you see
- If found: mark [O: source]
2. Attempt TESTED:
- Can this be executed or measured?
- Design a test that would verify this
- Run the test
- Record the result exactly
- If confirmed: mark [T: N=count, result]
3. Attempt DERIVED:
- Can this be logically proven from verified premises?
- List every premise
- Verify each premise is [O], [T], or [D]
- Show the complete derivation chain
- If derivation is valid: mark [D: premises -> conclusion]
4. If NONE apply:
- Mark as UNVERIFIED
- EXCLUDE from verified output
- Note why verification failed
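The ordered attempt above amounts to a short loop. A minimal sketch in Python, assuming each `try_*` callable returns an evidence string on success and `None` on failure; these callables are hypothetical placeholders for the actual observation, testing, and derivation work:

```python
def verify(claim, try_observe, try_test, try_derive):
    """Try OBSERVED, then TESTED, then DERIVED; return a GOSM marker or None."""
    attempts = (("O", try_observe), ("T", try_test), ("D", try_derive))
    for label, attempt in attempts:
        evidence = attempt(claim)
        if evidence is not None:
            return f"[{label}: {evidence}]"
    # UNVERIFIED: exclude from output and note why verification failed.
    return None
```

The fixed order matters: a claim that could be either observed or derived gets the stronger, more direct marker.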
Step 3: Compile Verification Report
GOSM VERIFICATION REPORT
VERIFIED CLAIMS:
V[N]: [claim]
STATUS: VERIFIED_OBSERVED / VERIFIED_TESTED / VERIFIED_DERIVED
EVIDENCE: [evidence]
MARKER: [marker]
UNVERIFIED (EXCLUDED):
V[N]: [claim]
REASON: [why verification failed]
NOTE: [what would be needed to verify]
SUMMARY:
Total claims: [N]
Verified: [N] (Observed: [N], Tested: [N], Derived: [N])
Unverified (excluded): [N]
Verification rate: [%]
LOAD-BEARING UNVERIFIED:
[Any unverified claims that are load-bearing — these are critical gaps]
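The SUMMARY block can be computed mechanically from Step 2's results. A sketch, assuming results arrive as (claim, marker) pairs where the marker is `None` for unverified claims:

```python
from collections import Counter

def summarize(results):
    """results: list of (claim_text, marker_or_None) pairs from Step 2."""
    # marker[1] is the category letter: "[O: ..." -> "O", "[T: ..." -> "T", etc.
    counts = Counter(marker[1] for _, marker in results if marker)
    verified = sum(counts.values())
    total = len(results)
    rate = f"{100 * verified // total}%" if total else "n/a"
    return {
        "total": total,
        "verified": verified,
        "observed": counts["O"],
        "tested": counts["T"],
        "derived": counts["D"],
        "excluded": total - verified,
        "rate": rate,
    }
```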
NOT Acceptable
These are NOT verification and must never appear:
- “Confidence: LOW” (exclude instead of hedging)
- “Needs validation” (validate it now or exclude)
- “Probably true” (verify or exclude)
- “Expert can fill in” (demonstrate with evidence first)
- “Widely accepted” (accepted by whom? cite the source or exclude)
- “Common knowledge” (not a verification status — cite or exclude)
- “In my understanding” (understanding is not observation — cite or exclude)
Failure Modes
| Failure | Signal | Fix |
|---|---|---|
| Hedged inclusion | “Low confidence” claims kept in output | Unverified = excluded. No exceptions |
| Memory-as-observation | “I know this” treated as [O] | Observation requires pointing to a specific, accessible source |
| Thought-experiment-as-test | “This would work” treated as [T] | Testing requires actual execution and recorded results |
| Incomplete derivation | Derivation skips premises or uses unverified ones | Every premise must be verified; every step explicit |
| Broad claims | “X is generally true” — too broad to verify | Decompose into specific, verifiable sub-claims |
| Authority-as-evidence | ”Experts say X” without citing specific experts or claims | Name the expert, cite the source, or exclude |
Depth Scaling
| Depth | Scope | Output |
|---|---|---|
| 1x | Quick — verify load-bearing claims only | Load-bearing claims verified or excluded |
| 2x | Standard — all explicit claims verified | Full claim list, each verified or excluded |
| 4x | Thorough — all claims including implicit ones, full derivation chains | Complete verification with explicit derivation chains |
| 8x | Exhaustive — all claims, cross-verified where possible, alternative sources checked | Multiple verification paths per claim, highest confidence output |
Pre-Completion Checklist
- All claims extracted and numbered
- Each claim attempted in order: Observed, Tested, Derived
- Every [O] has a specific, accessible source
- Every [T] has documented test conditions and results
- Every [D] has complete derivation chain with verified premises
- Unverified claims are EXCLUDED, not hedged
- Load-bearing unverified claims are flagged as critical gaps
- No “probably”, “likely”, “in my understanding”, or “widely accepted” in output
Integration
- Use from: /araw (verify claims from ARAW output), /claim (verify the claim being tested), /create (verify factual claims in content before publishing), /fb (filtered feedback uses GOSM markers)
- Routes to: /araw (when verification reveals claims that need stress-testing), /diagnose (when verification failures suggest the wrong thing is being measured)
- Differs from: /av (assumption verification tests beliefs; /ver tests factual claims), /val (deliverable validation against requirements; /ver is claim-by-claim verification), /vp (testing procedures for systems; /ver is for individual claims)
- Complementary: /aex (surface hidden claims that need verification), /fb (filtered feedback requires GOSM grounding), /av (verify assumptions after verifying claims)