Evaluation Dimensions
Input: $ARGUMENTS
Overview
Universal dimensions for evaluating any claim, problem, or solution. Use external grounding to assess claims on these dimensions when ARAW alone cannot determine them.
External grounding helps:
- Find problems/solutions ARAW wouldn’t find (blind spots)
- Verify ARAW outputs on these dimensions
- Calibrate confidence in ARAW findings
Steps
Step 1: Identify What’s Being Evaluated
- Is it a CLAIM (something asserted to be true)?
- Is it a PROBLEM (something needing solution)?
- Is it a SOLUTION (something proposed to address a problem)?
- Is it a PLAN (something proposed to achieve a goal)?
- Is it an OUTPUT (something produced)?
Step 2: Apply Universal Dimensions
Truth / Accuracy:
- Is it factually correct?
- Are the claims verifiable?
- What evidence supports it? What contradicts?
- What’s the confidence level?
- → Check with external sources, data, domain experts
Completeness:
- Does it address everything it should?
- What’s missing?
- Are edge cases covered?
- Are limitations acknowledged?
- → Compare against known frameworks, checklists, standards
Consistency:
- Does it contradict itself internally?
- Does it contradict established knowledge?
- Does it contradict other claims by the same source?
- → Cross-reference across the output and against known facts
Relevance:
- Does it address the actual question/problem?
- Is everything included actually necessary?
- Is the scope appropriate (not too broad, not too narrow)?
- → Compare the output against the original objective
Feasibility:
- Can it actually be done?
- With available resources?
- In the available time?
- Given real-world constraints?
- → Check against physical, financial, organizational, and technical reality
Robustness:
- Does it work under normal conditions only, or also under stress?
- What happens when assumptions are violated?
- How sensitive is it to small changes in inputs?
- → Stress-test with edge cases, adversarial inputs, changed assumptions
Originality / Value-Add:
- Does it say anything new or just restate known things?
- Does it provide insight beyond the obvious?
- Is the framing itself valuable even if content is known?
- → Compare against what’s already known/available
Actionability:
- Can someone act on this?
- Are next steps clear?
- Is it specific enough to guide action?
- → Ask: “What would I do differently after reading this?”
Fairness / Balance:
- Are all perspectives represented?
- Is evidence selectively presented?
- Are counterarguments acknowledged?
- → Look for what’s conspicuously absent
Clarity:
- Can the target audience understand it?
- Are terms defined?
- Is the structure logical?
- → Show to a representative reader and check comprehension
Step 3: Score Each Dimension
| Dimension | Score (1-5) | Key Evidence | Improvement Needed? |
|---|---|---|---|
| Truth/Accuracy | |||
| Completeness | |||
| Consistency | |||
| Relevance | |||
| Feasibility | |||
| Robustness | |||
| Originality | |||
| Actionability | |||
| Fairness | |||
| Clarity |
Not all dimensions are equally important for every evaluation. Weight by context:
- For claims: Truth, Consistency, and Fairness matter most
- For solutions: Feasibility, Robustness, and Actionability matter most
- For plans: Completeness, Feasibility, and Clarity matter most
- For outputs: Clarity, Relevance, and Accuracy matter most
Step 4: Identify Critical Gaps
- Which dimensions score below 3?
- For each low score: is this fixable or fundamental?
- Which low scores are most important given the evaluation type?
- What would it take to raise each critical dimension?
Step 5: Report
EVALUATION DIMENSIONS:
Subject: [what was evaluated]
Type: [claim/problem/solution/plan/output]
| Dimension | Score | Weight | Weighted | Key Finding |
|-----------|-------|--------|----------|-------------|
| [dim] | [1-5] | [H/M/L] | [score] | [finding] |
Overall: [weighted average or qualitative summary]
Critical gaps: [dimensions below 3 that matter]
Improvement path: [what would most improve the evaluation]
External grounding used: [what sources/methods supplemented ARAW]
ARAW blind spots found: [what external grounding caught that ARAW missed]
When to Use
- Evaluating any claim, solution, or output
- When ARAW has been applied but you want dimensional verification
- When you need to communicate evaluation results systematically
- → INVOKE: /cv (convergent validation) for multi-method verification
- → INVOKE: /evaluate (category skill) for routing to appropriate evaluation
Verification
- Evaluation type correctly identified
- All 10 dimensions considered
- Dimensions weighted by relevance to evaluation type
- External grounding used (not just internal reasoning)
- Critical gaps identified with improvement paths