# Filtered Feedback Generation
Input: $ARGUMENTS
## Interpretations
Before executing, identify which interpretation matches the user’s input:
- Interpretation 1 — Session review: Review the current session’s outputs and generate high-quality feedback items that can be fed back into the system for improvement.
- Interpretation 2 — Output quality filter: The user has a set of findings or conclusions and wants them filtered to only keep the well-grounded, high-leverage ones.
- Interpretation 3 — Feedback loop design: The user wants to design a feedback mechanism that prevents error accumulation over multiple iterations.
If ambiguous, ask: “Do you want me to review this session for feedback, filter existing findings for quality, or design a feedback loop?” If clear from context, proceed with the matching interpretation.
## Core Principles

- Feedback loops amplify errors unless filtered. Feeding raw output back into a system compounds errors. Each iteration can drift further from reality. Filtering ensures only well-grounded items propagate.
- Leverage determines priority. Not all feedback is equal. High-leverage feedback — items that are high-value, defensible, and broadly applicable — deserves attention. Low-leverage feedback is noise.
- Grounding prevents hallucination loops. Every accepted feedback item must have a GOSM marker ([O], [T], or [D]). Items without grounding are opinions, not findings. Feeding opinions back as inputs creates self-reinforcing illusions.
- Convergence validates. When multiple independent analysis paths arrive at the same conclusion, confidence increases. Single-path conclusions are fragile.
- Fixed points are stable. A finding that survives re-analysis without changing is a fixed point. Fixed points are more trustworthy than findings that shift with each examination.
- Rejection is the primary output. Most potential feedback items should be rejected. If most items pass, the filter is too loose. Strict filtering prevents gradual quality decay.
## Filtering Criteria

### 1. Leverage Scoring

LEVERAGE = value x defensibility x scalability

- value: impact if resolved (0-1)
  - 0.0 = no impact
  - 0.3 = minor improvement
  - 0.5 = moderate improvement
  - 0.7 = significant impact
  - 1.0 = transformative
- defensibility: how protected the item is from invalidation (0-1)
  - 0.0 = easily overturned
  - 0.3 = weak evidence
  - 0.5 = moderate evidence
  - 0.7 = strong evidence, multiple sources
  - 1.0 = definitively established
- scalability: how broadly applicable it is (0-1)
  - 0.0 = one-time use only
  - 0.3 = applicable to similar situations
  - 0.5 = applicable across domains
  - 0.7 = general principle
  - 1.0 = universal

Minimum leverage for acceptance: 0.125 (e.g., 0.5 x 0.5 x 0.5).
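To make the arithmetic concrete, here is a minimal sketch of the score and its floor; the function and constant names are illustrative, not part of this skill:

```python
# Minimal sketch of leverage scoring (names are illustrative).
MIN_LEVERAGE = 0.125  # e.g., 0.5 * 0.5 * 0.5

def leverage(value: float, defensibility: float, scalability: float) -> float:
    """Multiplicative leverage score; each factor must be in [0, 1]."""
    for name, x in [("value", value), ("defensibility", defensibility),
                    ("scalability", scalability)]:
        if not 0.0 <= x <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {x}")
    return value * defensibility * scalability

# An item that is moderate on all three axes sits exactly at the floor:
assert leverage(0.5, 0.5, 0.5) >= MIN_LEVERAGE
```

The multiplicative form matters: a single zero factor zeroes the whole score, so an indefensible item cannot be rescued by high value alone.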
### 2. Selection Filters

Before accepting, check the following (a code sketch follows the list):
- Implementation readiness: feasibility > 0.3 (can this actually be acted on?)
- Risk tolerance: high-risk items need proportionally higher leverage
- Reversibility: irreversible changes need stronger validation than reversible ones
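A minimal sketch of how these filters might compose with the leverage floor. The skill specifies only the directions (higher risk and lower reversibility raise the bar); the exact multipliers below are assumptions for illustration:

```python
# Illustrative selection filters. The multiplier values are assumptions;
# the skill specifies only the direction (higher risk => higher bar).
RISK_FACTOR = {"low": 1.0, "medium": 1.5, "high": 2.0}
REVERSIBILITY_FACTOR = {"reversible": 1.0, "costly": 1.5, "irreversible": 2.0}

def passes_selection(leverage_score: float, feasibility: float,
                     risk: str, reversibility: str) -> bool:
    if feasibility <= 0.3:  # implementation readiness: must be actionable
        return False
    required = 0.125 * RISK_FACTOR[risk] * REVERSIBILITY_FACTOR[reversibility]
    return leverage_score >= required
```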
### 3. Convergent Validation
Four independent checks per item:
| Check | Question | Pass Criterion |
|---|---|---|
| is_grounded | Does it have an [O], [T], or [D] marker? | Has specific evidence, not just assertion |
| is_fixed_point | Is it stable under re-analysis? | Same conclusion reached on second pass |
| is_convergent | Do multiple paths lead here? | At least 2 independent reasoning paths |
| is_practical | Does it pass real-world filters? | Can be implemented, doesn’t violate constraints |
Decision Protocol (sketched in code below):
- 4/4 checks pass → ACCEPT with high confidence
- 3/4 checks pass → ACCEPT with moderate confidence (the failed check must not be is_grounded, since grounding is mandatory)
- 2/4 checks pass → FLAG for review
- <2 checks pass → REJECT
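The protocol reduces to a pure function of the four booleans. A sketch, with the mandatory-grounding rule from the failure-mode table folded in:

```python
# Sketch of the 4-check decision protocol. Grounding gates acceptance:
# an ungrounded item can never be accepted, regardless of its score.
def decide(is_grounded: bool, is_fixed_point: bool,
           is_convergent: bool, is_practical: bool) -> str:
    score = sum([is_grounded, is_fixed_point, is_convergent, is_practical])
    if score == 3 and not is_grounded:
        return "FLAG"  # would accept on score alone, but grounding is mandatory
    if score == 4:
        return "ACCEPT (high confidence)"
    if score == 3:
        return "ACCEPT (moderate confidence)"
    if score == 2:
        return "FLAG"
    return "REJECT"

# Example: convergence failed, everything else passed -> moderate accept.
assert decide(True, True, False, True) == "ACCEPT (moderate confidence)"
```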
## Procedure
### Step 1: Identify Candidate Feedback Items

Review the session outputs. For each potential feedback item (one possible data model is sketched after this list):
- State it clearly as a single item
- Classify its type: goal / problem / question / decision / assumption / finding / principle
- Note where it came from in the session
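A minimal sketch of a data model for candidates before scoring; the field names are illustrative:

```python
# Illustrative data model for a candidate feedback item.
from dataclasses import dataclass
from typing import Literal

ItemType = Literal["goal", "problem", "question", "decision",
                   "assumption", "finding", "principle"]

@dataclass
class Candidate:
    text: str        # the item, stated clearly as a single claim
    type: ItemType   # classification from Step 1
    source: str      # where in the session it came from
```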
### Step 2: Score Each Item
For each candidate:
```
ITEM: [text]
TYPE: [goal | problem | question | decision | assumption | finding | principle]
SOURCE: [where in the session]
LEVERAGE:
  value: [0-1] — [rationale]
  defensibility: [0-1] — [rationale]
  scalability: [0-1] — [rationale]
LEVERAGE SCORE: [product]
CONVERGENT VALIDATION:
  is_grounded: [PASS/FAIL] — [evidence]
  is_fixed_point: [PASS/FAIL] — [re-analysis result]
  is_convergent: [PASS/FAIL] — [paths that lead here]
  is_practical: [PASS/FAIL] — [implementation assessment]
CONVERGENT SCORE: [0-4]
SELECTION FILTERS:
  feasibility: [0-1]
  risk level: [low/medium/high]
  reversibility: [reversible/costly/irreversible]
```
### Step 3: Apply Filters
Categorize each item:
ACCEPTED (feed back into system):

```
TYPE: [type]
CONTENT: [the item]
LEVERAGE: [score]
CONVERGENT_SCORE: [0-4]
GROUNDING: [O/T/D marker with evidence]
CONFIDENCE: [high (4/4) | moderate (3/4)]
```
FLAGGED (needs review): Items with 2/4 convergent checks passing. List for optional human review.
REJECTED (do not feed back): Items that failed filtering. Excluded to prevent error accumulation. Briefly note why.
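Putting Steps 2 and 3 together, the categorization logic might look like this sketch (reusing the hypothetical scores above):

```python
# Sketch of Step 3: route a scored candidate into one of three buckets.
def categorize(leverage_score: float, convergent_score: int,
               is_grounded: bool, passes_filters: bool) -> str:
    if not is_grounded:  # grounding is mandatory -- no exceptions
        return "REJECTED"
    if convergent_score >= 3 and leverage_score >= 0.125 and passes_filters:
        return "ACCEPTED"
    if convergent_score == 2:
        return "FLAGGED"
    return "REJECTED"
```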
### Step 4: Format for Reuse

Format accepted items as inputs for future sessions (summarized as a mapping after this list):
- Goals → can feed into /want
- Problems → can feed into /diagnose
- Questions → can feed into /claim or /search
- Decisions → can feed into /decide
- Assumptions → can feed into /av
- Findings → can feed into /araw for further testing
- Principles → can feed into future analysis as constraints
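As a sketch, the routing is a plain type-to-skill mapping; the dictionary below restates the list above:

```python
# Routing of accepted items to downstream skills, keyed by item type.
ROUTES = {
    "goal": "/want",
    "problem": "/diagnose",
    "question": "/claim or /search",
    "decision": "/decide",
    "assumption": "/av",
    "finding": "/araw",
    "principle": "future analysis (as a constraint)",
}
```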
## Failure Modes
| Failure | Signal | Fix |
|---|---|---|
| Loose filter | >50% of items accepted | Tighten — most items should be rejected |
| Ungrounded acceptance | Items accepted without [O/T/D] marker | Grounding is mandatory — no exceptions |
| Echo chamber | Accepted items all confirm prior conclusions | Check for convergence from independent paths, not repeated paths |
| Leverage inflation | Everything scored as high-value | Calibrate: most items are moderate-value at best |
| Fixed-point illusion | Item “survives” re-analysis because you just agreed with yourself | Re-analysis must be genuinely adversarial |
| Practicality blindness | Theoretically sound items that can’t be implemented | Practical filter is not optional — if it can’t be acted on, it’s not useful feedback |
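The first failure mode is mechanically checkable. A sketch of the signal:

```python
# Sketch of the "loose filter" signal: more than half of candidates accepted.
def filter_is_loose(num_accepted: int, num_candidates: int) -> bool:
    if num_candidates == 0:
        return False
    return num_accepted / num_candidates > 0.5
```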
## Depth Scaling
| Depth | Scope | Output |
|---|---|---|
| 1x | Quick — score top 5 items, accept/reject | Top items filtered, brief report |
| 2x | Standard — all items scored, full convergent validation | Complete filtering with rationale for each |
| 4x | Thorough — all items with full scoring, re-analysis for fixed-point check, formatted for reuse | Complete report with reusable feedback items |
| 8x | Exhaustive — all items, multiple re-analysis passes, cross-session convergence check | Maximum-quality filtered feedback with provenance tracking |
## Pre-Completion Checklist
- All candidate items extracted from session
- Each item has leverage score with rationale
- Each item has convergent validation (all 4 checks)
- Selection filters applied (feasibility, risk, reversibility)
- Accepted items have [O/T/D] grounding markers
- Rejected items have brief rejection reason
- Acceptance rate is reasonable (<50% of candidates)
- Accepted items formatted for reuse in future sessions
## Integration
- Use from: Any session that produces findings worth preserving. Typically run at session end or after major analytical skill chains.
- Routes to: /want (goals), /diagnose (problems), /claim or /search (questions), /decide (decisions), /av (assumptions), /araw (findings for further testing)
- Differs from: /ver (verifies individual claims; /fb filters session-level feedback), /val (validates deliverables; /fb validates feedback items), /evaluate (assesses work quality; /fb assesses feedback quality)
- Complementary: /ver (GOSM grounding markers feed into /fb’s grounding check), /araw (stress-test accepted items further), /prr (review the feedback process itself)