Tier 4

imprt - Improve Reasoning Tool

Input: $ARGUMENTS


Core Principles

  1. The toolkit is a system, not a collection. Individual skill quality matters, but the system-level properties matter more: coverage (are all thinking modes represented?), routing (can users find the right skill?), integration (do skills chain properly?), consistency (do skills follow the same standard?). Improving one skill while the routing is broken helps no one.

  2. Improvement has layers. From most impactful to least: (a) missing capabilities that users need, (b) broken routing that sends users to wrong skills, (c) quality gaps in high-use skills, (d) quality gaps in low-use skills, (e) formatting inconsistencies. Attack in this order.

  3. The toolkit should self-diagnose. This skill is the toolkit examining itself. The diagnostic must be honest — identifying real weaknesses, not performing a flattering self-assessment. The hardest improvements to identify are the ones the toolkit is systematically blind to.

  4. User friction is the primary signal. If users can’t find the right skill, that’s worse than a skill being slightly under-polished. If the routing skills (wsib, fonss, extract, handle) don’t know about newer skills, the newer skills effectively don’t exist.

  5. The exemplar skills define quality, not the average. Don’t measure the toolkit against its average skill quality. Measure against the best skills (foht, sbfow, iterate, araw, w). Every skill should be within striking distance of the exemplars.


Phase 1: System-Level Diagnostic

Evaluate the toolkit as a whole before looking at individual skills.

Coverage Audit

[R1] THINKING_MODES_REPRESENTED:
     Decision making: [skills] — COVERAGE: [strong | adequate | weak | missing]
     Problem solving: [skills] — COVERAGE: [level]
     Exploration: [skills] — COVERAGE: [level]
     Validation: [skills] — COVERAGE: [level]
     Planning: [skills] — COVERAGE: [level]
     Writing: [skills] — COVERAGE: [level]
     Self-correction: [skills] — COVERAGE: [level]
     Meta-cognition: [skills] — COVERAGE: [level]
     Skill management: [skills] — COVERAGE: [level]
     [other modes discovered]: [skills] — COVERAGE: [level]

[R2] COVERAGE_GAPS: [thinking modes with weak or missing representation]

Routing Audit

[R3] ROUTING_SKILLS: [list all skills that route to other skills]
[R4] ROUTING_COVERAGE: [what % of skills are reachable via routing?]
[R5] ORPHAN_SKILLS: [skills not referenced by any routing skill or integration section]
[R6] ROUTING_STALENESS: [routing skills that don't know about newer skills]
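The reachability checks above (R4–R5) can be computed mechanically. A minimal sketch, assuming the toolkit can be summarized as a set of skill names plus a map from each routing skill to the skills it references (both hypothetical structures, not a real toolkit API):

```python
def routing_audit(all_skills, routing_map):
    """Compute routing coverage (R4) and orphan skills (R5).

    all_skills: set of every skill name in the toolkit.
    routing_map: dict mapping each routing skill to the set of
    skill names it references.
    """
    reachable = set()
    for referenced in routing_map.values():
        reachable |= referenced
    # Orphans: never referenced by any router, and not routers themselves.
    orphans = sorted(all_skills - reachable - set(routing_map))
    coverage_pct = 100 * len(all_skills & reachable) / len(all_skills)
    return coverage_pct, orphans

skills = {"wsib", "fonss", "extract", "foht", "iterate"}
routes = {"wsib": {"foht", "extract"}, "fonss": {"extract"}}
pct, orphans = routing_audit(skills, routes)
# "iterate" is referenced by no router here, so it surfaces as an orphan.
```

The same `reachable` set also feeds R6: any recently added skill absent from it is evidence of routing staleness.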

Integration Audit

[R7] INTEGRATION_MAP: [sample of skill chains — do they work?]
[R8] BROKEN_CHAINS: [skills that invoke non-existent skills]
[R9] MISSING_INTEGRATION: [skills with no integration section]
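Broken chains (R8) are detectable by scanning skill bodies for invocations of skills that don't exist. A sketch, assuming invocations look like `/name` (the pattern and the sample texts are illustrative assumptions):

```python
import re

def find_broken_chains(skill_bodies, known_skills):
    """Flag skills whose text invokes a slash-command that doesn't
    exist (R8). skill_bodies: dict of skill name -> skill text.
    The /word pattern is an assumption about invocation syntax.
    """
    broken = {}
    for name, body in skill_bodies.items():
        invoked = set(re.findall(r"/([a-z]+)\b", body))
        missing = sorted(invoked - known_skills)
        if missing:
            broken[name] = missing
    return broken

bodies = {"imprt": "Routes to: /impss and /ghost for follow-up."}
print(find_broken_chains(bodies, {"impss", "skgap"}))
```

Skills that appear in `skill_bodies` but trigger no matches at all are candidates for R9 (missing integration sections).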

Quality Distribution

[R10] QUALITY_DISTRIBUTION:
      Exemplar (300+ lines, all elements, dense content): [N skills]
      Good (150-300 lines, all elements): [N skills]
      Adequate (100-150 lines, most elements): [N skills]
      Partial (50-100 lines, missing elements): [N skills]
      Stub (< 50 lines): [N skills]
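The tier bands above reduce to a simple classifier. A rough sketch using only line count and element completeness; real triage should also weigh content density, which raw line counts cannot capture:

```python
def quality_tier(line_count, has_all_elements=True):
    """Map a skill to a quality tier using the line-count bands
    above. A sketch under the stated thresholds only."""
    if line_count < 50:
        return "stub"
    if line_count < 100:
        return "partial"
    if line_count < 150:
        return "adequate"
    if line_count < 300:
        return "good" if has_all_elements else "adequate"
    return "exemplar" if has_all_elements else "good"
```

Tallying `quality_tier` over every skill produces the R10 distribution directly.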

Phase 2: Individual Skill Scan

Sample skills from each quality tier and diagnose:

[R-N] SKILL: /[name]
     TIER: [exemplar | good | adequate | partial | stub]
     ISSUES: [specific problems found]
     USAGE: [how many other skills reference it]
     IMPROVEMENT_PRIORITY: [critical | high | medium | low | skip]

Scan Strategy

  • Read ALL stubs and partials (they’re short)
  • Sample 20% of adequate skills
  • Spot-check 10% of good skills
  • Read all routing and category skills fully (they’re system-critical)
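The strategy above can be turned into a concrete reading list. A sketch, assuming skills are pre-grouped by tier (the tier keys are illustrative):

```python
import random

def plan_scan(skills_by_tier, seed=0):
    """Select which skills to read, per the scan strategy:
    every stub and partial, ~20% of adequate, ~10% of good,
    plus every routing/category skill."""
    rng = random.Random(seed)  # seeded so the plan is reproducible

    def sample(names, fraction):
        names = list(names)
        k = max(1, round(len(names) * fraction)) if names else 0
        return rng.sample(names, k)

    return (
        list(skills_by_tier.get("stub", []))
        + list(skills_by_tier.get("partial", []))
        + sample(skills_by_tier.get("adequate", []), 0.20)
        + sample(skills_by_tier.get("good", []), 0.10)
        + list(skills_by_tier.get("routing", []))
    )
```

The `max(1, …)` floor guarantees at least one skill is sampled from any non-empty tier, so no tier escapes the scan entirely.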

Phase 3: Systemic Issue Detection

[R-N] SYSTEMIC_ISSUE: [pattern across multiple skills]
     AFFECTED_COUNT: [how many skills]
     SEVERITY: [how much this degrades the toolkit]
     ROOT_CAUSE: [why this pattern exists — era effect, generation method, etc.]
     FIX: [systemic fix, not per-skill]

Common Systemic Issues

| Issue | Signal | Systemic Fix |
|---|---|---|
| Era quality gaps | Skills from one date are systematically worse | Batch improvement via /impss |
| Routing staleness | New skills not in routing tables | Update all routing skills’ integration sections |
| Inconsistent structure | Different skills use different section names/formats | Establish template, normalize |
| Missing cross-references | Skills that should reference each other don’t | Map skill relationships, update integration sections |
| Category imbalance | 50 decision skills but 3 writing skills | Create new skills in underrepresented areas via /skgap |
| Naming inconsistency | Some skills use abbreviations, others full words | Establish naming convention |
| Depth inconsistency | Some skills have 8x depth, others have no scaling | Add depth scaling to all |
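Era quality gaps in particular can be spotted numerically. A hypothetical sketch, assuming each skill is tagged with a creation date and a numeric tier score (stub=0 … exemplar=4; both tags are assumptions, not real toolkit metadata):

```python
from collections import defaultdict

def tier_score_by_era(skills):
    """skills: iterable of (name, created, tier_score) tuples.
    Returns the mean tier score per creation date; a date whose
    mean sits well below the others suggests an era quality gap."""
    scores = defaultdict(list)
    for _name, created, tier_score in skills:
        scores[created].append(tier_score)
    return {date: sum(s) / len(s) for date, s in scores.items()}

means = tier_score_by_era([
    ("foht", "2024-01", 4), ("w", "2024-01", 4),
    ("oldx", "2023-06", 1), ("oldy", "2023-06", 0),
])
# 2023-06 averaging far below 2024-01 would flag a candidate era gap.
```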

Phase 4: Improvement Roadmap

IMPROVEMENT ROADMAP
===================

CRITICAL (do now):
  1. [specific action] — IMPACT: [what this fixes]
  2. [specific action] — IMPACT: [what this fixes]

HIGH (do soon):
  1. [specific action] — IMPACT: [what this fixes]

MEDIUM (do when able):
  1. [specific action]

LOW (backlog):
  1. [specific action]

RECOMMENDED SKILL CREATION:
  1. /[name] — [what it would do] — WHY: [gap it fills]
  2. /[name] — [what it would do] — WHY: [gap it fills]

RECOMMENDED SKILL IMPROVEMENTS:
  → INVOKE: /impss [list of skills needing improvement]

RECOMMENDED GAP ANALYSIS:
  → INVOKE: /skgap [areas identified as underrepresented]

Failure Modes

| Failure | Signal | Fix |
|---|---|---|
| Flattering self-assessment | “The toolkit is comprehensive and well-structured” without identifying real issues | Be adversarial. What would a frustrated user complain about? |
| Individual focus only | Looked at skills one by one but missed system-level issues | Start with system diagnostics (coverage, routing, integration) before individual skills |
| Quantity metrics only | “We have 400+ skills!” without quality assessment | Count skills by quality tier, not just total |
| Improvement without priority | List of 50 improvements with no ordering | Everything gets a priority level. Critical/high get done. Low goes to backlog |
| Missing the user perspective | All improvements are structural/internal, none address usability | Ask: “Can a new user find the right skill for their problem in < 30 seconds?” |
| Scope blindness | Only evaluates what exists, not what’s missing | Coverage audit must identify ABSENT thinking modes, not just assess present ones |

Depth Scaling

| Depth | System Audit | Skill Scan | Pattern Detection | Roadmap |
|---|---|---|---|---|
| 1x | Coverage + routing only | Stubs only | Top 3 patterns | Critical actions only |
| 2x | Full system audit | All stubs + partials + routing skills | All patterns | Full roadmap |
| 4x | Full + user journey simulation | 50% of all skills | Full + root cause | Full + timeline |
| 8x | Full + competitive analysis | All skills | Full + prediction | Full + implementation plan |

Default: 2x. These are floors.


Pre-Completion Checklist

  • Coverage audit completed — all thinking modes assessed
  • Routing audit completed — orphan and stale routing identified
  • Integration audit completed — broken chains found
  • Quality distribution mapped across all tiers
  • Systemic issues identified with root causes
  • Improvement roadmap has clear priority ordering
  • Missing skills identified (not just improvements to existing)
  • User-facing issues prioritized over internal quality issues

Integration

  • Use from: toolkit maintenance, periodic quality reviews, after large skill additions
  • Routes to: /impss (batch skill improvement), /skgap (skill gap analysis), /imps (single skill improvement)
  • Complementary: /skgap (imprt finds all issues; skgap specifically finds missing capabilities)
  • Differs from /imps: imps improves one skill; imprt diagnoses the whole toolkit
  • Differs from /impss: impss executes batch improvements; imprt identifies what needs improving
  • Differs from /skgap: skgap finds missing skills; imprt assesses the toolkit holistically