Skill Evaluation
Input: $ARGUMENTS
Step 1: State the Skill’s Purpose
Define what the skill/procedure/method is supposed to accomplish.
SKILL: [Name or description of the skill]
STATED PURPOSE: [What it claims to do]
ACTUAL PURPOSE: [What it really does — may differ from stated]
SUCCESS CRITERIA: [What would a perfect execution look like?]
Step 2: Test on 3+ Inputs
Run the skill against diverse inputs to see how it performs.
TEST 1 — [input type: easy/typical/edge case]:
Input: [description]
Output: [what the skill produced]
Quality: [1-5] — [why]
TEST 2 — [input type]:
Input: [description]
Output: [what the skill produced]
Quality: [1-5] — [why]
TEST 3 — [input type]:
Input: [description]
Output: [what the skill produced]
Quality: [1-5] — [why]
AVERAGE QUALITY: [mean score]
VARIANCE: [How consistent is quality across inputs?]
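The two summary fields above can be computed rather than estimated by eye. The following is a minimal sketch, assuming the quality scores are whole numbers on the 1-5 scale; the function name `summarize_quality` and the sample scores are hypothetical and not part of the skill itself.

```python
from statistics import mean, pvariance

def summarize_quality(scores):
    """Return (mean, population variance) for a list of 1-5 quality scores."""
    if len(scores) < 3:
        raise ValueError("Step 2 asks for at least 3 test inputs")
    if any(not 1 <= s <= 5 for s in scores):
        raise ValueError("quality scores must be on the 1-5 scale")
    return mean(scores), pvariance(scores)

# Hypothetical scores for an easy, a typical, and an edge-case input.
avg, var = summarize_quality([5, 4, 2])
print(f"AVERAGE QUALITY: {avg:.2f}")
print(f"VARIANCE: {var:.2f}")
```

A variance near zero suggests the skill behaves consistently; a large variance with a decent mean usually means one input type (often the edge case) is dragging quality down and deserves a closer look in Step 4.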
Step 3: Assess Efficiency
Measure the cost of using this skill relative to its output.
TIME/EFFORT: [How much effort does the skill require?]
COMPLEXITY: [Is the skill harder to use than it needs to be?]
OVERHEAD: [Setup, prerequisites, or context needed before the skill works]
EFFICIENCY RATIO: [Average quality from Step 2 / effort required; state the unit of effort used]
SIMPLIFICATION POSSIBLE: [Could the same result be achieved with fewer steps?]
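The efficiency ratio is only meaningful when effort is measured in the same unit for everything being compared. A minimal sketch, assuming effort is tracked in minutes (an assumption; the template does not fix a unit) and using hypothetical figures:

```python
def efficiency_ratio(avg_quality, effort_minutes):
    """Quality points per minute of effort; minutes is an assumed unit."""
    if effort_minutes <= 0:
        raise ValueError("effort must be positive")
    return avg_quality / effort_minutes

# Hypothetical figures: the skill averages 4.0 quality and takes 10 minutes;
# a simpler alternative averages 3.5 but takes only 5 minutes.
skill = efficiency_ratio(4.0, 10)
alternative = efficiency_ratio(3.5, 5)
print(f"skill: {skill:.2f}, alternative: {alternative:.2f}")
```

In this made-up case the alternative delivers more quality per minute even though its raw quality is lower, which is the kind of trade-off the SIMPLIFICATION POSSIBLE field is meant to surface.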
Step 4: Identify Failure Modes
Find the ways this skill breaks, degrades, or misleads.
FAILURE MODES:
1. [Failure mode 1] — trigger: [what causes it] — severity: [HIGH/MEDIUM/LOW]
2. [Failure mode 2] — trigger: [what causes it] — severity: [HIGH/MEDIUM/LOW]
3. [Failure mode 3] — trigger: [what causes it] — severity: [HIGH/MEDIUM/LOW]
SILENT FAILURES: [Does the skill ever produce bad output that looks good?]
GRACEFUL DEGRADATION: [When the skill fails, is the failure obvious and contained, or subtle and easy to miss?]
Step 5: Compare to Alternatives
Benchmark against other ways to accomplish the same thing.
ALTERNATIVES:
1. [Alternative 1] — strengths vs. this skill: [what] — weaknesses: [what]
2. [Alternative 2] — strengths vs. this skill: [what] — weaknesses: [what]
WHEN THIS SKILL WINS: [Scenarios where this is the best option]
WHEN ALTERNATIVES WIN: [Scenarios where something else is better]
Step 6: Rate Overall Effectiveness
Combine the findings from Steps 1-5 into a single rating and verdict.
OVERALL RATING: [1-5]
STRENGTHS:
- [Strength 1]
- [Strength 2]
WEAKNESSES:
- [Weakness 1]
- [Weakness 2]
VERDICT: [KEEP AS-IS / IMPROVE / REPLACE / RETIRE]
IMPROVEMENT SUGGESTIONS: [If IMPROVE, what specifically should change]
Integration
Use with:
- /cand -> Evaluate multiple skills as candidates for the same job
- /ctgp -> Check if the skill fills a gap or adds redundancy
- /ratn -> Build a rationale for keeping or retiring the skill