About
The philosophy behind the toolkit. Why alternation works, how it compares to other approaches, and what's proven vs. claimed.
The problem
There are two fundamentally different operations in reasoning:
- Exploration: What options exist? What haven't I considered? What's the full space of possibilities? This is divergent — it expands your view.
- Testing: Is this actually true? What happens if it's wrong? What survives scrutiny? This is convergent — it narrows your view.
Most tools do one or the other. Brainstorming tools explore but don't test. Debate tools test but don't explore. Chain-of-thought does a bit of both but neither systematically. And none of them alternate — exploring, then testing what you found, then exploring the edge cases of what survived, then testing again.
The alternation matters because the two operations have structural blind spots that don't overlap. Exploration alone finds options but can't tell you which ones work. Testing alone validates what you're looking at but can't tell you what you're not looking at. Alternating covers both.
The core methods
ARAW (Assume Right / Assume Wrong)
Takes any claim and branches it: what follows if true? What follows if false? Then recurses — every conclusion is another claim. The result is a tree of tested claims stored in SQLite, with leverage scores, crux detection, and domain classification.
ARAW is operationalized Popperian falsification for everyday thinking. It forces symmetric treatment — whatever you do for "assume right," you must equally do for "assume wrong." This addresses confirmation bias not by knowing about it, but by having a process that forces the corrective operation.
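To make the branching concrete, here is a minimal Python sketch of the recursion, assuming a simplified single-table SQLite schema. The `derive_consequences` helper, the table layout, and the hard-coded depth limit are illustrative assumptions; the toolkit's actual schema, leverage scoring, crux detection, and domain classification are not shown.

```python
# Minimal sketch of ARAW branching (not the toolkit's real schema or API).
import sqlite3

def araw(conn, claim, parent_id=None, depth=0, max_depth=2):
    """Branch a claim under 'assume right' and 'assume wrong', then recurse."""
    cur = conn.execute(
        "INSERT INTO claims (text, parent_id) VALUES (?, ?)", (claim, parent_id)
    )
    claim_id = cur.lastrowid
    if depth >= max_depth:
        return claim_id
    for assumption in ("assume_right", "assume_wrong"):
        # Symmetric treatment: whatever is derived under one assumption
        # must also be derived under the other.
        for consequence in derive_consequences(claim, assumption):
            araw(conn, consequence, parent_id=claim_id, depth=depth + 1)
    return claim_id

def derive_consequences(claim, assumption):
    """Placeholder: in practice this step is where the guided prompting happens."""
    return [f"[{assumption}] consequence of: {claim}"]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE claims (id INTEGER PRIMARY KEY, text TEXT, parent_id INTEGER)"
)
araw(conn, "Remote work increases productivity")
```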
UAUA (Universalize → ARAW → Universalize → ARAW)
Alternates between two mathematically distinct search operations:
Universalization operates on N-valued type theory. It asks "what is this an instance of?" and derives all instances — searching horizontally across the possibility space.
ARAW operates on binary Boolean logic. It asks "is this true or false?" and eliminates branches through contradiction — searching vertically into depth.
U1: Map the space (divergent, N-valued) → candidates
↓
A1: Test candidates (convergent, binary) → validated/rejected
↓
U2: Find edge cases of survivors (divergent) → new candidates
↓
A2: Final validation (convergent) → what survived all rounds

Each pass uses a fundamentally different logic — type enumeration vs. binary elimination — so their blind spots don't overlap. Information-theoretically, each ARAW pass maximizes entropy reduction (selecting the crux that most constrains remaining uncertainty), while each universalization pass maximizes entropy expansion (finding dimensions not yet explored).
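As a rough illustration of the four-pass shape above, the sketch below alternates a divergent enumeration step with a binary filtering step. The `universalize` and `test_claim` functions are placeholders standing in for the toolkit's prompt-driven passes; their names, signatures, and toy logic are assumptions, not the real API.

```python
# Minimal sketch of one UAUA cycle: U1 -> A1 -> U2 -> A2.

def universalize(item):
    """Divergent pass: 'what is this an instance of?' -> enumerate instances."""
    return [f"instance of {item} #{i}" for i in range(3)]  # placeholder

def test_claim(candidate):
    """Convergent pass: binary keep/eliminate after assume-right/assume-wrong."""
    return len(candidate) % 2 == 0  # placeholder verdict

def uaua(seed):
    candidates = universalize(seed)                                # U1: map the space
    survivors = [c for c in candidates if test_claim(c)]           # A1: eliminate
    edge_cases = [e for s in survivors for e in universalize(s)]   # U2: edge cases
    return [e for e in edge_cases if test_claim(e)]                # A2: final validation

print(uaua("my plan"))
```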
207 structured skills
Each skill is a structured prompt that guides you through a specific type of thinking. Skills are classified by universality:
Universal skills apply whenever their trigger condition exists — their applicability is entailed by the problem structure. Heuristic skills work well in specific contexts, but whether they apply to your context requires judgment.
The skill tiers loosely reflect this. Tier 1 skills tend to be universal. Tier 4 skills tend to be context-dependent.
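As one possible way to picture the distinction, the sketch below models a skill record with an explicit universality flag alongside its tier. The field names and the example skill are hypothetical, not the toolkit's actual skill format.

```python
# Hypothetical skill record illustrating the universal/heuristic classification.
from dataclasses import dataclass

@dataclass
class Skill:
    name: str
    tier: int        # 1 = broadly universal ... 4 = context-dependent
    trigger: str     # condition under which the skill applies
    universal: bool  # True: applicability entailed by problem structure

inversion = Skill(
    name="Inversion",
    tier=1,
    trigger="any claim or plan that can be negated",
    universal=True,
)
```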
For the full philosophical treatment, see Universal Principles of Mathematical Problem Solving →
How it compares
| | reasoningtool | AI agents | Prompt libraries | Chain-of-thought |
|---|---|---|---|---|
| Structured exploration | Yes (universalization) | No | No | Implicit |
| Systematic testing | Yes (ARAW) | No | No | Implicit |
| Alternation between both | Yes (UAUA) | No | No | No |
| Persistence across sessions | Yes (SQLite) | Varies | No | No |
| Interactive visualization | Yes (Sigma.js) | No | No | No |
| Cross-run synthesis | Yes | No | No | No |
| Classifies skills by universality | Yes (tiered) | No | Some | No |
| Philosophy of why it works | Yes (essays) | No | No | No |
The difference isn't features — someone could add any of these to an existing framework. The difference is design intent. Agents execute tasks. Prompt libraries produce better outputs. This toolkit checks whether you're solving the right problem before you solve it.
What's proven and what's claimed
Proven (by the structure itself):
- If you test both "assume right" and "assume wrong," you will consider alternatives you wouldn't have otherwise. This is logically necessary — the operation forces it.
- If you alternate between exploration and testing, you cover failure modes that either operation alone misses. This follows from the fact that their blind spots are complementary.
Claimed (based on principles, not yet empirically validated):
- The alternation produces better decisions than non-alternating methods.
- The universal/heuristic skill classification helps users pick the right tools faster.
- The full UAUA cycle (4 passes) is meaningfully better than a 2-pass version.
This is experimental. The philosophy is grounded in epistemology and information theory. The implementation is v1. If you find where it breaks, that's valuable — open an issue.
License
Apache 2.0. Use it, modify it, build on it.