The Structure of Careful Thought
When you’re wrong about something important, how do you usually find out? If you’re like most people, the answer is: after the consequences arrive. You discover the flaw in your plan when it fails. You notice the gap in your argument when someone else points it out. You realize you missed an option after it’s no longer available.
This raises a puzzle. We’re not stupid. We often think carefully before acting. So why do we keep being surprised by things we “should have” considered?
Let’s start with a distinction that turns out to be important. A guess is any claim about the world that could be wrong. This includes predictions, explanations, assumptions, interpretations, and evaluations. Not everything is a guess, though. Definitions aren’t guesses - “a bachelor is an unmarried man” can be clear or unclear, but it can’t be wrong about the world the way a prediction can. Procedures aren’t guesses. Observations mostly aren’t, though our interpretations of them are. Preferences aren’t - you can’t be wrong about liking coffee.
Why does this matter? Because guesses need testing. Non-guesses don’t. When we confuse the categories, we either waste effort testing things that don’t need testing, or we fail to test things that do. The first step toward careful thought is noticing when something you’re treating as fact is actually a guess.
Once we recognize that many of our beliefs are guesses, we face a question: how should we treat them? The natural answer is that we should test them. And we often think we do. But there’s a problem.
Watch how people actually engage with their guesses. When we believe something might be true, we ask why it could be true, what evidence supports it, what would follow if it’s right. We rarely ask with equal rigor why it could be false, what evidence contradicts it, what would follow if it’s wrong. This is the asymmetry problem. We give our current beliefs the advantage. We look for confirmation more thoroughly than we look for refutation.
This isn’t about being stupid or careless. It’s a default mode that operates even in smart, careful people. And here’s the thing that took me a while to understand: knowing about confirmation bias doesn’t fix it. Knowing about a bias doesn’t provide a mechanism for correction.
Consider what “trying to consider the other side” usually looks like. You have a position. You think you should consider objections. You generate one or two obvious objections. You find responses to them. You feel more confident in your position. What’s missing?
First, depth. Surface-level objections are easy to answer. The dangerous objections are the ones you haven’t thought of yet, and finding them requires taking the opposing position seriously. Second, symmetry. You’ve asked what the objections are, but have you asked, with equal energy, what the world would look like if an objection were right rather than your position? Third, structure. “Trying” to consider objections is vague. Without a process that forces you to do specific things, defaults reassert themselves.
So what would a fix look like? Let’s reason from requirements. It would need to force symmetric treatment - whatever we do for “assume this is right,” we must equally do for “assume this is wrong.” It would need to force depth - you can’t stop at the surface level; you have to push until something surprising emerges. It would need to provide structure - not “try to think of objections” but specific operations. And it would need to be actually doable - it can’t require infinite time or special training.
One approach that satisfies these requirements is what I’ve been calling ARAW, for Assume Right, Assume Wrong. The core idea is simple. For any claim you want to examine, you do two things.
First, assume it’s right. Take the claim as true and ask what evidence supports it, what follows if it’s true, what else must be true for this to hold, what you’d expect to see if it’s correct. Then assume it’s wrong. Take the claim as false and ask what evidence contradicts it, what follows if it’s wrong, what the failure mode is, what a better alternative might be. Then compare. The goal isn’t to “win” on either side but to have explored both with genuine effort.
The depth requirement matters. Don’t stop at level one. Each answer generates new claims. Apply the same process to those. Keep going until you hit something that can’t be further decomposed, or you find genuine uncertainty you can’t resolve, or diminishing returns set in. The structure forces you to actually do both sides. It’s not “try to think of objections” - it’s completing a template that doesn’t let you skip the uncomfortable parts.
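If it helps to see how little machinery the template needs, here is a minimal Python sketch of it as a hand-filled checklist. The prompt wordings paraphrase the questions above, and the `Node` and `araw` names are my own illustration, not part of the method - this assumes a purely manual workflow where the answers come from you.

```python
# A hand-filled ARAW template. ASSUME_RIGHT / ASSUME_WRONG paraphrase the
# questions from the text; Node and araw are illustrative names only.
from dataclasses import dataclass, field

ASSUME_RIGHT = [
    "What evidence supports this?",
    "What follows if it is true?",
    "What else must be true for this to hold?",
    "What would you expect to see if it is correct?",
]

ASSUME_WRONG = [
    "What evidence contradicts this?",
    "What follows if it is wrong?",
    "What is the failure mode?",
    "What would a better alternative look like?",
]

@dataclass
class Node:
    claim: str
    right: dict = field(default_factory=dict)     # prompt -> your answer
    wrong: dict = field(default_factory=dict)     # prompt -> your answer
    children: list = field(default_factory=list)  # new claims the answers surfaced

def araw(claim: str, depth: int = 3) -> Node:
    """Work one claim through both sides, then recurse on any new claims
    the answers generate, until the depth budget runs out."""
    node = Node(claim)
    for prompt in ASSUME_RIGHT:
        node.right[prompt] = input(f"[assume right] {claim}\n  {prompt}\n  > ")
    for prompt in ASSUME_WRONG:
        node.wrong[prompt] = input(f"[assume wrong] {claim}\n  {prompt}\n  > ")
    if depth > 1:
        while (new_claim := input("New claim worth examining (blank to stop): ").strip()):
            node.children.append(araw(new_claim, depth - 1))
    return node
```

Running something like `araw("we should hire this person")` does nothing clever; the point is only that the template won’t let you answer the assume-right prompts and skip the assume-wrong ones.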
Here’s an example. Say you’re evaluating a job candidate. The claim is “we should hire this person.”
Assuming right, at the first level you might note strong skills, good interview, comes recommended. But push deeper. Skills match your needs - but do they really? Interview performance predicts job performance - does it? Recommendations are trustworthy - are they? Go another level. Your needs assessment assumes current priorities continue. The interview format favors certain communication styles. The recommender has their own incentives. The surprise that emerged for me: we hadn’t actually tested whether our interview predicts job performance.
Assuming wrong, at the first level you might worry about overstated skills, culture fit issues. Push deeper. What would skill overstatement look like? Can you test for it? Culture fit is hard to assess - how are you actually assessing it? Go another level. Your culture assessment might be biased toward people similar to the current team. The surprise: you might be selecting for homogeneity while telling yourself you’re selecting for culture fit.
Neither side “won,” but you now have specific things to investigate that you didn’t have before.
ARAW is good for testing claims. But what if you’re missing claims to test? We often decide between options A and B without considering options C through Z. ARAW helps test whether A or B is better, but it doesn’t help if we never generated C.
This is where systematic exploration of the option space comes in - what I call universalization. Before evaluating, map what could be true. Use structured transformations: what states could this be in, what category is this an instance of and what are sibling categories, what if parameters varied, what if roles reversed, what if timing changed, what if scope expanded or contracted. These aren’t random brainstorming. They’re systematic transformations designed to find options that default thinking misses.
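To show that these transformations are a fixed set rather than free association, here is a small sketch; the prompt wordings are my paraphrase of the transformations named above, and `universalize` is an illustrative name, not an established function.

```python
# The universalization prompts as a fixed list, applied to one claim.
TRANSFORMATIONS = [
    "What states could this be in?",
    "What category is this an instance of, and what are its sibling categories?",
    "What if the parameters varied?",
    "What if the roles were reversed?",
    "What if the timing changed?",
    "What if the scope expanded or contracted?",
]

def universalize(claim: str) -> list[str]:
    """Instantiate each transformation prompt for one claim. The actual new
    options still have to come from whoever answers the prompts."""
    return [f"{claim} :: {t}" for t in TRANSFORMATIONS]

for prompt in universalize("We should hire this person"):
    print(prompt)
```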
The combination of universalization and ARAW works like this. First universalize - map the space of possible claims or options. Then ARAW the top candidates rigorously. Then universalize again on what survived, looking for edge cases and boundary conditions. Then ARAW those edge cases to validate or refine.
The iteration matters. The first pass usually finds obvious things. The second pass - universalizing on what survived, then testing the edge cases - is where surprises emerge.
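The loop itself can also be written down. This is only a control-flow sketch, assuming hypothetical `universalize`, `araw`, and `pick` helpers in the spirit of the sketches above, with `pick` standing in for your own judgment about which candidates are worth the effort.

```python
# Control flow of the two-pass loop only; the helpers are supplied by the caller.
def explore(question, universalize, araw, pick):
    options = universalize(question)              # pass 1: map the space
    survivors = [araw(o) for o in pick(options)]  # pass 1: test the top candidates
    edge_cases = []
    for s in survivors:
        edge_cases.extend(universalize(s))        # pass 2: universalize what survived
    return [araw(e) for e in pick(edge_cases)]    # pass 2: test the edge cases
```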
Several existing methods try to correct for biased thinking. Red teaming has a similar adversarial spirit, but it’s often role-based, while ARAW asks you to genuinely inhabit both positions. Devil’s advocacy has the same idea of arguing the other side, but it’s often performative - you know you don’t believe it, so you don’t explore deeply. Pre-mortems do prospective failure analysis, but they’re asymmetric - you only imagine failure, not success. Steel-manning genuinely engages with opposing views, but typically only with the “wrong” side, not your own. Decision matrices provide structured analysis, but they weight factors without testing whether the factors are correct.
The combination that seems to matter is forced symmetry where you can’t skip either side, forced depth where you can’t stop at surface, iteration that finds surprises in later passes, and universalization that finds options rather than just testing given ones.
If you want philosophical grounding, this connects to critical rationalism - Popper’s idea that theories gain strength by surviving attempts at falsification. ARAW is operationalized falsificationism for everyday thinking. It also connects to dialectics - thesis, antithesis, synthesis - where understanding emerges from tension between opposing positions. And to pragmatism, which asks what practical difference a belief makes - exactly what the assume-right and assume-wrong questions get at.
Some common objections. Isn’t this just overthinking? Sometimes, yes. Match method depth to decision stakes. Quick decisions can use a lighter version. High-stakes decisions merit full depth. How do you know when you’ve gone deep enough? Are you still finding surprises? Keep going. Are you just rephrasing earlier points? Stop. Have you hit things you genuinely can’t resolve? Note them and stop. Does this require training? It helps, but it isn’t required: the first few attempts feel mechanical, and with practice it becomes natural.
What this doesn’t do: generate creativity (you still need ideas to test), replace domain expertise (you need knowledge to evaluate), work instantly (there’s a time cost), guarantee correctness (methodology can be applied poorly), or scale infinitely (at some point you’re over-analyzing). What it does do: force consideration you’d otherwise skip, surface assumptions you didn’t know you had, find objections before others find them for you, reduce the “I should have thought of that” failure mode.
The honest claim isn’t “this is how to think correctly.” It’s that this is a structure that, when applied, tends to produce more thorough consideration than unstructured thinking.
The real test is whether it helps you. Pick a belief you hold or a decision you’re facing. State the claim. Assume right - why might this be true? Go at least three levels. Assume wrong - why might this be false? Go at least three levels. Compare what you found. If the process surfaced something you wouldn’t have found otherwise, it’s doing its job. If it didn’t, either the claim was simple, you didn’t go deep enough, or the method doesn’t work for you.
The invitation isn’t to believe this is good. The invitation is to try it and evaluate.