Data & Quantitative Analysis
Input: $ARGUMENTS
Step 1: Clarify the Question
Data analysis without a clear question produces noise, not signal.
- What specific question should the data answer?
- What decision will this analysis inform?
- What would a useful answer look like? (format, precision, timeframe)
- What would change if the answer were X vs. Y?
QUESTION: [specific, answerable question]
DECISION IT INFORMS: [what action depends on this]
USEFUL ANSWER FORMAT: [e.g., "a percentage with confidence interval" or "a ranked list"]
If the question is vague (“analyze our data”), push back. A question like “are sales trending up?” is answerable. “Tell me about sales” is not.
Step 2: Assess Data Availability and Quality
Before analysis, understand what you’re working with.
- What data exists? List available datasets/sources
- What’s missing? What data would you ideally have but don’t?
- Quality checks:
  - Completeness: What percentage of values are missing?
  - Accuracy: How was the data collected? Any known errors?
  - Timeliness: How current is the data?
  - Consistency: Are definitions and units consistent across the dataset?
  - Representativeness: Does the data represent the population you’re asking about?
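The completeness check is the easiest one to make concrete. Below is a minimal sketch. It assumes rows loaded as dictionaries (for example via `csv.DictReader`) and treats an empty string or `None` as "missing". The column names and rows are hypothetical, so adjust the missing-value convention to your data.

```python
def missing_share(rows, columns):
    """Return {column: fraction of rows where the value is missing}.

    ASSUMPTION: None or "" counts as missing; real data may also use
    sentinels like "N/A" or -1 that need to be added here.
    """
    if not rows:
        raise ValueError("no rows to assess")
    total = len(rows)
    shares = {}
    for col in columns:
        missing = sum(1 for row in rows if row.get(col) in (None, ""))
        shares[col] = missing / total
    return shares


# Hypothetical rows with gaps in both columns.
rows = [
    {"region": "north", "revenue": "120"},
    {"region": "south", "revenue": ""},
    {"region": "", "revenue": "95"},
    {"region": "east", "revenue": None},
]
print(missing_share(rows, ["region", "revenue"]))  # {'region': 0.25, 'revenue': 0.5}
```

This only measures completeness. Accuracy, consistency, and representativeness still need knowledge of how the data was collected.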
DATA AVAILABLE: [list of sources/datasets]
DATA MISSING: [what you wish you had]
QUALITY ASSESSMENT:
  Completeness: [good / gaps / major gaps]
  Accuracy: [high / moderate / uncertain]
  Timeliness: [current / dated / stale]
  Consistency: [consistent / some issues / inconsistent]
  Representativeness: [representative / biased / unknown]
PROCEED? [yes / yes with caveats / no — data insufficient]
If data quality is poor, state what conclusions CANNOT be drawn before proceeding. Poor data acknowledged is better than poor data ignored.
Step 3: Choose Analysis Method
Match the method to the question type:
| Question Type | Methods |
|---|---|
| “How much/many?” | Descriptive stats (mean, median, range, distribution) |
| “Is there a difference?” | Comparison (t-test, ANOVA, chi-square, effect sizes) |
| “Is there a relationship?” | Correlation, regression |
| “Is this trend real?” | Time series, trend tests, regression with time |
| “What predicts X?” | Regression, classification, feature importance |
| “What natural groups exist?” | Clustering, segmentation |
| “Is this unusual?” | Outlier detection, anomaly scores, z-scores |
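The first and last rows of the table can be illustrated together. Below is a minimal sketch using Python's `statistics` module: a descriptive summary, plus a plain z-score outlier flag. The `describe` and `z_score_outliers` helpers and the order counts are hypothetical.

One caveat the example is built to expose: with the sample standard deviation, no point can have |z| greater than (n - 1)/√n. A cutoff of 3 therefore can never fire when n ≤ 10, because a single spike also inflates the SD it is measured against.

```python
import statistics


def describe(values):
    """Summary statistics for a 'How much/many?' question (needs at least 2 values)."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "stdev": statistics.stdev(values),  # sample SD (n - 1 denominator)
    }


def z_score_outliers(values, threshold=3.0):
    """Return (index, value, z) for every point with |z| above threshold."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return []  # all values identical: nothing stands out
    flagged = []
    for i, v in enumerate(values):
        z = (v - mean) / sd
        if abs(z) > threshold:
            flagged.append((i, v, z))
    return flagged


# Hypothetical daily order counts with one suspicious spike.
daily_orders = [52, 48, 50, 51, 49, 47, 53, 50, 49, 120]
print(describe(daily_orders))
print(z_score_outliers(daily_orders, threshold=3.0))  # [] : n = 10 caps |z| below 3
print(z_score_outliers(daily_orders, threshold=2.5))  # flags only index 9
```

For small samples, a median-based method is less sensitive to the outlier it is hunting. A robust z-score using the median absolute deviation is one such option.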
CHOSEN METHOD: [method]
WHY: [matches the question because...]
ASSUMPTIONS REQUIRED: [what the method assumes about the data]
Check method assumptions against actual data. If assumptions are violated, note this and either choose a different method or proceed with documented caveats.
Step 4: Perform the Analysis
Execute the chosen method. Document:
- Setup: Any data transformations, filters, or exclusions applied
- Core results: The numbers that answer the question
- Supporting details: Sample sizes, confidence intervals, p-values, effect sizes
- Visualizations: Describe what chart/graph would best communicate the finding
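Step 4 asks for sample sizes, uncertainty, and effect size alongside the core number. Below is a minimal standard-library sketch for a two-group comparison. It reports:

- the difference in means,
- Welch's standard error and degrees of freedom (no equal-variance assumption),
- an approximate interval,
- Cohen's d.

The interval uses a normal critical value as a simplifying assumption, so it is too narrow for small samples like the hypothetical ones here. With SciPy available, a t critical value (`scipy.stats.t.ppf`) or `scipy.stats.ttest_ind(..., equal_var=False)` is the more careful route.

```python
import math
import statistics
from statistics import NormalDist


def compare_means(a, b, confidence=0.95):
    """Difference in means (a - b) with an approximate CI, Welch df, and Cohen's d."""
    n_a, n_b = len(a), len(b)
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)  # sample variances
    diff = mean_a - mean_b

    # Welch standard error: each group keeps its own variance.
    var_mean_a, var_mean_b = var_a / n_a, var_b / n_b
    se = math.sqrt(var_mean_a + var_mean_b)
    # Welch-Satterthwaite degrees of freedom, worth reporting with the result.
    df = (var_mean_a + var_mean_b) ** 2 / (
        var_mean_a**2 / (n_a - 1) + var_mean_b**2 / (n_b - 1)
    )

    # ASSUMPTION: normal critical value (large-sample approximation).
    # For small samples, use a t critical value with `df` degrees of freedom instead.
    crit = NormalDist().inv_cdf(0.5 + confidence / 2)

    # Cohen's d using the pooled standard deviation.
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))

    return {
        "n": (n_a, n_b),
        "diff": diff,
        "ci": (diff - crit * se, diff + crit * se),
        "welch_df": df,
        "cohens_d": diff / pooled_sd,
    }


# Hypothetical measurements for two groups.
treatment = [12.1, 11.4, 13.0, 12.7, 11.9, 12.5]
control = [10.8, 11.2, 10.5, 11.9, 11.0, 10.7]
print(compare_means(treatment, control))
```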
RESULTS:
- [primary finding with numbers]
- [confidence interval or uncertainty range]
- [sample size / degrees of freedom]
- [effect size if applicable]
BEST VISUALIZATION: [chart type and what it would show]
Step 5: Check for Statistical Traps
Before trusting results, audit for common errors:
| Trap | Check | Status |
|---|---|---|
| Small sample | Is statistical power adequate for the smallest effect that matters? (N >= 30 is only a rough rule of thumb, not a power check) | [ok / concern] |
| Multiple comparisons | Testing many hypotheses without correction? | [ok / concern] |
| P-hacking | Did you try several analyses and report only the one that looked best? | [ok / concern] |
| Confounders | Could a third variable explain the relationship? | [ok / concern] |
| Survivorship bias | Are you only seeing data that “survived” some filter? | [ok / concern] |
| Simpson’s paradox | Does the trend reverse when you split by subgroups? | [ok / concern] |
| Base rate neglect | Are you ignoring how common/rare the phenomenon is? | [ok / concern] |
| Cherry-picked timeframe | Would different start/end dates change the conclusion? | [ok / concern] |
| Correlation as causation | Are you implying X causes Y from correlation alone? | [ok / concern] |
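For the multiple-comparisons row, one common correction is the Holm-Bonferroni step-down procedure. It controls the family-wise error rate and is never less powerful than plain Bonferroni. Below is a minimal sketch with hypothetical p-values; other corrections, such as Benjamini-Hochberg for the false discovery rate, may suit exploratory work better.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected (Holm step-down)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    reject = [False] * m
    for rank, i in enumerate(order):
        # Compare the k-th smallest p-value against alpha / (m - k + 1).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject


# Hypothetical p-values from four tests on the same dataset.
p_values = [0.01, 0.04, 0.03, 0.005]
print(holm_bonferroni(p_values))  # [True, False, False, True]
```

Without correction, all four p-values are below 0.05 and would be reported as "significant". After correction, only two survive.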
TRAPS CHECKED: [count] of 9
CONCERNS FOUND: [list any concerns and their implications]
For each concern: state how it affects the conclusion and whether it can be mitigated.
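As one example of checking and mitigating a concern: a Simpson's paradox risk can be tested by recomputing the comparison within each subgroup and seeing whether the leader flips. Below is a minimal sketch. The conversion counts are hypothetical, and so is the split by device.

```python
from collections import defaultdict


def leader(rates):
    """Arm with the highest rate (ties go to the first arm listed)."""
    return max(rates, key=rates.get)


def simpson_check(counts):
    """counts: {subgroup: {arm: (successes, trials)}}.

    Returns (overall_leader, {subgroup: leader}). A reversal means the
    overall leader loses within every subgroup.
    """
    totals = defaultdict(lambda: [0, 0])
    by_subgroup = {}
    for subgroup, arms in counts.items():
        by_subgroup[subgroup] = leader({arm: s / n for arm, (s, n) in arms.items()})
        for arm, (s, n) in arms.items():
            totals[arm][0] += s
            totals[arm][1] += n
    overall = leader({arm: s / n for arm, (s, n) in totals.items()})
    return overall, by_subgroup


# Hypothetical conversions for two variants, split by device.
counts = {
    "mobile": {"A": (30, 200), "B": (100, 600)},
    "desktop": {"A": (160, 400), "B": (90, 200)},
}
overall, by_subgroup = simpson_check(counts)
reversed_everywhere = all(l != overall for l in by_subgroup.values())
print(overall, by_subgroup, reversed_everywhere)  # A {'mobile': 'B', 'desktop': 'B'} True
```

Here A wins overall only because it received a larger share of desktop traffic, which converts better for both variants. The subgroup view is the one to report.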
Step 6: Interpret Conservatively
Translate results into plain language, erring on the side of caution.
- State what the data DOES support
- State what the data does NOT support (even if tempting to claim)
- Rate confidence: HIGH / MODERATE / LOW
- State the uncertainty range in plain language
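When a method's assumptions are shaky, a percentile bootstrap is one way to get a plain-language uncertainty range. It resamples the data with replacement and reads off the middle 95% of the recomputed statistic. Below is a minimal sketch for a median, with hypothetical response times.

The approach assumes the observations are independent and the sample is representative. With very small samples, the interval itself is unreliable.

```python
import random
import statistics


def bootstrap_ci(values, stat=statistics.median, n_resamples=5000, confidence=0.95, seed=0):
    """Point estimate plus a percentile bootstrap interval for `stat`."""
    rng = random.Random(seed)  # fixed seed keeps the interval reproducible
    n = len(values)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(stat(rng.choices(values, k=n)) for _ in range(n_resamples))
    alpha = 1 - confidence
    lo = estimates[round(alpha / 2 * (n_resamples - 1))]
    hi = estimates[round((1 - alpha / 2) * (n_resamples - 1))]
    return stat(values), (lo, hi)


# Hypothetical response times (ms) with two slow outliers.
response_times_ms = [120, 135, 128, 410, 142, 118, 131, 126, 139, 950, 124, 133]
point, (lo, hi) = bootstrap_ci(response_times_ms)
print(f"Median response time {point} ms; plausibly between {lo} and {hi} ms (95% bootstrap interval)")
```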
FINDING: [plain-language statement of what the data shows]
CONFIDENCE: [HIGH / MODERATE / LOW]
UNCERTAINTY: [the range of plausible values, in plain language]
DOES NOT SHOW: [what you cannot conclude from this data]
Step 7: Communicate Findings
Structure the final communication:
- Headline: One sentence, the key finding
- Context: What was analyzed and why
- Key numbers: 2-3 numbers that tell the story (not a data dump)
- Caveats: What limits the conclusion (briefly)
- Recommendation: What action the finding supports
REPORT:
HEADLINE: [one-sentence finding]
CONTEXT: [what was analyzed, data source, timeframe]
KEY NUMBERS:
- [number 1 with context]
- [number 2 with context]
CAVEATS: [most important limitation]
RECOMMENDATION: [what to do based on this]
Integration
Use with:
- /ht -> Formally test a hypothesis the data suggests
- /dcp -> Feed quantitative findings into a decision
- /cba -> Use data to populate a cost-benefit analysis
- /spec -> Explore what the data might mean if trends continue