Statistical Analysis
Input: $ARGUMENTS
Interpretations
Before executing, identify which interpretation matches the user’s input:
Interpretation 1 — Choosing and running a statistical test: The user has data and a question and needs help selecting the right test, checking assumptions, and interpreting results correctly. Interpretation 2 — Reviewing or critiquing an existing analysis: The user has results from a statistical analysis (their own or someone else’s) and wants to verify the methodology, check for errors, or assess whether conclusions are warranted. Interpretation 3 — Designing a study or analysis plan: The user has not yet collected data and needs help planning what to measure, what tests to use, and what sample sizes are needed to answer their research question.
If ambiguous, ask: “I can help with running a statistical test, reviewing an existing analysis, or designing an analysis plan — which fits?” If clear from context, proceed with the matching interpretation.
Steps
Step 1: Clarify the statistical question
Translate the research question into a statistical question:
TYPES OF STATISTICAL QUESTIONS:
-
COMPARISON QUESTIONS
- “Is there a difference between groups?”
- “Are means/proportions different?” Examples: Treatment vs. control, before vs. after
-
RELATIONSHIP QUESTIONS
- “Is there an association between variables?”
- “Does X predict Y?” Examples: Correlation, regression
-
PREDICTION QUESTIONS
- “Can we predict outcomes from predictors?”
- “How accurate are predictions?” Examples: Machine learning, forecasting
-
STRUCTURE QUESTIONS
- “What is the underlying structure?”
- “How do variables cluster?” Examples: Factor analysis, cluster analysis
SPECIFY:
- What is the outcome/dependent variable?
- What are the predictors/independent variables?
- Are you testing a specific hypothesis or exploring?
- What kind of answer do you need? (yes/no, magnitude, prediction)
CONFIRMATORY VS. EXPLORATORY:
- Confirmatory: Testing pre-specified hypothesis
- Requires pre-registration; controls Type I error
- Exploratory: Discovering patterns in data
- Generates hypotheses; results need replication
Be explicit about which mode you’re in.
Step 2: Characterize the data
Understand your data before selecting tests:
VARIABLE TYPES:
Categorical (qualitative):
- Nominal: Categories without order (e.g., treatment group, gender)
- Ordinal: Ordered categories (e.g., Likert scale, education level)
Numerical (quantitative):
- Continuous: Any value in range (e.g., time, weight, temperature)
- Discrete: Countable values (e.g., count of events)
For each variable, note:
- Type (nominal, ordinal, continuous, discrete)
- Role (outcome, predictor, covariate, grouping)
- Distribution (normal, skewed, bimodal)
- Missing data pattern and extent
DATA STRUCTURE:
-
Independence: Are observations independent?
- Independent: Different subjects, no clustering
- Paired/matched: Same subjects measured twice, or matched pairs
- Clustered: Subjects nested in groups (students in classrooms)
- Time series: Observations over time from same unit
-
Sample size per group
-
Balance: Equal or unequal group sizes?
PRELIMINARY EXAMINATION:
- Summary statistics (mean, SD, median, IQR)
- Frequency tables for categorical variables
- Histograms and boxplots for continuous variables
- Check for outliers and data entry errors
- Examine missing data patterns
Step 3: Select appropriate statistical test
Choose the test that matches your question and data:
COMPARING TWO GROUPS:
| Outcome Type | Independent Groups | Paired/Matched |
|---|---|---|
| Continuous | Independent t-test | Paired t-test |
| Ordinal | Mann-Whitney U | Wilcoxon signed-rank |
| Categorical | Chi-square/Fisher | McNemar’s test |
COMPARING THREE+ GROUPS:
| Outcome Type | Independent Groups | Repeated Measures |
|---|---|---|
| Continuous | One-way ANOVA | Repeated-measures ANOVA |
| Ordinal | Kruskal-Wallis | Friedman test |
| Categorical | Chi-square | Cochran’s Q |
EXAMINING RELATIONSHIPS:
| Predictor(s) | Outcome Type | Test |
|---|---|---|
| Continuous | Continuous | Pearson correlation, linear regression |
| Continuous | Binary | Logistic regression |
| Continuous | Count | Poisson regression |
| Multiple | Continuous | Multiple regression |
| Multiple | Binary | Multiple logistic regression |
SPECIAL CASES:
- Clustered data: Mixed-effects/multilevel models
- Time series: Time series methods, repeated measures
- Survival/duration: Survival analysis (Kaplan-Meier, Cox)
- Multiple outcomes: MANOVA, structural equation modeling
DECISION FACTORS:
- Type of outcome variable (determines test family)
- Number of groups/predictors
- Independence structure of observations
- Sample size (parametric vs. non-parametric)
- Assumption satisfaction
When in doubt:
- Simpler methods often more robust
- Non-parametric methods when assumptions violated
- Consult statistician for complex designs
Step 4: Check assumptions
Verify that test assumptions are satisfied:
PARAMETRIC TEST ASSUMPTIONS:
-
NORMALITY Check: Histogram, Q-Q plot, Shapiro-Wilk test
- Exact normality rarely required
- Central Limit Theorem helps with n > 30 per group
- More important for small samples Violation remedy: Transform data; use non-parametric test
-
HOMOGENEITY OF VARIANCE Check: Levene’s test, F-max test, visual inspection
- Groups should have similar variances
- More important with unequal group sizes Violation remedy: Welch’s t-test; transformed data; robust SE
-
INDEPENDENCE Check: Study design review
- Observations should be independent
- Most critical assumption Violation remedy: Use paired/clustered methods
-
LINEARITY (for regression) Check: Residual plots, scatterplots
- Relationship should be linear Violation remedy: Transform variables; polynomial terms
-
HOMOSCEDASTICITY (for regression) Check: Residual vs. fitted plot
- Variance should be constant across predicted values Violation remedy: Robust standard errors; weighted regression
REPORTING ASSUMPTION CHECKS:
- Report what was checked and how
- Report results of assumption tests
- Describe remedies applied if assumptions violated
- Consider sensitivity analysis with alternative methods
ROBUST ALTERNATIVES:
- Welch’s t-test (doesn’t assume equal variance)
- Non-parametric tests (don’t assume normality)
- Robust regression (handles outliers)
- Bootstrapping (makes minimal assumptions)
Step 5: Conduct the analysis
Execute the statistical analysis:
-
RUN THE ANALYSIS
- Use appropriate software (R, Python, SPSS, Stata)
- Double-check data entry and coding
- Verify degrees of freedom match expectation
- Save code/syntax for reproducibility
-
RECORD KEY STATISTICS
For hypothesis tests:
- Test statistic (t, F, chi-square, z, etc.)
- Degrees of freedom
- P-value (exact, not just < .05)
- Sample size (per group if applicable)
For effect sizes:
- Point estimate (d, r, OR, RR, etc.)
- 95% confidence interval
- Interpret magnitude (small, medium, large)
For regression:
- Coefficients with standard errors
- Confidence intervals
- Model fit (R-squared, AIC, etc.)
- Residual diagnostics
-
EFFECT SIZE CALCULATION
For mean differences:
- Cohen’s d = (M1 - M2) / pooled SD
- 0.2 = small, 0.5 = medium, 0.8 = large
- Hedges’ g (corrects for small sample bias)
For correlations:
- Pearson’s r (or Spearman’s rho)
- 0.1 = small, 0.3 = medium, 0.5 = large
- R-squared (proportion of variance explained)
For categorical outcomes:
- Odds ratio (OR)
- Risk ratio/Relative risk (RR)
- Number needed to treat (NNT)
For ANOVA:
- Eta-squared or partial eta-squared
- Omega-squared (less biased)
- Cohen’s d = (M1 - M2) / pooled SD
-
CONFIDENCE INTERVALS
- Always report CIs for effect sizes
- 95% CI most common (corresponds to alpha = .05)
- Interpret: Range of plausible population values
- If CI excludes zero/one, effect is “significant”
Step 6: Interpret results correctly
Translate statistical results into meaningful conclusions:
INTERPRETING P-VALUES:
What p-value IS:
- Probability of data (or more extreme) IF null hypothesis true
- Measure of evidence against H0
What p-value IS NOT:
- Probability that H0 is true
- Probability that results are due to chance
- Measure of effect size or importance
- Probability of replication
Common thresholds (arbitrary but conventional):
- p < .05: “Statistically significant”
- p < .01: “Highly significant”
- p < .001: “Very highly significant”
Better practice:
- Report exact p-values (p = .032, not p < .05)
- Focus on effect size and CI, not just significance
- Consider p-value in context of power and prior probability
INTERPRETING EFFECT SIZES:
Cohen’s conventions (context-dependent):
- Small: d = 0.2, r = 0.1
- Medium: d = 0.5, r = 0.3
- Large: d = 0.8, r = 0.5
Better approach:
- Compare to prior research in the field
- Consider practical/clinical significance
- Use domain knowledge to interpret magnitude
INTERPRETING CONFIDENCE INTERVALS:
95% CI interpretation:
- “We are 95% confident the true value is in this range”
- If CI for difference excludes zero: significant difference
- Narrow CI: Precise estimate; Wide CI: Imprecise estimate
What CI tells you that p-value doesn’t:
- Magnitude of effect (not just direction)
- Precision of estimate
- Range of plausible values
NON-SIGNIFICANT RESULTS:
“Not significant” does NOT mean:
- No effect exists
- Effect is zero
- Null hypothesis is true
It DOES mean:
- Cannot reject H0 with this sample
- Effect may exist but study underpowered
- Evidence is inconclusive
Report: Effect size, CI, and power to detect meaningful effect
Step 7: Address multiple testing and report fully
Handle multiple comparisons and report transparently:
MULTIPLE TESTING PROBLEM:
- Each test at alpha = .05 has 5% false positive rate
- 20 tests: expect 1 false positive by chance
- Family-wise error rate increases rapidly
CORRECTION METHODS:
-
Bonferroni correction
- Adjusted alpha = 0.05 / number of tests
- Conservative; reduces power
- Use when: Small number of planned tests
-
Holm-Bonferroni (step-down)
- Less conservative than Bonferroni
- Controls family-wise error
- Use when: Multiple planned comparisons
-
False Discovery Rate (FDR)
- Benjamini-Hochberg procedure
- Controls proportion of false positives
- Use when: Many tests (e.g., genomics)
-
No correction (with justification)
- Pre-registered primary analysis
- Clearly labeled exploratory analyses
- Replication planned
WHEN TO CORRECT:
- Multiple outcomes on same hypothesis
- Multiple subgroup analyses
- Post-hoc pairwise comparisons
WHEN CORRECTION MAY NOT BE NEEDED:
- Single pre-registered primary outcome
- Clearly labeled exploratory analyses
- Independent research questions
TRANSPARENT REPORTING:
Report:
- All analyses conducted (not just significant ones)
- How analyses were specified (pre-registered or post-hoc)
- Any corrections applied for multiple testing
- Exact p-values, effect sizes, and confidence intervals
- Sample sizes and degrees of freedom
- Assumption checks and any violations
- Software and version used
Follow reporting guidelines:
- APA style for psychology
- CONSORT for clinical trials
- STROBE for observational studies
Step 8: Document limitations and conclusions
Identify statistical limitations and draw appropriate conclusions:
COMMON STATISTICAL LIMITATIONS:
-
POWER LIMITATIONS
- Small sample may miss real effects
- Report achieved power for observed effect
- Non-significant ≠ no effect
-
ASSUMPTION VIOLATIONS
- Which assumptions were questionable?
- How might this affect conclusions?
- Did robust methods help?
-
MISSING DATA
- How much was missing?
- Was missingness random or systematic?
- How was it handled?
-
MEASUREMENT ISSUES
- Reliability of measures
- Validity concerns
- Measurement error implications
-
GENERALIZABILITY
- Sample representativeness
- Context specificity
- Replication needs
APPROPRIATE CONCLUSIONS:
DO:
- Conclude about population parameters
- Distinguish statistical from practical significance
- Acknowledge uncertainty (CIs, p-values)
- Note limitations on causal inference
- Suggest replication and future directions
DON’T:
- Overstate certainty
- Treat non-significant as “no effect”
- Confuse correlation with causation
- Generalize beyond sample characteristics
- Make causal claims from observational data
FINAL CHECKLIST:
- Research question answered?
- Appropriate test used?
- Assumptions checked?
- Effect size reported with CI?
- Multiple testing addressed?
- Limitations acknowledged?
- Conclusions proportionate to evidence?
When to Use
- Analyzing data from experiments or observational studies
- Testing hypotheses with quantitative data
- Comparing groups or examining relationships
- Building predictive or explanatory models
- Evaluating program or intervention effectiveness
- Making data-driven decisions requiring statistical evidence
- Reviewing or critiquing statistical analyses
Verification
- Statistical question clearly specified
- Test selection matches data type and research question
- All assumptions checked and violations addressed
- Effect sizes reported with confidence intervals
- P-values correctly interpreted (not over-interpreted)
- Multiple testing addressed if applicable
- Limitations acknowledged
- Analysis is reproducible