Tier 4

dtsk - Data Analysis

Input: $ARGUMENTS


Step 1: Identify What Data Exists

DATA INVENTORY:
  1. [data source/set] — FORMAT: [type] — SIZE: [approximate]
     CONTAINS: [what variables or fields]
     TIMEFRAME: [period covered]
  2. [data source/set] — FORMAT: [type]
     CONTAINS: [variables]
     TIMEFRAME: [period]

Don’t assume data exists just because it should. Inventory only what’s actually available and accessible.


Step 2: Assess Data Quality

For each data source:

QUALITY ASSESSMENT: [data source name]
  COMPLETENESS: [percentage or description of missing values]
  ACCURACY: [how was data collected? what error sources exist?]
  TIMELINESS: [how current is it? is staleness a problem?]
  CONSISTENCY: [are definitions and formats uniform?]
  PROVENANCE: [where did it come from? how trustworthy is the source?]
  OVERALL QUALITY: [high | adequate | questionable | unreliable]
  CAVEATS: [what limitations must be kept in mind during analysis]

Poor data quality is the most common source of wrong conclusions. Assess before analyzing.
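The completeness line of the assessment can be computed mechanically. A minimal sketch, using illustrative records in place of a real source (field names and values here are hypothetical):

```python
# Hypothetical records standing in for a real data source; in practice,
# load these from your CSV, warehouse, or API export.
records = [
    {"user_id": 1, "signup_date": "2024-01-05", "plan": "pro"},
    {"user_id": 2, "signup_date": None, "plan": "free"},
    {"user_id": 3, "signup_date": "2024-02-11", "plan": None},
]

def completeness(rows):
    """Fraction of non-missing values per field (1.0 = fully complete)."""
    fields = rows[0].keys()
    n = len(rows)
    return {f: sum(r.get(f) is not None for r in rows) / n for f in fields}

for field, frac in completeness(records).items():
    print(f"{field}: {frac:.0%} complete")
```

The same loop extends naturally to the other quality dimensions (e.g., counting rows outside an expected range for accuracy, or distinct formats per field for consistency).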


Step 3: Identify Missing Data

DATA GAPS:
  1. NEEDED: [what data would answer the question]
     WHY MISSING: [not collected | inaccessible | doesn't exist yet]
     IMPACT: [what conclusions become impossible without it]
     PROXY: [alternative data that could partially substitute, if any]
  2. NEEDED: [data]
     WHY MISSING: [reason]
     IMPACT: [limitation]
     PROXY: [substitute, if any]

The absence of data is itself data. What wasn’t measured often matters as much as what was.
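The bias risk from a gap can be made concrete with a quick simulation. Everything below is synthetic and the scenario is hypothetical: suppose the highest-spending users stop reporting, so the missingness is not random and a naive average of what remains is biased low.

```python
import random
import statistics

random.seed(1)

# Synthetic ground truth: spend per user, roughly 100 on average.
true_spend = [random.gauss(100, 30) for _ in range(1000)]

# Missing-not-at-random: the biggest spenders drop out, so only
# values under 130 are ever observed.
observed = [s for s in true_spend if s < 130]

print(f"true mean:     {statistics.mean(true_spend):6.1f}")
print(f"observed mean: {statistics.mean(observed):6.1f}")  # biased low
```

This is why the IMPACT field matters: a gap does not just shrink the sample, it can systematically distort every statistic computed from what is left.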


Step 4: Choose Analysis Methods

QUESTION: [the specific question the data should answer]

METHOD: [chosen analysis approach]
  WHY THIS METHOD: [what makes it appropriate for this data and question]
  ASSUMPTIONS: [what the method assumes about the data]
  ASSUMPTIONS MET: [yes | partially | no — with specifics]
  ALTERNATIVES CONSIDERED: [other methods and why they were rejected]
  LIMITATIONS: [what this method cannot tell you]

Match the method to the question and the data, not the other way around. If the data doesn’t support the method, change the method.
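A minimal sketch of letting the data drive the method choice: check a distributional assumption first, then pick the statistic it supports. The skew proxy and the 0.3 threshold are illustrative assumptions, not standard cutoffs.

```python
import statistics

def choose_center(values):
    """Summarize central tendency with a method the data supports:
    a long tail pulls the mean away from the median, so fall back to
    the median when that gap is large relative to the spread."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    spread = statistics.stdev(values)
    # Nonparametric skew proxy; 0.3 is an illustrative threshold.
    skew = abs(mean - median) / spread if spread else 0.0
    return ("median", median) if skew > 0.3 else ("mean", mean)

print(choose_center([1, 2, 2, 3, 2, 40]))  # outlier -> median
print(choose_center([1, 2, 3, 4, 5]))      # symmetric -> mean
```

The same pattern generalizes: test the method's assumptions programmatically, and let a failed check route you to the more robust alternative rather than forcing the data through the original plan.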


Step 5: Interpret Results

FINDINGS:
  1. [finding] — CONFIDENCE: [high | medium | low]
     EVIDENCE: [what in the data supports this]
  2. [finding] — CONFIDENCE: [level]
     EVIDENCE: [supporting data]

CORRELATION VS CAUSATION CHECK:
  CORRELATIONS FOUND: [list]
  CAUSAL CLAIMS SUPPORTED: [which, if any, and what additional evidence exists]
  CONFOUNDERS: [potential third variables]
  DIRECTION: [is the causal direction clear?]

ALTERNATIVE EXPLANATIONS:
  1. [finding] could also be explained by [alternative]
  2. [finding] could also be explained by [alternative]

Default to correlation until you have strong evidence for causation: temporal precedence, plausible mechanism, and elimination of confounders.
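The confounder problem can be demonstrated with synthetic data. In the sketch below (simulated, not a real dataset), a hidden variable Z drives both X and Y; the raw X–Y correlation is strong even though X has no effect on Y, and it nearly vanishes once Z is regressed out:

```python
import random
import statistics

random.seed(0)

# Hidden confounder Z drives both X and Y; X has no direct effect on Y.
z = [random.gauss(0, 1) for _ in range(2000)]
x = [zi + random.gauss(0, 0.5) for zi in z]
y = [zi + random.gauss(0, 0.5) for zi in z]

def pearson(a, b):
    """Pearson correlation coefficient."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    var_a = sum((ai - ma) ** 2 for ai in a)
    var_b = sum((bi - mb) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5

def residuals(v, ctrl):
    """What remains of v after regressing out ctrl (simple OLS)."""
    slope = pearson(v, ctrl) * statistics.stdev(v) / statistics.stdev(ctrl)
    mv, mc = statistics.mean(v), statistics.mean(ctrl)
    return [vi - (mv + slope * (ci - mc)) for vi, ci in zip(v, ctrl)]

raw = pearson(x, y)
partial = pearson(residuals(x, z), residuals(y, z))
print(f"raw r = {raw:.2f}, r controlling for Z = {partial:.2f}")
```

Partialling out a measured confounder is only a first step; it cannot rule out confounders you never measured, which is why the CONFOUNDERS field should list candidates even when no data exists for them.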


Step 6: Communicate Findings

DATA BRIEF
==========
QUESTION: [what we asked]
DATA USED: [sources, with quality notes]
KEY FINDINGS:
  1. [finding with confidence level]
  2. [finding with confidence level]
  3. [finding with confidence level]

WHAT THE DATA SAYS: [clear statement of conclusions]
WHAT THE DATA DOES NOT SAY: [common misinterpretations to prevent]
LIMITATIONS: [caveats the audience must understand]
RECOMMENDED NEXT STEPS: [what additional data or analysis would strengthen conclusions]

Communicate uncertainty explicitly. “The data suggests X with moderate confidence” is more honest and more useful than “X is true.”


Failure Modes

FAILURE: Analyzing before assessing quality
  SIGNAL: Jumping to a method before checking the data
  FIX: Always do Step 2 before Step 4

FAILURE: Claiming causation from correlation
  SIGNAL: "X causes Y" asserted from observational data
  FIX: State the correlation; list confounders

FAILURE: Ignoring missing data
  SIGNAL: Conclusions based only on what's present
  FIX: Missing data can bias everything; account for it

FAILURE: Method-driven analysis
  SIGNAL: Choosing a method first, then fitting the data to it
  FIX: Start with the question, then pick the method

FAILURE: Overconfident communication
  SIGNAL: Stating findings without uncertainty ranges
  FIX: Always include confidence levels and limitations

FAILURE: Cherry-picking
  SIGNAL: Reporting only findings that support the desired conclusion
  FIX: Report all findings, especially surprising ones

Integration

  • Use with: /agsk to evaluate whether data-based arguments are sound
  • Use with: /cmpr to check if the analysis is complete
  • Use with: /prvn to validate needs using data evidence
  • Use from: /claim when a claim needs data-based testing
  • Differs from /agsk: /dtsk works with empirical data; /agsk works with logical arguments