Adaptive Extraction Pipeline
Overview
A breadth-first, learned extraction pipeline that clarifies goals first, samples broadly, learns user preferences, and extracts selectively from the highest-value items.
Steps
Step 1: Clarify extraction goals (Phase 0)
Use question_generation to clarify extraction priorities:
WEIGHTING QUESTION: “When evaluating for extraction, which matters MORE?”
- Procedure density (more ‘how to’ per minute)
- Uniqueness (rare insights, can’t find elsewhere)
- Direct relevance (applicable to current goals)
- Roughly equal (balanced)
PROCEDURE TYPES QUESTION: “What types of procedures are you MOST interested in?”
- Technical methods (how to build/code/engineer)
- Thinking patterns (how experts reason/decide)
- Research methods (how to find/validate/synthesize)
- All types equally
PRIORITY DOMAINS QUESTION: “What are your CURRENT priority domains?”
- AI/ML systems, Business/income, Research/learning
- Health, Engineering/hardware
SKIP THRESHOLD QUESTION: “At what point would you SKIP a video entirely?”
- < 1 procedure expected
- Low uniqueness (could find elsewhere)
- Off-topic for current goals
Duration: 10-15 minutes
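One way to carry the Phase 0 answers forward is a small profile object that later steps can read. This is a sketch only; the field names and values below are assumptions, not a required schema.

```python
# Hypothetical extraction profile assembled from the Phase 0 answers.
# Field names and values are illustrative assumptions, not a fixed schema.
extraction_profile = {
    "weighting": "balanced",           # or "density_first", "uniqueness_first", "relevance_first"
    "procedure_types": ["technical", "thinking", "research"],
    "priority_domains": ["AI/ML systems", "Research/learning"],
    "skip_criteria": {
        "min_expected_procedures": 1,  # skip if < 1 procedure expected
        "skip_low_uniqueness": True,   # skip if the knowledge is easy to find elsewhere
        "skip_off_topic": True,        # skip if outside the priority domains
    },
}
```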
Step 2: Get broad sample (Phase 1)
Sample broadly across source collection:
- Get video/content lists from sources
  - ~10 recent items from each Tier 1 source
  - Target: ~100 items total for calibration
- Fetch transcript/content samples
  - First ~2000 characters per item
  - Enough to assess without full processing
- Ensure diversity:
  - Mix of sources
  - Mix of content types
  - Mix of topics
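A minimal sketch of this sampling loop, assuming hypothetical list_recent_items and fetch_transcript callables that wrap whatever source APIs are in use:

```python
import random

SAMPLE_PER_SOURCE = 10   # ~10 recent items from each Tier 1 source
SNIPPET_CHARS = 2000     # first ~2000 characters per item

def build_calibration_sample(tier1_sources, list_recent_items, fetch_transcript):
    """Collect a ~100-item sample with transcript snippets for triage."""
    sample = []
    for source in tier1_sources:
        for item in list_recent_items(source, limit=SAMPLE_PER_SOURCE):
            snippet = fetch_transcript(item)[:SNIPPET_CHARS]
            sample.append({"source": source, "item": item, "snippet": snippet})
    random.shuffle(sample)  # mix sources, content types, and topics
    return sample
```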
Step 3: Run initial triage
Score each sampled item on extraction profile criteria:
- DENSITY (1-5): How much is ‘how to’ vs ‘what is’?
- UNIQUENESS (1-5): Could this knowledge be found elsewhere?
- RELEVANCE (1-5): Matches priority domains?
- EXPECTED_PROCEDURES (0-10): Estimated extractable procedures
Calculate composite score: composite = (density * w_d + uniqueness * w_u + relevance * w_r) * (expected_procedures / 5)
where weights come from extraction_profile.weighting:
- balanced: w_d=0.33, w_u=0.33, w_r=0.33
- density_first: w_d=0.5, w_u=0.25, w_r=0.25
- uniqueness_first: w_d=0.25, w_u=0.5, w_r=0.25
- relevance_first: w_d=0.25, w_u=0.25, w_r=0.5
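A direct translation of the formula and weight presets above; the weighting keys match the Phase 0 answer, everything else is as defined in this step:

```python
# Weight presets keyed by the extraction profile's weighting answer.
WEIGHTS = {
    "balanced":         (0.33, 0.33, 0.33),
    "density_first":    (0.50, 0.25, 0.25),
    "uniqueness_first": (0.25, 0.50, 0.25),
    "relevance_first":  (0.25, 0.25, 0.50),
}

def composite_score(density, uniqueness, relevance, expected_procedures, weighting="balanced"):
    """composite = (density*w_d + uniqueness*w_u + relevance*w_r) * (expected_procedures / 5)"""
    w_d, w_u, w_r = WEIGHTS[weighting]
    return (density * w_d + uniqueness * w_u + relevance * w_r) * (expected_procedures / 5)
```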
Step 4: Calibrate with user ratings
User manually rates ~20 items to calibrate model:
- Present items in random order
- User rates each 1-10 for extraction value
- Duration: ~1 hour (3 min per item)
Purpose: Learn where model predictions diverge from user preferences
Calibration output:
- Items where model overestimated (user rated lower)
- Items where model underestimated (user rated higher)
- Patterns in divergence
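One way to produce that calibration output, assuming each calibration item carries a predicted score and a user rating on comparable ~0-10 scales; the 2-point gap threshold is an arbitrary starting point:

```python
def divergence_report(calibration_items, gap_threshold=2.0):
    """Split calibration items into over- and underestimates for review."""
    overestimated, underestimated = [], []
    for item in calibration_items:
        gap = item["predicted"] - item["user_rating"]
        if gap >= gap_threshold:
            overestimated.append(item)    # model scored it higher than the user did
        elif gap <= -gap_threshold:
            underestimated.append(item)   # model scored it lower than the user did
    return {"overestimated": overestimated, "underestimated": underestimated}
```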
Step 5: Validate triage model
Compare model predictions to user ratings:
Calculate correlation:
- Pearson correlation between predicted and actual scores
- Target: correlation >= 0.7 means the model is calibrated
If correlation < 0.7:
- Identify systematic biases
- Ask clarifying questions about divergences
- Adjust weights or criteria
- Re-run triage on calibration set
If correlation >= 0.7:
- Model is calibrated, proceed to Phase 2
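The correlation check itself is small; a sketch using only the standard library (Python 3.10+):

```python
from statistics import correlation  # Pearson correlation coefficient

CALIBRATION_THRESHOLD = 0.7

def is_calibrated(predicted_scores, user_ratings):
    """Return (calibrated?, r) for the ~20-item calibration set."""
    r = correlation(predicted_scores, user_ratings)
    return r >= CALIBRATION_THRESHOLD, r
```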
Step 6: Scale triage to all sources (Phase 2)
Apply calibrated model to full source collection:
- Expand to all sources:
  - Get content lists from ALL channels (Tier 1-3)
  - Target: full coverage (~5000 items typical)
- Batch triage:
  - Fetch transcript samples for all items
  - Run calibrated scoring model
  - Use LLM for initial scoring (Haiku for efficiency)
- Global ranking:
  - Sort ALL items by composite score
  - Single ranked list across all sources
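A sketch of the batch-and-rank loop, assuming a hypothetical score_batch callable (for example, one cheap LLM call per batch) that returns composite scores for a list of items:

```python
def triage_all_sources(all_items, score_batch, batch_size=50):
    """Score every item in batches and return one ranked list across all sources."""
    for start in range(0, len(all_items), batch_size):
        batch = all_items[start:start + batch_size]
        for item, score in zip(batch, score_batch(batch)):
            item["composite"] = score
    return sorted(all_items, key=lambda it: it["composite"], reverse=True)
```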
Step 7: Identify extraction targets
Mark items for extraction based on budget:
Heuristic: the top ~100 items often account for 50%+ of total value
Budget allocation:
- Calculate items affordable within budget
- Consider: top 10% often yields 50% of value
- Mark clear “extract” vs “skip” vs “maybe”
Create extraction queue:
- Ordered by score (highest first)
- Include estimated extraction time
- Include skip flags for items below threshold
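A sketch of queue construction under a time budget; minutes_per_item and skip_threshold are illustrative defaults, not recommendations:

```python
def build_extraction_queue(ranked_items, budget_minutes, minutes_per_item=30, skip_threshold=2.0):
    """Split the global ranking into extract / maybe / skip lists."""
    affordable = int(budget_minutes // minutes_per_item)
    extract, maybe, skip = [], [], []
    for item in ranked_items:
        item["est_minutes"] = minutes_per_item   # estimated extraction time
        if item["composite"] < skip_threshold:
            skip.append(item)                    # below threshold: flag and move on
        elif len(extract) < affordable:
            extract.append(item)                 # highest-scoring items, in score order
        else:
            maybe.append(item)
    return extract, maybe, skip
```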
Step 8: Execute selective extraction (Phase 3)
Full extraction on highest-value items only:
- Run 3-pass extraction on queued items:
  - Pass 1 (Explicit): What they SAY (HIGH confidence)
  - Pass 2 (Implicit): What they DO (MEDIUM confidence)
  - Pass 3 (Meta): HOW they think (MEDIUM confidence)
- Track actual yield:
  - Record procedures extracted vs predicted
  - Note where predictions were accurate/inaccurate
- Monitor for diminishing returns:
  - Track procedures per item in a rolling window
  - If the rate drops below 2 procedures/item, reassess
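The diminishing-returns check can be a small rolling-window monitor; the 10-item window is an assumption, while the 2 procedures/item floor comes from the step above:

```python
from collections import deque

class YieldMonitor:
    """Track procedures extracted per item and flag when the rate drops too low."""

    def __init__(self, window_size=10, min_rate=2.0):
        self.window = deque(maxlen=window_size)
        self.min_rate = min_rate

    def record(self, procedures_extracted):
        self.window.append(procedures_extracted)

    def should_reassess(self):
        if len(self.window) < self.window.maxlen:
            return False  # wait for a full window before deciding
        return sum(self.window) / len(self.window) < self.min_rate
```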
Step 9: Update model for future runs
Apply evolutionary improvement to triage model:
- Analyze prediction accuracy:
  - Which criteria correlated with actual yield?
  - Which criteria were noise?
- Evolve successful criteria:
  - Keep/strengthen criteria that predicted well
  - Modify criteria that were partially predictive
  - Drop criteria that didn’t correlate
- Document learnings:
  - What types of content yielded best?
  - What skip criteria were accurate?
  - Recommendations for next extraction run
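The prediction-accuracy analysis can start from per-criterion correlations with actual yield; the field names below are assumptions carried over from the earlier sketches:

```python
from statistics import correlation

def criterion_accuracy(extracted_items, criteria=("density", "uniqueness", "relevance")):
    """Correlate each triage criterion with actual procedures extracted.
    High positive correlation -> keep/strengthen; near zero -> candidate to drop."""
    yields = [item["procedures_extracted"] for item in extracted_items]
    return {c: correlation([item[c] for item in extracted_items], yields) for c in criteria}
```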
When to Use
- When starting extraction from a new user’s sources
- When user priorities are unclear or unstated
- When processing large source collections (100+ videos)
- When extraction budget requires high precision targeting
- When previous extraction didn’t match user expectations
- After changing goals or priorities
- When optimizing extraction ROI is critical
Verification
- User preferences captured through explicit questions
- Model calibrated with user ratings (correlation >= 0.7)
- Global ranking covers all sources
- Extraction focused on highest-value items
- Actual yield tracked against predictions
- Model improved based on outcomes