Structured Analytic Techniques Meet LLMs: Automating ACH and Red Team Analysis

Every experienced analyst knows the enemy isn't always the adversary. Often it's confirmation bias, anchoring, and the institutional pressure to produce an assessment that fits what senior leadership already believes. Structured Analytic Techniques (SATs) exist precisely to fight those tendencies. ACH, red teaming, key assumptions checks: these methods work when analysts actually use them. That last part is the problem.

ACH (Analysis of Competing Hypotheses), developed by CIA veteran Richards Heuer, asks analysts to build a matrix of competing explanations and score each piece of evidence against each hypothesis. Done properly, it surfaces the hypotheses that are least inconsistent with the available evidence rather than the ones that feel most compelling. It's methodologically sound. It's also tedious, which is why it gets skipped under deadline pressure or collapsed into a cursory two-hypothesis exercise that confirms whatever the lead analyst already thought.
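To make the "least inconsistent" idea concrete, here is a minimal sketch of an ACH matrix and the ranking it implies. The hypotheses, evidence items, and scores are invented placeholders, and counting inconsistencies with equal weight is a simplifying assumption rather than the only way to tally a matrix.

```python
from collections import Counter

# Illustrative data only: every label below is invented for the example.
matrix = {
    "E1": {"H1": "consistent", "H2": "inconsistent", "H3": "consistent"},
    "E2": {"H1": "inconsistent", "H2": "inconsistent", "H3": "neutral"},
    "E3": {"H1": "consistent", "H2": "consistent", "H3": "consistent"},
}

def inconsistency_counts(matrix):
    """Count, per hypothesis, how many evidence items are inconsistent with it."""
    counts = Counter()
    for scores in matrix.values():
        for hypothesis, score in scores.items():
            # Adding 0 still registers the hypothesis, so clean ones show up too.
            counts[hypothesis] += 1 if score == "inconsistent" else 0
    return dict(counts)

def rank_hypotheses(matrix):
    """Order hypotheses from least to most inconsistent with the evidence."""
    counts = inconsistency_counts(matrix)
    return sorted(counts, key=lambda h: (counts[h], h))

print(inconsistency_counts(matrix))
print(rank_hypotheses(matrix))
```

Note that E3 is consistent with every hypothesis, so it does nothing to separate them. The ranking is driven entirely by the inconsistent cells, which is the point of the method.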

LLMs cut the production cost of this process significantly. They lower the friction of running ACH, not the rigor it demands.

What Automating ACH Actually Looks Like

Here's a concrete workflow. An analyst provides a collection of intelligence reports, either raw or summarized. A structured prompt chain asks the model to:

1. Enumerate plausible hypotheses (prompted to generate at least five, including low-probability alternatives)
2. Extract discrete evidence items from the source material
3. Score each evidence item against each hypothesis using a defined scale (consistent, inconsistent, neutral, not applicable)
4. Flag which hypotheses survive diagnostic evidence cuts
5. Identify evidence gaps where no available reporting bears on a key hypothesis

```mermaid
graph TD
    A[/Intelligence Reports/] --> B(Hypothesis Generation)
    B --> C(Evidence Extraction)
    C --> D{ACH Matrix Scoring}
    D --> E[Surviving Hypotheses]
    D --> F[Evidence Gap Report]
    E --> G((Analyst Review))
    F --> G
```
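The model-driven part of that workflow (hypothesis generation, evidence extraction, and scoring) can be sketched as a short prompt chain. This is a rough outline under stated assumptions, not a reference implementation: `ask` is a hypothetical stand-in for whatever LLM client a team uses, it is assumed to return JSON, and the prompt wording is illustrative.

```python
import json
from typing import Callable

SCALE = ("consistent", "inconsistent", "neutral", "not applicable")

def run_ach_chain(reports: list[str], ask: Callable[[str], str]) -> dict:
    """Run hypothesis generation, evidence extraction, and scoring in sequence.

    `ask` is a hypothetical callable: prompt string in, JSON string out.
    """
    corpus = "\n\n".join(reports)

    hypotheses = json.loads(ask(
        "List at least five plausible hypotheses, including low-probability "
        "alternatives, as a JSON array of strings.\n\n" + corpus
    ))
    evidence = json.loads(ask(
        "Extract discrete evidence items as a JSON array of strings.\n\n" + corpus
    ))

    # One model call per (evidence, hypothesis) cell of the matrix.
    matrix = {}
    for item in evidence:
        matrix[item] = {}
        for hyp in hypotheses:
            label = json.loads(ask(
                "Score the evidence against the hypothesis. Answer with a "
                f"JSON string, one of {list(SCALE)}.\n"
                f"Hypothesis: {hyp}\nEvidence: {item}"
            ))
            if label not in SCALE:
                raise ValueError(f"unexpected label: {label!r}")
            matrix[item][hyp] = label

    return {"hypotheses": hypotheses, "evidence": evidence, "matrix": matrix}
```

Passing the model in as a plain callable keeps the chain testable with a canned fake and avoids tying the sketch to any particular vendor API. The surviving-hypothesis and gap reports can then be computed deterministically from the returned matrix.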

Step 3 is where the value concentrates. Human analysts running ACH in their heads tend to let a hypothesis's plausibility leak into how they score the evidence. The model, given a properly constrained prompt, scores each evidence-hypothesis pair independently without caring which hypothesis "feels right." That's not the same as objectivity, but it does produce a first-pass matrix an analyst can interrogate rather than construct from scratch.
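What "properly constrained" might mean in practice: the scoring prompt shows the model exactly one hypothesis and one evidence item, asks for a label only, and rejects any reply that drifts outside the scale. The wording and the helper names `build_pair_prompt` and `parse_label` are assumptions made for illustration.

```python
LABELS = ("consistent", "inconsistent", "neutral", "not applicable")

def build_pair_prompt(hypothesis: str, evidence: str) -> str:
    """Build a scoring prompt that exposes one hypothesis and one evidence item."""
    return (
        "Assume, for the sake of argument, that the hypothesis is true.\n"
        "Judge only whether the evidence item fits that assumption.\n"
        f"Reply with exactly one label from: {', '.join(LABELS)}.\n"
        "Do not comment on how likely the hypothesis is.\n\n"
        f"Hypothesis: {hypothesis}\n"
        f"Evidence: {evidence}\n"
    )

def parse_label(reply: str) -> str:
    """Normalize a model reply to one of LABELS, or raise if it is anything else."""
    label = reply.strip().strip(".\"'").lower()
    if label not in LABELS:
        raise ValueError(f"unrecognized label: {reply!r}")
    return label
```

Because the prompt never mentions the competing hypotheses or a running tally, the model has no view of which explanation is "winning" when it scores a cell. Strict parsing means a hedged reply such as "probably consistent" is surfaced as an error instead of being silently coerced.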

The evidence gap report in step 5 is arguably more valuable than the matrix itself. Knowing which hypotheses have no supporting or contradicting evidence in current collection tells analysts exactly what to ask for in requests for information (RFIs). That's a concrete improvement in the analytic workflow.
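Once a matrix exists, the gap report needs no model call at all. The sketch below assumes the matrix is a dict of evidence item to per-hypothesis labels on the four-value scale from the workflow. The function name and the sample data are hypothetical.

```python
def evidence_gaps(matrix, hypotheses):
    """Return hypotheses that no evidence item scores as consistent or inconsistent."""
    bearing = {"consistent", "inconsistent"}
    gaps = []
    for hyp in hypotheses:
        # A missing cell is treated the same as "not applicable".
        labels = {scores.get(hyp, "not applicable") for scores in matrix.values()}
        if not labels & bearing:
            gaps.append(hyp)
    return gaps

# Invented example: nothing in collection bears on H2 or H3.
matrix = {
    "E1": {"H1": "consistent", "H2": "neutral", "H3": "not applicable"},
    "E2": {"H1": "inconsistent", "H2": "not applicable"},
}
print(evidence_gaps(matrix, ["H1", "H2", "H3"]))
```

Each hypothesis in the returned list is a candidate collection requirement: current reporting neither supports nor undercuts it.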

Red Team Automation: Where It Gets Uncomfortable

Red teaming asks a designated group to argue the adversary's case, challenge the prevailing assessment, or identify assumptions the main analytic line hasn't questioned. Institutional red teaming is chronically underfunded and rarely happens on short-notice products.

An LLM-assisted red team runs differently. Feed the model a finished assessment and prompt it to act as an adversarial reviewer with specific instructions: identify the three strongest assumptions underlying the conclusion, generate alternative explanations the assessment doesn't consider, and produce the most damaging critique of the methodology.

This works better than it should. Models trained on large corpora of argumentation and analysis are reasonably good at generating steelman counterarguments. The output isn't a substitute for a human red team with domain expertise, but it catches the obvious structural weaknesses in an assessment that the authoring analyst missed because they were too close to it.

Two caveats matter here. First, the model will sometimes produce critiques that are superficially plausible but technically wrong about domain specifics. Analyst review isn't optional. Second, prompt design heavily influences what kind of red team you get. A prompt that asks "is there anything wrong with this assessment" will produce sycophantic hedging. A prompt that instructs the model to assume the assessment is wrong and identify why produces something worth reading.
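A prompt along those lines might be assembled as follows. The exact wording is an assumption. What matters is that the reviewer instructions are explicit and that the framing starts from "the assessment is wrong" instead of asking whether anything might be.

```python
def build_red_team_prompt(assessment: str) -> str:
    """Build an adversarial-review prompt that presumes the assessment is wrong."""
    return (
        "You are an adversarial reviewer. Assume the assessment below is wrong "
        "and explain why.\n"
        "1. Identify the three strongest assumptions underlying the conclusion.\n"
        "2. Generate alternative explanations the assessment does not consider.\n"
        "3. Produce the most damaging critique of the methodology.\n"
        "Do not soften the critique or balance it with praise.\n\n"
        "ASSESSMENT:\n" + assessment
    )

# The placeholder text stands in for a finished product.
print(build_red_team_prompt("<finished assessment text>"))
```

Whatever comes back is still a draft critique. It goes to an analyst who can check the domain specifics before any of it reaches the product.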

The Bias Reduction Question

Does this actually reduce cognitive bias in finished intelligence? Honest answer: probably partially, in specific ways, with real caveats.

Automating the mechanical parts of ACH reduces anchoring bias by separating hypothesis generation from evidence scoring. It reduces availability bias by forcing enumeration of alternatives the analyst might not spontaneously consider. It does nothing about collection bias (the model can only score what's in the source material) and it introduces its own failure modes around hallucinated evidence and spurious consistency scores.
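One partial guard against hallucinated evidence is to require the model to attach a verbatim quote to every extracted item, then check that the quote really appears in the source material. The sketch assumes a hypothetical item schema with `claim` and `quote` keys. It catches fabricated quotes only, not misreadings of real ones, so it narrows the problem without closing it.

```python
def unsupported_evidence(items, sources):
    """Return claims whose supporting quote is not found verbatim in any source."""
    def normalize(text):
        # Ignore case and whitespace differences when matching quotes.
        return " ".join(text.lower().split())

    haystacks = [normalize(s) for s in sources]
    flagged = []
    for item in items:
        quote = normalize(item.get("quote", ""))
        if not quote or not any(quote in h for h in haystacks):
            flagged.append(item["claim"])
    return flagged

# Invented example data.
sources = ["The convoy  left the depot at dawn."]
items = [
    {"claim": "Convoy departed early", "quote": "convoy left the depot"},
    {"claim": "Convoy was armed", "quote": "carrying heavy weapons"},
    {"claim": "Convoy had an escort"},
]
print(unsupported_evidence(items, sources))
```

Flagged items should be pulled from the matrix or sent back for analyst review before they influence any score.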

The right framing: LLM-assisted SATs are better than no SATs, which is the realistic baseline for time-pressured analytic production. They don't replace tradecraft. They make tradecraft more likely to happen.

For teams looking to implement this, start with ACH on finished products as a validation step rather than trying to rebuild the entire analytic workflow. Run the model against an assessment that already exists, compare its matrix to the implicit reasoning in the text, and see where the gaps are. That's a low-risk entry point with immediate diagnostic value.

The goal isn't automation for its own sake. It's making the methods that already exist actually get used.
