The Silent Failure: Why LLMs Can’t Say “I Don’t Know”
A patient presents symptoms that could indicate a dozen different conditions. The doctor, instead of saying “I need to run more tests” or “I’m not sure yet,” confidently diagnoses the rarest possibility and prescribes treatment. The patient, trusting the confident delivery, follows the advice. Days later, the condition worsens—not because the original symptoms were untreatable, but because the treatment addressed the wrong disease entirely.
This scenario isn’t hypothetical medical malpractice. It’s how large language models operate by design. When faced with questions they cannot answer, LLMs don’t express uncertainty—they confabulate plausible-sounding responses with the same confident tone they use for factual information. The problem isn’t that LLMs hallucinate. It’s that they’ve been systematically trained to never say “I don’t know,” even when they should.
This is what I call silent failure: errors delivered with confidence, indistinguishable from correct answers until real-world consequences reveal the mistake. And as LLMs move into high-stakes domains—medical diagnosis, legal advice, financial planning—the cost of these silent failures scales exponentially.
The Representation Problem
To understand why LLMs can’t express uncertainty, we first need to understand what epistemic uncertainty is—and how it differs from the messier forms of ambiguity humans navigate daily.
Epistemic uncertainty is “knowledge-based uncertainty”: the gaps in what a system knows or can infer. When a doctor says “I don’t know what’s causing these symptoms yet, but I can run tests to find out,” that’s epistemic uncertainty—something that could be resolved with more information. Contrast this with aleatoric uncertainty, which is inherent and irreducible: the randomness in a coin flip, or the impossibility of predicting next week’s lottery numbers.
Humans represent epistemic uncertainty naturally. We use qualitative hedges (“I think,” “probably,” “I’m not sure”), Bayesian reasoning (updating confidence as evidence accumulates), and meta-cognitive awareness (knowing what we don’t know). When pressed on something uncertain, we say “let me check” or “I’d need to research that”—responses that acknowledge knowledge gaps without pretending to close them.
LLMs, by contrast, struggle with this representation. Recent research shows they have the capacity to represent uncertainty—just not the training to express it. The Bayesian Modeling of Experiments framework (2025) demonstrated that LLMs can maintain coherent probabilistic reasoning about unknowns when prompted carefully. Fine-Grained Uncertainty Decomposition methods (Zhao et al., 2025) showed that LLM hidden states contain separable signals for epistemic vs. aleatoric uncertainty, using von Neumann entropy decomposition to tease them apart mathematically.
Perhaps most compellingly, feature-gap approaches (Kim et al., 2025) quantify epistemic uncertainty by measuring the “distance” between an LLM’s current hidden state and known feature vectors learned during training. When hidden states fall outside the territory of learned features, that signals epistemic uncertainty—and incorporating this metric improved downstream task performance by +16 PRR (Precision-Recall) points over state-of-the-art.
The insight here is critical: LLMs already represent epistemic uncertainty internally, in their hidden states and activation patterns. They just aren’t trained to express it externally, in their output text. It’s like having a smoke detector that registers smoke but never makes a sound.
The Training Bias Against “I Don’t Know”
Why don’t LLMs express uncertainty when they encode it internally? Because they’ve been systematically trained not to.
Most training datasets are built for specific correct answers: question-answer pairs, instruction-following examples, conversational dialogues where responding with “I don’t know” would be seen as failure. Instruction-tuning (the process that turns a base LLM into a helpful assistant) amplifies this bias, aligning models for helpfulness over epistemic accuracy.
Research on In Dialogue Uncertainty (InDU) metrics confirms this: hedge words like “I don’t know,” “I’m not sure,” or “possibly” are rare in training corpora. When LLMs do encounter uncertainty expressions, they often appear in contexts where they’re corrected—“I don’t know” followed by “actually, the answer is X”—which teaches models that uncertainty is a temporary state to be overcome, not a valid response in itself.
The analogy is medical education gone wrong: imagine training doctors exclusively on multiple-choice exams where “I don’t know” always marks the answer as incorrect. Doctors would learn to always select an answer, even when genuinely uncertain, because the training incentive structure rewards guessing over admitting ignorance. That’s precisely how RLHF (Reinforcement Learning from Human Feedback) trains LLMs—optimizing for responses humans rate as “helpful,” which in practice means “gives me an answer” rather than “tells me it doesn’t know.”
The result? LLMs trained to respond definitively to unanswerable questions, filling knowledge gaps with statistically plausible confabulations instead of expressing uncertainty. This isn’t a bug in a few models—it’s a systemic feature of how modern AI alignment works.
The Cost of Silent Failures
Silent failures are more dangerous than obvious ones. When a system crashes with an error message, users know something went wrong. When an LLM confidently delivers incorrect information, users have no mechanism to distinguish confident-right from confident-wrong—and the consequences compound.
Safety-Critical Domains
In medical contexts, LLM hallucinations can be deadly. A 2025 study in Nature Digital Medicine found that state-of-the-art medical LLMs exhibit hallucination rates ranging from 15% to 40% on clinical tasks. These aren’t random errors—they’re coherent hallucinations using domain-specific medical terminology, appearing plausible enough that only expert clinicians can detect them.
Even more concerning: when researchers tested LLMs on adversarial clinical scenarios (cases with deliberately planted false information), hallucination rates skyrocketed. DeepSeek’s model yielded 80% hallucination rates on long clinical cases, while GPT-4o—the best performer—still hallucinated 53.3% of the time. The models didn’t express uncertainty about these cases; they delivered confident (and wrong) diagnoses.
Trust Erosion
Users quickly learn they can’t trust LLM outputs, but they learn this the hard way—through experience of being confidently misled. A system that occasionally says “I don’t know” builds trust (“it’s honest when uncertain”). A system that never admits uncertainty but frequently delivers errors erodes trust entirely (“I can never rely on this”).
Compounding Errors
Perhaps most insidiously, silent failures compound. One hallucinated premise leads to an entire chain of reasoning built on a false foundation—and because the initial hallucination was delivered confidently, the LLM treats it as established fact in subsequent reasoning steps. By the time users realize something’s wrong, they’re five steps deep in a fantasy.
The adversarial clinical study showed this pattern clearly: when LLMs ingested false information without questioning it, they built complete diagnostic narratives around those falsehoods, recommending treatments that would be harmful if applied to real patients.
Teaching LLMs to Doubt
The good news: we’re learning how to teach LLMs epistemic humility. The challenge is fighting against the alignment incentives that made them over-confident in the first place.
Uncertainty-Aware Training Methods
US-Tuning (Uncertainty-Sensitive Tuning) and R-Tuning (Refusal-Aware Instruction Tuning) are emerging techniques that explicitly train models to recognize knowledge gaps and refuse to answer when uncertain. Early results show improved correctness, reduced hallucinations, and increased safety—but at the cost of utility. Models trained to say “I don’t know” are less immediately “helpful” in the narrow sense of always providing an answer.
The key insight is treating “appropriate refusal” as a skill to be trained, not a failure mode. Training datasets now include examples where the correct response is “I don’t have enough information to answer that” or “that’s outside my training distribution.” Models learn that uncertainty expression is a valid, even preferred, output under the right conditions.
Pipeline Approaches
Some systems implement multi-stage verification pipelines before generating final outputs:
- Semantic verification: Does this answer make logical sense?
- Statistical uncertainty: Check token entropy, self-consistency across multiple generations
- Evidence sufficiency: Do I have enough information to justify this claim?
- Gating mechanism: Abstain if any check fails threshold
This approach treats uncertainty detection as a meta-task: the LLM generates a candidate answer, then a separate process evaluates whether that answer should be delivered or replaced with “I don’t know.” The adversarial clinical study found that prompt-based mitigation using this approach reduced hallucinations from 66% to 44% overall, with GPT-4o dropping from 53.3% to 20.7% when mitigation prompts were applied.
Feature-Gap Methods
Rather than relying solely on output-level checks, feature-gap methods examine hidden-state activations during generation. By computing the “distance” between the current hidden state and the learned feature vectors from training, systems can quantify epistemic uncertainty in real time—before generating output.
Think of it as the LLM checking “am I in familiar territory right now, or am I extrapolating into unknown space?” If hidden states fall far outside trained distributions, that’s a strong signal to either refuse the query or flag the response as low-confidence.
Practical Protocols for Uncertainty-Aware Systems
How should we design systems that can admit uncertainty? Five actionable protocols:
1. Forced Abstention Thresholds
Set a confidence floor (e.g., <70% internal confidence → refuse to answer). Better to say “I don’t know” than to guess wrong. This requires calibrating LLM confidence scores—which is challenging, since most models are overconfident—but even imperfect calibration improves on blind guessing.
2. Uncertainty Budgets
Allocate a fixed number of “I don’t know” tokens per conversation. Tell the model explicitly: “You have 3 uncertainty tokens to spend in this conversation—use them strategically when you genuinely don’t know something.” This frames uncertainty expression as a resource to be used wisely, not a failure to be avoided.
3. Decomposed Confidence Reporting
Don’t just output an answer. Output:
- Answer (the actual response)
- Epistemic confidence (how sure am I this is correct?)
- Aleatoric confidence (is this question even answerable definitively?)
- Evidence quality (what’s the source strength for this claim?)
This makes uncertainty visible to users, who can then decide how much weight to give the response. A confident answer backed by high-quality evidence is trustworthy. A confident answer with low evidence quality is a warning flag.
4. Multi-Model Ensembles
If multiple LLMs give significantly different answers to the same question, that’s a strong signal of high epistemic uncertainty. Disagreement among models trained on similar data suggests the query is either ambiguous, outside training distributions, or genuinely difficult. Flag these cases for human review rather than picking one model’s answer arbitrarily.
5. Explicit “I Don’t Know” Training
Include “appropriate uncertainty” in alignment datasets. Reward models for saying “I don’t know” when they genuinely don’t know, rather than penalizing all uncertainty expression. This requires human labelers to mark questions that shouldn’t be answered confidently—a challenging but essential curation task.
None of these are perfect. Calibration remains an active research problem. But each protocol shifts the incentive structure slightly away from “always answer” and toward “answer correctly or admit uncertainty.” That shift is the foundation of trustworthy AI.
Honesty Over Helpfulness
Silent failures are the worst kind of failures. They masquerade as successes until real-world consequences—patient harm, financial loss, legal liability—reveal the truth. And by then, the damage is done.
The core shift we need is cultural as much as technical: from “always give an answer” to “give the RIGHT answer, or admit uncertainty.” This means rewarding epistemic humility in training, designing systems with uncertainty budgets, and teaching users to value “I don’t know” as much as “here’s the answer.”
As LLMs move into high-stakes domains—medical diagnosis, legal advice, financial planning—the cost of confident wrongness scales exponentially. A hallucinated medical diagnosis isn’t just annoying; it’s potentially fatal. A hallucinated legal precedent isn’t just inconvenient; it undermines justice.
An LLM that says “I don’t know” when it doesn’t know isn’t failing. It’s being honest. And honesty is the foundation of trust.
The challenge ahead isn’t just technical—teaching models to represent and express uncertainty. It’s also philosophical: deciding whether we value AI systems that are helpful (always responsive) or trustworthy (sometimes uncertain). In safety-critical domains, I’d take trustworthy over helpful every single time.
The silent failures will continue until we train LLMs that knowing when to say “I don’t know” is the most important thing they can learn.
Sources
- A framework to assess clinical safety and hallucination rates of LLMs for medical text summarisation | npj Digital Medicine
- Medical Hallucination in Foundation Models and Their Impact on Healthcare | medRxiv
- Multi-model assurance analysis showing large language models are highly vulnerable to adversarial hallucination attacks during clinical decision support | Communications Medicine
- Zhao et al. (2025). “Fine-Grained Uncertainty Decomposition in Large Language Models” - von Neumann entropy methods
- Kim et al. (2025). “Uncertainty as Feature Gaps: Quantifying Epistemic Uncertainty via Hidden-State Analysis” - +16 PRR improvement
- Uncertainty-Sensitive Tuning (US-Tuning) and Refusal-Aware Instruction Tuning (R-Tuning) research (2025-2026)
- Bayesian Modeling of Experiments framework for LLM uncertainty (June 2025)