AI companies have not fixed hallucinations because the training systems that produce them also produce confident, fluent answers that users prefer. Fixing the root cause requires penalizing confident wrong answers and rewarding “I don’t know,” which makes AI sound less capable and pushes users toward competitors who kept the bluffing intact. It is not a purely technical problem. It is a business incentive problem with technical consequences.
Pithy Cyborg | AI FAQs – The Details
Question: Why haven’t AI companies fixed hallucinations yet?
Asked by: GPT-4 Turbo
Answered by: Mike D (MrComputerScience) from Pithy Cyborg.
Why AI Benchmarks Reward Bluffing Over Honest Uncertainty
OpenAI’s own researchers published a paper in September 2025 explaining the core problem in plain terms.
Standard AI benchmarks grade models on accuracy: how often do they get the right answer? Under that scoring system, guessing is always worth trying.
If a model guesses and gets it right, it scores a point. If it guesses and gets it wrong, it scores zero. If it says “I don’t know,” it also scores zero.
From the model’s perspective during training, guessing and admitting uncertainty are mathematically identical when wrong. But guessing has a chance of being right. So the model learns to guess.
Every response. Every time. Even when it has no idea.
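The incentive math above is simple enough to write down. A minimal sketch, with `p_correct` standing in for the model's chance of a lucky guess (an illustrative variable, not anything from the paper):

```python
# Expected benchmark score under accuracy-only grading:
# 1 point for a right answer, 0 for a wrong answer, 0 for "I don't know".

def expected_score(p_correct: float, abstain: bool) -> float:
    if abstain:
        return 0.0          # "I don't know" always scores zero
    return p_correct * 1.0  # guessing scores p_correct on average

# Even a wild guess with a 10% chance of being right beats abstaining:
print(expected_score(0.10, abstain=False))  # 0.1
print(expected_score(0.10, abstain=True))   # 0.0
```

Because abstaining is never worth more than guessing, and guessing is worth more whenever `p_correct > 0`, the training signal pushes the model toward answering every time.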
The OpenAI paper proposed a straightforward fix: penalize confident wrong answers more than uncertainty, and give partial credit for appropriate “I don’t know” responses.
The fix works technically. The problem is that changing benchmark scoring requires the whole industry to adopt new standards simultaneously. As long as the main leaderboards keep rewarding confident guesses, every model that trains toward those leaderboards keeps learning to bluff.
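To see why the proposed scoring change works, here is the same sketch with illustrative payoffs (the specific numbers are mine, not the paper's): a wrong answer now costs points, and "I don't know" earns partial credit.

```python
# Hypothetical payoffs: wrong answers are penalized, abstaining gets
# partial credit. These values are for illustration only.
RIGHT, WRONG, ABSTAIN = 1.0, -1.0, 0.3

def expected_score(p_correct: float, abstain: bool) -> float:
    if abstain:
        return ABSTAIN
    return p_correct * RIGHT + (1 - p_correct) * WRONG

# Guessing only pays off above a confidence threshold:
#   p*1 + (1-p)*(-1) > 0.3  =>  p > 0.65
print(expected_score(0.5, abstain=False))  # 0.0 -- abstaining (0.3) wins
print(expected_score(0.9, abstain=False))  # guessing wins when confident
```

Under this scheme, the rational policy is to guess only when confidence clears the threshold and say "I don't know" otherwise, which is exactly the calibrated behavior accuracy-only grading trains away.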
And nearly all of them do.
The Business Reason “I Don’t Know” Is Bad for AI Products
Here is the part the research papers mention only in passing and almost nobody writes about directly.
An AI that says “I don’t know” more often is a less satisfying product. Full stop.
Carlos Jimenez, a Princeton computer scientist who built one of the benchmarks analyzed in the OpenAI paper, pointed to the underlying tension: if models keep saying “I don’t know,” users migrate to competitors who kept the confident tone intact.
Wei Xing, an AI researcher at the University of Sheffield, put it more bluntly in a Science magazine interview: “Fixing hallucinations would kill the product.”
That is not cynicism. It is an accurate description of the market dynamic.
Only about 5% of OpenAI’s users pay for a subscription. The free tier depends on engagement. An AI that hedges constantly feels broken compared to one that answers immediately and confidently, even when that confidence is fabricated.
GPT-5 reduced hallucinations meaningfully, particularly in reasoning tasks. Claude Sonnet 4.6 is better calibrated than earlier Claude versions. Progress is real.
But none of these companies have trained their flagship consumer products to say “I don’t know” at the frequency their internal research suggests is actually warranted. Because users leave when they do.
When Calibrated Uncertainty Actually Shows Up in AI Products
The honest version of this technology does exist. It just does not appear in consumer chat interfaces by default.
Reasoning models like o3 and Claude’s extended thinking mode generate explicit chains of thought before answering. That process forces the model to evaluate its own confidence before committing to a response. Abstention rates are higher. Hallucination rates are lower.
Enterprise deployments increasingly use confidence scoring alongside model outputs, flagging answers below a reliability threshold for human review rather than presenting them as clean facts.
The 2025 “Rewarding Doubt” research integrated confidence calibration directly into reinforcement learning, penalizing both overconfidence and underconfidence so model certainty actually matched correctness. It worked. It has not shipped in any major consumer product yet.
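One generic way to fold calibration into a reward signal, in the spirit of that idea (this is a Brier-score-style sketch, not the paper's exact formulation): penalize the squared gap between stated confidence and the actual outcome, which punishes overconfidence and underconfidence symmetrically.

```python
# Brier-style calibration reward: reward is highest when the model's
# stated confidence matches how often it is actually right.

def calibrated_reward(confidence: float, correct: bool) -> float:
    outcome = 1.0 if correct else 0.0
    penalty = (confidence - outcome) ** 2  # overconfident-and-wrong or
    return 1.0 - penalty                   # underconfident-and-right both lose

print(calibrated_reward(0.95, correct=False))  # badly overconfident: ~0.10
print(calibrated_reward(0.95, correct=True))   # well calibrated:     ~1.00
```

A model trained against this signal can only maximize reward by making its confidence track its actual hit rate, which is the definition of calibration.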
The gap between what the research shows is achievable and what users actually get in the ChatGPT or Claude.ai interface is a deliberate product decision, not a technical limitation.
The research to fix hallucinations exists. The incentive to ship it at the expense of confident-sounding answers does not yet.
What This Means For You
- Treat every AI answer to a factual question as a starting point for verification, not a conclusion, because the training systems that produced the model rewarded confident output over accurate uncertainty signaling.
- Use reasoning models like o3 or Claude’s extended thinking mode for high-stakes fact-checking, since the chain-of-thought process forces a confidence evaluation pass that standard chat models skip entirely.
- Notice when an AI answers immediately and confidently on a topic with no hedging whatsoever, because that tone is a trained behavior optimized for user satisfaction, not a reliable signal of actual accuracy.
- Ask the model directly “how confident are you in this answer and what would change it?” after any response you plan to act on, since prompting for uncertainty often surfaces caveats the model suppressed to sound more helpful.
