The Calibration Crisis: Why LLMs Can't Tell What They Don't Know
The A$440,000 Hallucination

In October 2025, Deloitte submitted a A$440,000 report to the Australian government. Comprehensive, well-formatted, entirely AI-generated. Also riddled with hallucinated academic sources and fabricated court quotes that never existed.

This wasn't an edge case. It's what I call the calibration crisis: state-of-the-art language models produce confidently wrong answers at alarming rates. And it's getting worse.

What Is Calibration?

Imagine a weather app that says "90% chance of rain" on 100 different days. If it actually rains on 90 of those days, the forecast is well-calibrated. If it rains on only 60 of them, the app is overconfident: it claims 90% certainty while delivering 60% accuracy. ...
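The weather-app comparison translates directly into code. Here is a minimal sketch of the standard expected calibration error (ECE) metric, assuming `confidences` holds the probabilities the model reported and `outcomes` holds 1 for correct and 0 for incorrect; the function and variable names are illustrative, not from any particular library:

```python
import numpy as np

def expected_calibration_error(confidences, outcomes, n_bins=10):
    """Bucket predictions by stated confidence, then compare each
    bucket's average confidence to its empirical accuracy."""
    confidences = np.asarray(confidences, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        avg_conf = confidences[mask].mean()  # what the model claimed
        accuracy = outcomes[mask].mean()     # what actually happened
        # Weight each bucket's confidence-accuracy gap by its share of predictions.
        ece += mask.mean() * abs(avg_conf - accuracy)
    return ece

# The weather-app example: 100 days of "90% chance of rain", rain on 60.
conf = [0.9] * 100
hits = [1] * 60 + [0] * 40
print(expected_calibration_error(conf, hits))  # ~0.30: overconfident by 30 points
```

A perfectly calibrated forecaster scores 0; the overconfident weather app scores roughly 0.30, the 30-point gap between what it claimed and what it delivered.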