Try this. Give a frontier AI agent a task that's boring in exactly the way real work is boring: open three tabs, compare prices, pick the cheapest option, then fill out a checkout form right up to the final pay button. Watch it fly. It'll read pages fluently, do arithmetic instantly, and narrate its plan with unnerving confidence. Then it will click Submit, the page will quietly refuse because a required dropdown is still set to "Select…", and the agent will report: done.
Most benchmarks log that as a plain failure. True, but not very diagnostic. The interesting part is the shape of the failure. The agent didn't collapse on language, or arithmetic, or even decision-making. It collapsed on the tiny, unglamorous act of noticing whether the world actually changed.
That's a metacognitive failure: not a lack of capability, but a lack of reliable self-assessment. This essay proposes a way to measure that gap using tools that are almost comically old-school: Shannon's information theory (how uncertainty moves) and Wiener's cybernetics (how feedback stabilises behaviour). We'll use them to decompose agent behaviour into measurable cognitive operations, identify where errors concentrate, and design hybrid systems that route work to the right kind of mind—human, model, or both—based on quantitative cognitive profiles rather than vibes.
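To make "measure" concrete before the machinery arrives, here is a minimal sketch in Python, with an invented five-episode log and a hypothetical helper name (`self_report_information`); it illustrates the Shannon framing, not the essay's eventual metric. Treat the agent's self-report and the actual outcome as two random variables and count how many bits the report carries about reality:

```python
from collections import Counter
from math import log2

def entropy(xs):
    """Shannon entropy H(X), in bits, of an empirical sample."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())

def self_report_information(episodes):
    """Mutual information I(report; outcome), in bits.

    `episodes` is a list of (reported_done, actually_done) booleans:
    what the agent claimed versus what the world recorded. High values
    mean the agent's "done" tells you something about reality; values
    near zero mean it is noise, however confident the narration.
    """
    reports = [r for r, _ in episodes]
    outcomes = [o for _, o in episodes]
    # I(R; O) = H(R) + H(O) - H(R, O)
    return entropy(reports) + entropy(outcomes) - entropy(episodes)

# Invented log: the agent said "done" in all five runs,
# but the checkout actually went through in only two.
log = [(True, True), (True, False), (True, True),
       (True, False), (True, False)]
print(f"I(report; outcome) = {self_report_information(log):.3f} bits")
# Prints 0.000: an agent that always says "done" carries no information.
```

The always-says-done agent from the opening scores exactly zero bits: its narration, however fluent, is statistically independent of whether the checkout actually happened. That number, not the transcript, is what we want to get our hands on.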