Every major lab is shipping a 'reasoning model' now. The word is in the marketing, the benchmark names, the investor decks, the changelog. Nobody is defining it.
Chain-of-thought (CoT) prompting produces a structured sequence of tokens that looks like intermediate steps. It improves output quality on certain classes of problems. That is a real and useful thing. It is not reasoning in any sense that holds up if you spend five minutes with the actual definition. Reasoning implies the ability to generalize a logical structure to a novel domain, to recognize when your premises are wrong, to know what you don't know. What CoT does is surface patterns from training that resemble reasoning traces. On distribution, that's powerful. Off distribution, it fails in ways that look nothing like how a reasoning system fails: confidently, fluently, and wrong.

The conflation isn't just semantic. It shapes what gets built. If you believe the model is reasoning, you hand it tasks that require genuine inference under uncertainty and you don't build the fallback. That's where things break in production: not on the benchmark case, but on the case the benchmark didn't cover.
The word 'reasoning' is doing a lot of work that the model isn't.