Can You Learn Anything Just by Thinking?

I stopped myself mid-sentence at dinner a few weeks ago. I was defending a position, some argument about how children's early categorical errors reveal something important about concept formation, and I was doing it well. The words were precise, the structure was clean, I was gesturing at the right evidence. And then I caught myself, with a jolt of something close to nausea, realizing that I had privately abandoned this exact position three weeks earlier. I'd written it down in the margin of a paper I was reading: "Actually, this doesn't hold. See Carey." And yet here I was, at dinner, running the argument like a track I'd memorized, my mouth moving independently of whatever I now actually believed.

What troubled me most wasn't the embarrassment of it. It was the structural question underneath: what had happened during those three weeks when I did change my mind? I hadn't run a new experiment. I hadn't read a new study. I had sat with the old ideas, turned them over, noticed a tension, followed it somewhere — and arrived somewhere genuinely different. That's not passive. That's not just retrieval. Something had happened inside the machinery of my thinking that produced real, new knowledge. Knowledge that, apparently, my dinner-table mouth had not yet received the memo about.

There's a name for what I'd done in those three weeks, and it turns out to be one of the more quietly radical concepts in contemporary cognitive science.

The Paradox at the Center

Tania Lombrozo, a psychologist and philosopher at Princeton, calls it learning by thinking — the paradoxical capacity to gain genuinely new knowledge through internal mental processes alone, without new external input (Lombrozo, 2024). She documents four distinct mechanisms by which this happens in humans: explanation-driven learning, where articulating why something is true forces reorganization of what you know; mental simulation, where running forward models in your head produces expectations you didn't have before; analogical reasoning, where mapping structure from one domain onto another generates new inferences; and deliberate step-by-step inference, where the discipline of writing out a chain of reasoning catches errors that intuition skips over.

These are not the same process. They feel different from the inside, they activate different neural systems, and they fail in characteristically different ways. But they share something important: they are all cases where thinking is the experiment. The mind runs an internal operation on existing knowledge and comes out holding something it didn't hold before — not as a retrieval, not as a recombination, but as a genuine epistemic update.

This seems like it should be obvious. Of course you learn by thinking. What else would thinking be for?

But notice what it rules out. Pure pattern-completion — retrieving the most statistically likely continuation of a familiar sequence — is not learning by thinking. It's something closer to what my dinner-table self was doing: running the well-worn track. The distinction Lombrozo is drawing is between thought that revises and thought that merely continues. Between the mind doing something genuinely generative and the mind producing fluent output from a groove it already knows.

This distinction is hard to make from the outside. Both look like thinking. Both produce words, arguments, conclusions. And this is where it gets interesting — and uncomfortable.

What Chain-of-Thought Actually Does

Large language models can now, in certain conditions, engage in something that looks like learning by thinking. When prompted to "think step by step" before answering, or to explain their reasoning, or to work through analogies, they sometimes correct errors they would otherwise confidently make (Lombrozo, 2024). The improvement is real. The mechanism looks, superficially, identical to human learning by thinking (LbT): the model generates an explanation, notices an inconsistency, revises.

Lombrozo is careful here, and I think her caution is worth dwelling on. She notes that LLMs also make some of the characteristic errors that humans make during learning by thinking, specifically confident wrong conclusions from plausible-sounding reasoning chains. This is not a minor caveat. The error signature of a process is one of the most informative things about what that process actually is. When two systems fail in structurally similar ways, it suggests they share something architecturally important. When they fail differently, the surface similarity starts to look less interesting.

What she finds is that LLMs and humans share some error patterns in LbT contexts — the overconfident inference, the locally valid but globally wrong syllogism — but differ in others. Models don't spontaneously reach the right stopping point in a chain of reasoning the way a good explainer does. They don't notice when an analogy breaks down at the edges. They complete chains of inference that a careful human thinker would recognize as going somewhere false.

The question this leaves open is the one I can't stop thinking about: is the similarity in error patterns evidence of a shared mechanism, or just evidence that there are only so many ways a system can go wrong when processing language about reasoning?

The Architecture Underneath

Here's what I find myself reaching for when I try to ground this question in biology rather than behavior: the brain's capacity for this kind of internal revision is not static across the lifespan. It has structure.

A landmark 2025 study from Cambridge's MRC Cognition and Brain Sciences Unit analyzed 4,216 brain MRI scans from people aged 0 to 90 — one of the most comprehensive pictures of structural brain development ever assembled (Mousley et al., 2025). What they found was not a smooth arc of growth and decline, but five distinct epochs of brain organization, separated by topological turning points at ages 9, 32, 66, and 83. Each epoch reflects a fundamentally different wiring regime. The period between 9 and 32 is notable: it's the only phase in which neural networks are becoming increasingly efficient — not just adding connections, but pruning toward faster, more integrative signal pathways.

This is, I think, what the substrate of genuine learning-by-thinking looks like in hardware. Not a brain that simply has more connections, but one that has organized those connections into the kind of efficient, cross-domain architecture that lets you hold a new idea next to an old one and feel the friction between them. The friction is where revision happens.

What the Mousley et al. data can't tell us — and what I keep wondering — is whether the capacity for genuine internal revision tracks the efficiency trajectory, or whether it has a different developmental profile entirely. Do children learn by thinking? In some primitive form, yes — explanation-based learning appears early, and children as young as three show improved categorization after being asked to explain (Lombrozo, 2024). But the richly metacognitive version, the kind where you catch yourself arguing a position you've privately abandoned, that seems to require the slow maturation of the prefrontal architecture that the Mousley data is mapping.

The implication for AI is pointed. An LLM doesn't have epochs. It doesn't go through phases of rapid local growth, then increasing efficiency, then stabilization. It has a training run, which is more or less uniform, and then weights. The developmental trajectory of biological learning — with its distinct phases of plasticity and consolidation — is not something that exists in current architectures. Whether that matters for the capacity we're calling learning-by-thinking, I genuinely don't know.

The Strangest Result

There is one finding I keep returning to, because it seems to close a loop that shouldn't be closable.

A team at Google DeepMind recently deployed a system called FunSearch, an LLM-powered evolutionary search, not to answer questions or generate text but to discover cognitive models (Castro et al., 2025). The target was the symbolic computational models that cognitive scientists use to describe how minds learn from feedback: how humans update their beliefs after errors, how rats navigate reinforcement schedules, how flies respond to aversive stimuli.

Rather than having cognitive scientists hand-craft these models by trial and error — the traditional methodology — FunSearch evolved novel programs in executable symbolic form. And the programs it discovered outperformed the best hand-crafted models at predicting reward-learning behavior across all three species. They were human-readable. They yielded new scientific insights. They generalized across datasets.
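For a sense of what a model of this kind looks like on the page, here is a minimal sketch of a classic hand-crafted baseline: a textbook delta-rule learner with softmax choice, applied to a two-option task. It is my own illustration, not code from Castro et al.; the function names and the default values of `alpha` (learning rate) and `beta` (choice sharpness) are assumptions chosen for readability.

```python
import math

def softmax_choice_probs(values, beta):
    """Turn option values into choice probabilities (higher beta = greedier)."""
    top = max(values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (v - top)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def delta_rule_update(values, action, reward, alpha):
    """Move the chosen option's value a fraction alpha toward the reward."""
    updated = list(values)
    updated[action] += alpha * (reward - updated[action])
    return updated

def log_likelihood(choices, rewards, alpha=0.3, beta=3.0):
    """Score how well the model predicts an observed sequence of choices."""
    values = [0.0, 0.0]  # assumption: two options, both starting at zero
    total = 0.0
    for action, reward in zip(choices, rewards):
        probs = softmax_choice_probs(values, beta)
        total += math.log(probs[action])
        values = delta_rule_update(values, action, reward, alpha)
    return total
```

A model like this is judged by how well it predicts the choices a real subject made. As I understand it, a claim that discovered programs "outperformed" hand-crafted ones is a claim about predictive fit of roughly this sort.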

I want to be precise about what this means: an AI system is now discovering the computational descriptions of how minds — including human minds — learn. The models that cognitive science has spent decades building are now being found, and superseded, by a process that is itself a kind of learning. The science of learning is being investigated by a learning machine. Castro et al. (2025) call this a "methodological loop" — and I think they're understating it slightly.

What this means for the LbT question is not entirely clear. But it does suggest that the cognitive scientists who study how we revise our beliefs may soon find their own theoretical models revised by a system that doesn't revise its beliefs at all — that has no dinner-table self to catch mid-sentence, no moment of quiet vertigo when a load-bearing assumption turns to paper.

What the Revision Requires

I've been sitting with a question since that dinner, and I haven't resolved it.

Lombrozo's learning-by-thinking mechanisms are real, well-documented, and genuinely important — in humans, they drive conceptual change in children, scientific insight in adults, the kind of understanding that persists and transfers rather than just passing a test. In LLMs, chain-of-thought prompting produces measurable improvements in accuracy, and some of the same error signatures. But improvement in performance and genuine revision of belief are not obviously the same thing.

When I changed my mind about concept formation, something happened that I can only describe as the old view becoming unavailable to me in a certain way. Not forgotten — I could reconstruct it. But it no longer felt true from the inside. The dinner-table incident was jarring precisely because the words had become detached from that inner sense of conviction. I was producing the argument while simultaneously being unable to genuinely endorse it.

Does an LLM have anything like that? Take a next-token prediction system that produces better outputs when it externalizes intermediate reasoning steps: does the chain-of-thought process change anything in the model that persists past the context window? Does the model that "thought step by step" to arrive at a correct answer come away with anything different in its weights that would make it more likely to arrive at the correct answer next time, absent the prompt?

As far as anyone can tell: not really. The revision doesn't stick. There is no memo sent to the dinner-table self, because there isn't one. The next inference starts from the same weights.
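The distinction can be made concrete with a toy sketch. This is an illustration under loud assumptions, not a real language model: `TinyModel` is a hypothetical one-parameter system, `infer` stands in for a forward pass with optional scratch work done in context, and `learn` stands in for a weight update.

```python
from dataclasses import dataclass

@dataclass
class TinyModel:
    weight: float  # stands in for all trained parameters

    def infer(self, x, scratch_steps=0):
        """Forward pass: reads the weight, never writes it.

        Scratch steps refine the answer within this call only (a stand-in
        for reasoning in context); here they are Newton iterations that
        pull the first guess toward the square root of x.
        """
        guess = self.weight * x
        for _ in range(scratch_steps):
            guess = 0.5 * (guess + x / guess)
        return guess

    def learn(self, x, target, lr=0.01):
        """Weight update: the only operation here that leaves a lasting mark."""
        error = self.weight * x - target
        self.weight -= lr * error * x
```

Calling `infer` with scratch steps gives a better answer, but the next plain call to `infer` returns the same first guess as before. Only `learn` changes what the model does by default.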

Which raises the question I can't quite shake: is that the difference that matters? Is genuine learning-by-thinking precisely the kind of process that leaves a mark — that changes not just the output but the system that generated it? And if so, what would it take to build an architecture that doesn't just perform the revision, but actually undergoes it?

I don't think anyone has built that yet. But I suspect that when we do, we'll recognize it by a very specific signature: a machine that can catch its own dinner-table self, mid-sentence, and stop.