What "Goed" Knows That GPT Doesn't

There's a moment in every child's language development that genuinely stops parents short: the day their kid, who was saying "went" perfectly fine last week, suddenly switches to "goed." Or starts saying "maked" instead of "made." Or "runned." Or "hitted."

The instinct is to correct it. The child was getting it right — now they're getting it wrong. This feels like forgetting.

It isn't. Flip your frame, and "goed" is one of the most impressive things a developing mind can produce. It's not a mistake. It's a rule — extracted from thousands of examples, abstracted into a principle, then applied with confidence to a new case. The child didn't regress. The child generalized.

And that, it turns out, is one of the hardest things any intelligence — biological or artificial — can learn to do.

The Engine Underneath the Error

Generalization is so woven into human cognition that we forget it's happening. When a child learns to share a toy with her brother and then shares a snack with her friend the next morning, she isn't following a memorized instruction. She's applying a principle. When she later extends that principle to taking turns on the swings, and then to dividing the last cookie at a birthday party, she's carrying an abstract idea across contexts that look nothing alike on the surface.

Psychologists call this far transfer: applying what you know in one domain to a genuinely new situation. It's the holy grail of education — not teaching kids to perform specific tasks, but teaching them to think in ways that travel. It is, not coincidentally, also one of the most frustrating unsolved problems in AI.

The engine that makes far transfer possible starts running earlier than most people expect. According to Romberg and Saffran (2025), statistical learning, the ability to extract regularities from sequential input, is the foundational, domain-general mechanism at the base of a whole developmental hierarchy of more sophisticated cognitive capacities. The landmark finding that launched this line of research: 8-month-old infants can segment words from a continuous stream of syllables just by tracking how reliably one syllable follows another. Inside a word, the next syllable is highly predictable; across a word boundary, it is much less so. No explicit instruction. No one pointing out word boundaries. Just probability, running quietly in the background.
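To make the idea concrete, here is a minimal Python sketch of segmentation by transitional probability. It is an illustration under stated assumptions, not the procedure used in any of the studies cited here: the three-word lexicon, the stream length, and the 0.9 threshold are all made up for the example.

```python
from collections import Counter
import random

def transitional_probabilities(syllables):
    """Estimate P(next syllable | current syllable) for every adjacent pair."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

def segment(syllables, tps, threshold):
    """Place a word boundary wherever the transitional probability dips below threshold."""
    words, current = [], [syllables[0]]
    for prev, nxt in zip(syllables, syllables[1:]):
        if tps[(prev, nxt)] < threshold:
            words.append("".join(current))
            current = []
        current.append(nxt)
    words.append("".join(current))
    return words

# Hypothetical three-syllable "words", strung together with no pauses (an assumption).
lexicon = [["bi", "da", "ku"], ["pa", "do", "ti"], ["go", "la", "bu"]]
rng = random.Random(0)
stream = [syl for _ in range(300) for syl in rng.choice(lexicon)]

tps = transitional_probabilities(stream)
recovered = set(segment(stream, tps, threshold=0.9))
print(sorted(recovered))
```

Within a word, each syllable is always followed by the same next syllable, so the estimated probability is 1.0. At a word boundary, any of the three words can come next, so the probability drops to roughly one in three, and that dip is the only cue the segmenter uses.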

Three decades of follow-up research, synthesized by Romberg and Saffran (2025), show that this same mechanism operates across modalities — auditory, visual, motor — and bootstraps word segmentation, syntactic acquisition, and social prediction. Statistical learning isn't one tool in the infant's toolkit. It's closer to the toolkit itself.

And crucially: it's what powers the leap from specific instances to abstract rules.

Compositionality: The Real Secret

"Goed" is wrong. But it's wrong in the right direction.

The child has correctly identified the rule (add -ed to make past tense) and applied it to "go," a verb that happens to break it. Linguists call this overgeneralization or, when a regular pattern is stretched over an irregular word, over-regularization: an exuberant generalizer reaching for an abstract principle with more enthusiasm than precision. The error is evidence that the rule was actually learned. Children who produce "goed" are doing something that children who only repeat the "went" they have heard are not: operating on structure.
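The contrast between repeating and operating on structure can be sketched in a few lines of Python. This is a toy illustration, not a claim about the mechanism inside a child's head. The function names are hypothetical, "blick" is a made-up verb, and the rule handles only the simplest spelling case, so it would not produce doubled-consonant forms like "runned."

```python
def regular_past(verb):
    """The rule in its simplest form: add -ed (or just -d after a final e)."""
    return verb + "d" if verb.endswith("e") else verb + "ed"

def past_tense(verb, exceptions):
    """A stored irregular form wins if one exists; otherwise the rule applies."""
    return exceptions.get(verb, regular_past(verb))

# A pure memorizer: stored pairs and nothing else.
rote_only = {"go": "went", "make": "made", "walk": "walked"}

# A learner whose rule is productive but whose irregulars are not yet blocking it.
rule_without_exceptions = {}

# A learner with the rule plus the stored exceptions.
rule_with_exceptions = {"go": "went", "make": "made"}

for verb in ["walk", "go", "make", "blick"]:
    print(
        verb,
        rote_only.get(verb),                        # None for any verb never heard
        past_tense(verb, rule_without_exceptions),  # "goed", "maked"
        past_tense(verb, rule_with_exceptions),     # "went", "made"
    )
```

The memorizer gets "went" right but has nothing to say about a verb it has never heard. The rule-user gets "go" wrong but handles "blick" without hesitation, which is the trade the "goed" stage reflects.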

The technical term for the capacity that makes this possible is compositionality — the ability to build novel meanings from familiar parts. A child who knows "dog," "run," and "big" can understand "the big dog ran" without ever encountering that specific combination before. They can produce sentences no one has ever spoken. This is the beating heart of language — not a library of memorized phrases but a generative system with a combinatorial engine.

Galke, Ram, and Raviv (2024) directly tested whether this compositional advantage is unique to biological minds. The answer, published in Nature Communications: it isn't. Deep neural networks, like humans, learn languages faster and generalize better when trained on input with more compositional and transparent structure — where complex meanings are systematically built from known parts. The shared bias toward compositionality means human and model learning trajectories align more closely when the input is structured this way. The authors suggest this validates neural networks as more faithful cognitive models of human language learning than previously assumed — for this specific property.

This is genuinely exciting. It's also where things get complicated.

The Gap Between Shared Bias and Shared Process

Sharing a learning bias is not the same as sharing a learning process.

Schuler et al. (2025) tackled a problem that has haunted computational linguistics for decades: how do children acquire formal grammatical patterns from so little data? Standard models face a brutal trade-off. Bayesian models capture the right inductive biases — the structured prior expectations that make rapid generalization possible — but can't handle real, messy, naturalistic language. Neural networks are flexible enough to learn from actual sentences but require enormous amounts of data and fail to generalize in principled ways.

Their solution: distill Bayesian priors directly into a neural network. The hybrid learns formal linguistic patterns from a handful of examples and scales to natural language. It does what children apparently do — it brings structured prior expectations to the task and uses flexible pattern-matching to fill in the rest.

What's striking is what this implies. Children don't arrive at language learning as blank slates. They bring something — some set of expectations about what kinds of regularities deserve to be elevated into rules. Whether those priors are innate or accumulated from very early experience is still hotly debated (it's the nativism-empiricism argument in a slightly different hat). But whatever their source, they make rapid generalization possible in a way that raw exposure to data simply doesn't.

Standard large language models have processed more text than any human ever could. And yet they often fail on out-of-distribution inputs: situations where the surface statistics shift even when the underlying structure stays the same. Ask a model trained predominantly on formal text to handle casual conversation, and it stumbles. Present an image classifier with a sketch of the object it supposedly recognizes, and performance craters. The compositional benchmark SCAN tests whether models can recombine known command components in novel ways, and it remains a standard way to probe this brittleness.
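A toy version of a compositional test shows what "same structure, new surface form" means in practice. The sketch below is loosely modeled on the kind of held-out split SCAN uses, where a primitive such as "jump" appears in training only on its own. The vocabulary, the output tokens, and the two-word grammar are simplifications assumed for this example and are not the benchmark's actual format.

```python
PRIMITIVES = {"walk": ["WALK"], "run": ["RUN"], "look": ["LOOK"], "jump": ["JUMP"]}
REPEATS = {"twice": 2, "thrice": 3}

def interpret(command):
    """Compositional interpreter: the meaning of the whole is built from its parts."""
    words = command.split()
    actions = list(PRIMITIVES[words[0]])
    if len(words) == 2:
        actions = actions * REPEATS[words[1]]
    return actions

# Training commands: "jump" is seen alone, never combined with a modifier.
train_commands = [
    "walk", "run", "look", "jump",
    "walk twice", "run twice", "look twice",
    "walk thrice", "run thrice", "look thrice",
]

# A pure memorizer stores each training command with its correct output.
memorized = {cmd: interpret(cmd) for cmd in train_commands}

novel = "jump twice"
print(memorized.get(novel))  # the lookup table has no entry for the new combination
print(interpret(novel))      # the interpreter composes "jump" and "twice"
```

Both systems are perfect on the training commands. Only the one that represents "twice" as an operation over any action can answer the held-out case, which is the distinction the benchmark is designed to measure.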

The child who says "goed" proves she extracted the rule. The model, depending on the task, may have just memorized the training distribution.

What the Developmental Path Itself Contributes

The most provocative recent work takes a different lesson from child development. Instead of scaling data and hoping generalization emerges, it asks: what does the trajectory of learning contribute, independent of the total amount?

Taniguchi et al. (2024) built robots that acquire compositional language-action mappings the way children do — incrementally, through situated social interaction, one word and action at a time. These robots weren't pre-trained on massive datasets. They learned from grounded, interactive experience: pairing language with physical action, in sequence, with a partner providing feedback. The result, published in Science Robotics, is a system that generalizes to novel language-action combinations it was never explicitly trained on — and does so more efficiently than data-intensive training approaches.

The developmental scaffolding — learning in context, from an interactive partner, with feedback grounded in the physical world — isn't just a cute metaphor. It appears to be computationally significant. The how of learning shapes the what of what gets learned.

This connects directly to what's distinctive about human generalization. Children don't just accumulate examples. They organize them around principles, compare them, explain them (even to themselves), and actively test the boundaries of their emerging rules. The exuberant "goed" isn't passive. It's a hypothesis.

Practical Takeaways for Parents

When your child says "he felled down" or "we eated dinner," the wrong move is anxious drilling. Over-correction suppresses language exploration without speeding up acquisition; the rule-extraction machinery needs more examples so it can notice the mismatch on its own. The right move is to use the correct form naturally in conversation and trust the system. Children usually sort it out on their own as the irregular forms pile up in what they hear.

More broadly: the errors your child makes reveal what they're learning. "Goed" means they've noticed the -ed rule. "Sheeps" means they've noticed regular plural formation. "More better" means they've grasped the comparative without yet calibrating it to the irregular. Every over-regularization error is a window into the abstraction machinery underneath — and a reminder that what looks like failure is often evidence of genuine understanding in progress.

If anything, the AI research gives parents one more reason to appreciate this: the capacity to extract a rule and run with it — even imperfectly — is exactly the capability that the most powerful language systems in history are still working to replicate.

"Goed" is wrong. But it's wrong in a very specific, rule-governed, generalizing way. That's the kind of wrong that intelligence looks like when it's working.

That's the kind of wrong we're still trying to build.
