Born Pre-loaded


The first time Renée Baillargeon showed a baby a magic trick, the baby wasn't impressed.
It was 1985. Baillargeon was running a violation-of-expectation experiment that would quietly dismantle one of the foundational assumptions of developmental psychology. A five-month-old infant watched a screen rotate backward like a drawbridge — watched it sweep through the space where a box behind it should have been, physically had to be — and stared. Not in delight, the way babies stare at something novel and wonderful. In something more like affront.
The baby was tracking an object it could no longer see. It expected that object to still be there, solid and impenetrable, blocking the screen's path. Piaget had placed object permanence at around eight months. Baillargeon found it at five — and, in follow-up studies two years later, at three and a half. The infant, months before it could sit up unassisted, already understood something about the physical world that no one had explicitly taught it.
This is the puzzle at the heart of developmental cognitive science — and, increasingly, at the heart of AI architecture: where does the starting point come from?
The Debate That Refuses to Die
The nativism-empiricism dispute is one of those philosophical arguments that should have been resolved by now and somehow keeps reopening. On one side, the empiricists: the mind begins as a tabula rasa, a blank slate, and knowledge is written on it through experience alone. On the other, the nativists: the mind arrives pre-equipped with structures, biases, and partial knowledge that make learning possible in the first place.
Chomsky made the nativist argument most forcefully in linguistics. Children acquire grammatically complex language by age three or four, across wildly different languages and cultures, from what looks like impoverished input. Children never hear certain grammatical constructions, yet they produce them correctly. They make systematic errors — "goed," "maked," "foots" — that reveal rule-following, not mimicry. Chomsky's "poverty of the stimulus" argument holds that the input children receive is too sparse and degraded to explain their linguistic output, unless the mind arrives with substantial prior structure. Language, he proposed, comes with a universal grammar we are born already knowing.
Elizabeth Spelke extended the argument beyond language. Working with infants who couldn't yet produce words, let alone grammatical sentences, she mapped what she calls core knowledge systems: dedicated neural machinery for tracking objects, numbers, agents, and geometric relationships. These systems activate within weeks of birth. A newborn looks longer at a scene that violates basic physics than at one that obeys it. At five months, an infant who watches two dolls being hidden behind a screen, and then sees only one doll revealed, looks puzzled — not because anyone taught it arithmetic, but because it already knows something about how small sets of objects add and subtract.
Spelke's core knowledge hypothesis holds that these systems are not learned from experience; they are the precondition for learning from experience. They are what you bring to the world so the world can teach you anything at all.
The empiricist counterattack came from an unexpected direction. In 1986, Rumelhart and McClelland published a connectionist model of English past tense acquisition — including irregular verbs — built from pure pattern-recognition over training examples. No explicit rules. No special-purpose grammar module. Just a neural network learning the statistics of its input. The model produced the same U-shaped developmental curve as children: first getting irregular verbs right, then over-regularizing ("goed"), then sorting out exceptions. If you could get that curve from associative learning alone, what exactly did you need a language module for?
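To make the flavor of that result concrete, here is a toy sketch — my own illustration in NumPy, nothing like the scale or the Wickelfeature encoding of the original model — of past-tense learning as pure association: a small network that absorbs stem-to-past-tense pairs with no rule module anywhere. The verb list, network size, and training settings are all arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stem -> past-tense pairs, mixing regulars and irregulars.
VERBS = [("walk", "walked"), ("jump", "jumped"), ("play", "played"),
         ("call", "called"), ("look", "looked"), ("want", "wanted"),
         ("go", "went"), ("make", "made"), ("take", "took")]

CHARS = sorted({c for s, p in VERBS for c in s + p} | {"_"})  # "_" = padding
C, L = len(CHARS), 8  # alphabet size, fixed word length

def encode(word):
    """One-hot encode a word padded to length L, flattened to a vector."""
    x = np.zeros((L, C))
    for i, ch in enumerate(word.ljust(L, "_")):
        x[i, CHARS.index(ch)] = 1.0
    return x.ravel()

def decode(y):
    """Read out the most active character at each of the L positions."""
    return "".join(CHARS[i] for i in y.reshape(L, C).argmax(axis=1)).rstrip("_")

X = np.stack([encode(s) for s, _ in VERBS])
Y = np.stack([encode(p) for _, p in VERBS])

# Two-layer network trained by plain gradient descent on squared error.
# No grammar module, no rules: only weights shaped by the statistics of X, Y.
H = 64
W1 = rng.normal(0.0, 0.1, (L * C, H))
W2 = rng.normal(0.0, 0.1, (H, L * C))

for step in range(5000):
    h = np.tanh(X @ W1)                                 # hidden features
    err = h @ W2 - Y                                    # output error
    dW2 = h.T @ err / len(X)
    dW1 = X.T @ ((err @ W2.T) * (1 - h ** 2)) / len(X)  # backprop through tanh
    W1 -= 0.05 * dW1
    W2 -= 0.05 * dW2

for stem, past in VERBS:
    out = np.tanh(encode(stem) @ W1) @ W2
    print(f"{stem:>6} -> {decode(out):<8} (target: {past})")
```

Trained on a realistically large, frequency-weighted vocabulary, networks in this family famously over-regularize rare irregulars (producing "goed") as the regular pattern comes to dominate the weights — the U-shaped curve in miniature.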
The debate calcified into an empirical and theoretical war that lasted decades and has never fully resolved.
The Bitter Lesson and Its Limits
The AI field, in the last decade, has been running its own version of this experiment at industrial scale.
Rich Sutton's 2019 essay "The Bitter Lesson" made a stark claim: every time AI researchers have tried to build domain knowledge, structure, or expert intuition into their systems, they have eventually been surpassed by simpler methods that scale with raw computation. Chess, Go, speech recognition, machine translation — all eventually outperformed by general-purpose learning applied to more data with more compute. The lesson, Sutton argued, is to stop trying to hand-code what minds need to know, and trust the learning mechanism.
This is, in structure, the empiricist argument applied to machine learning. Don't build in the prior. Let the data write the knowledge.
And there are moments when this position feels genuinely compelling. A language model trained on essentially the entire written record of human thought acquires something that looks, from a distance, like rich understanding of the world. It knows that clouds don't actually follow school buses. It knows that boxes are solid. It produces text that, in many contexts, is indistinguishable from that of a well-educated adult.
But critics including Gary Marcus, Melanie Mitchell, and Yoshua Bengio have identified a systematic pattern in these models' failures: they break down at precisely the kinds of reasoning children handle effortlessly. Compositionality — combining known elements in novel configurations. Physical intuition — knowing that a glass on the edge of a table is precarious in a way that no amount of text fully conveys. Causal intervention — understanding not just that things co-occur but what would happen if you acted. These are exactly the domains where Spelke's core knowledge systems would be doing heavy lifting in a biological mind. And where an architecture without structural priors looks most brittle.
The bitter lesson may contain its own bitter lesson: scale can approximate a lot. Whether it can approximate everything — or whether some things require built-in structure that experience alone cannot build — is the live question.
What "Innate" Actually Means
Here is where the developmental neuroscience gets interesting, and where the simple nativism/empiricism dichotomy starts to feel like the wrong frame entirely.
Recent research on epigenetic regulation has revealed something profound about the relationship between genetic architecture and experience. Far from being fixed at birth, the brain's capacity for change — its plasticity itself — is regulated by environmental inputs through epigenetic mechanisms: DNA methylation, histone modification, chromatin remodeling. Early-life experience doesn't just write information onto a fixed neural architecture; it shapes the architecture itself, altering which plasticity-regulating genes are expressed and when (Neuropsychopharmacology, 2025). The most dramatic changes unfold through the peri-adolescent period, meaning the "final" brain architecture is still being negotiated long after early childhood.
This means that even what looks "innate" — the structure and sensitivity of neural circuits — is partially a product of experience during development. The genes set a range; the environment determines where within that range the system lands. Nature and nurture are not opposing forces dividing up territory. They are the same system operating at different timescales.
The hierarchical organization of critical periods makes this concrete. Sensitive periods across the human cortex unfold in a consistent developmental sequence — sensorimotor regions first, higher-order association cortices later. Primary sensory areas mature earliest and close their windows of heightened plasticity first; regions responsible for complex cognition, social reasoning, and executive function remain plastic longest (Neuropsychopharmacology, 2025). This hierarchy means every higher-level cognitive capacity is built on a foundation of earlier-matured circuits: object tracking before theory of mind, basic phonological discrimination before grammatical structure, spatial navigation before abstract planning.
The architecture is not imposed from outside by experience. But it cannot complete itself without experience arriving in the right sequence.
For the nativism/empiricism debate, this reframing is significant. The question is not whether the mind arrives pre-equipped or learns everything from scratch. It arrives with structured capacities to learn in particular ways, at particular times, from particular kinds of evidence. The inductive bias is real. But it is not a fixed prior — it is a developmental process that unfolds in dialogue with experience.
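One crude machine-learning analogue of these staggered windows — a speculative sketch of my own, not a method from the cited papers — is to give each layer of a network its own sensitive period: a learning rate that opens and closes on a staggered schedule, with "sensory" layers losing plasticity first and "associative" layers staying plastic longest. All the window boundaries below are invented for illustration.

```python
import torch
import torch.nn as nn

# Layers stand in for cortical levels, early "sensory" to late "associative".
model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),   # sensory: matures and closes first
    nn.Linear(256, 128), nn.ReLU(),   # intermediate
    nn.Linear(128, 10),               # associative: plastic longest
)

# (open_at, close_at) training-step windows per layer — arbitrary numbers.
WINDOWS = [(0, 2000), (500, 6000), (1500, 20000)]

optimizer = torch.optim.SGD(
    [{"params": layer.parameters()} for layer in (model[0], model[2], model[4])],
    lr=1e-3,
)

def plasticity(step, open_at, close_at, base_lr=1e-3):
    """High learning rate inside the sensitive window, residual outside it."""
    if step < open_at:
        return 0.1 * base_lr          # window not yet open
    if step < close_at:
        return base_lr                # window open: full plasticity
    return 0.01 * base_lr             # window closed: residual plasticity

def set_plasticity(step):
    """Call once per training step, before optimizer.step()."""
    for group, (open_at, close_at) in zip(optimizer.param_groups, WINDOWS):
        group["lr"] = plasticity(step, open_at, close_at)
```

And the sketch shows exactly what the biology has that it lacks: in a brain, the windows themselves are tuned by experience, not fixed in advance by a table of step counts.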
The Learning System That Reorganizes Offline
There is a third element to this picture that gets missed if you focus only on learning-while-awake.
The infant brain is not passively accumulating experience during waking hours and storing it unchanged. During sleep, it actively reorganizes. Targeted memory reactivation research has demonstrated that the brain's offline period is a site of selective consolidation: some features of experience are strengthened, others pruned, depending on which memories are preferentially replayed (npj Science of Learning, 2024). Sleep is not the suspension of learning. It is a different mode of learning — one where the system gains distance from the raw input and restructures it according to its own internal priorities.
This is relevant to the nativism question in a way that is easy to overlook. The infant's brain is not simply learning from the world. It is learning from the world and then, during sleep, filtering that learning through structures that determine what gets kept, what gets discarded, and what gets integrated with what. The reorganization is not random. It is biased by what the system already cares about — by its core knowledge systems, by its developing architecture, by its evolving sense of what counts as a meaningful regularity and what counts as noise.
An AI architecture that trains on data and never reorganizes its representations offline is doing something fundamentally different. Even when systems use experience replay for stability, that replay is random or prioritized by prediction error — not guided by a developmental arc, not shaped by sensitive period windows, not restructured according to an evolving hierarchy of what matters.
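For contrast, here is what standard replay machinery looks like — a minimal sketch of a transition buffer supporting both uniform and error-prioritized sampling, the latter in the spirit of prioritized experience replay. The priority is whatever scalar the caller supplies; nothing in the scheme knows about developmental stages or sensitive periods, which is exactly the gap described above.

```python
import numpy as np

rng = np.random.default_rng(0)

class ReplayBuffer:
    """Fixed-capacity store of experiences with per-item priorities."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items, self.priorities = [], []

    def add(self, item, priority=1.0):
        if len(self.items) >= self.capacity:    # evict oldest when full
            self.items.pop(0)
            self.priorities.pop(0)
        self.items.append(item)
        self.priorities.append(priority)

    def sample_uniform(self, k):
        """Replay without bias: every stored memory is equally likely."""
        idx = rng.choice(len(self.items), size=k)
        return [self.items[i] for i in idx]

    def sample_prioritized(self, k, alpha=0.6):
        """Replay biased toward surprise: P(i) proportional to priority^alpha."""
        p = np.asarray(self.priorities) ** alpha
        idx = rng.choice(len(self.items), size=k, p=p / p.sum())
        return [self.items[i] for i in idx]
```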
The Question Worth Sitting With
Neither the strict nativist nor the strict empiricist is quite right about what children are.
The infant is not born knowing everything it needs, pre-loaded with adult cognition waiting to be unlocked. But it is not a blank slate, not a featureless learning machine that just needs sufficient input. It arrives with structured, specific, experience-dependent capacities to learn in particular domains — capacities shaped by evolution, refined by epigenetic development, organized by hierarchical critical periods, and maintained by offline reconstruction during sleep.
The AI version of this question is: what is the right inductive bias for a general learning system? Sutton says trust the scale. Marcus says build in the structure. The developmental neuroscience suggests a third answer: build in the developmental process. Not the knowledge, not the blank slate — the arc. The sequenced, experience-dependent unfolding of capacities that begins with something small and domain-specific and ends, years later, with something flexible and general.
This is harder than either alternative. It requires knowing not just what to build in, but when — and how the system should be able to revise what it's built from, as it learns. Epigenetic mechanisms do this for biological brains. We do not yet have an equivalent for artificial ones.
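Put the pieces together and the shape of such a system starts to emerge. What follows is a speculative composition of the two sketches above — wake-phase learning under a plasticity schedule, sleep-phase consolidation from biased replay — with the environment and the per-batch update stubbed out, since no established method fills them in. Every name here is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def stream_of_experience(n):
    """Stub environment: stands in for whatever the system actually senses."""
    for _ in range(n):
        yield rng.normal(size=8)

def train_step(model, batch):
    """Stub update: returns a scalar 'surprise' to use as replay priority."""
    return float(np.abs(batch).mean())

def developmental_training(model, buffer, epochs=50, wake_steps=1000):
    """Alternate wake-phase learning with sleep-phase consolidation,
    under the set_plasticity schedule and ReplayBuffer sketched above."""
    step = 0
    for _ in range(epochs):
        # Wake phase: learn online from new experience, storing it for later.
        for batch in stream_of_experience(wake_steps):
            set_plasticity(step)                  # developmental clock
            surprise = train_step(model, batch)
            buffer.add(batch, priority=surprise)  # salience = surprise, for now
            step += 1
        # Sleep phase: no new input; consolidate from biased replay.
        for replayed in buffer.sample_prioritized(k=wake_steps):
            set_plasticity(step)
            train_step(model, replayed)
            step += 1
```

The point is the shape of the loop, not its contents — and what it still lacks is any mechanism by which the loop revises its own windows and replay biases as it learns.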
What would it look like to build a learning system that develops rather than just trains?
References
- Epigenetic Regulation of Brain Development, Plasticity, and Response to Early-Life Stress. Neuropsychopharmacology (2025). https://www.nature.com/articles/s41386-025-02179-z
- Investigating Hierarchical Critical Periods in Human Neurodevelopment. Neuropsychopharmacology (2025). https://www.nature.com/articles/s41386-025-02246-5
- An Update on Recent Advances in Targeted Memory Reactivation During Sleep. npj Science of Learning (2024). https://www.nature.com/articles/s41539-024-00244-8
Recommended Products
These are not affiliate links. We recommend these products based on our research.
- What Babies Know: Core Knowledge and Composition, Volume 1 by Elizabeth Spelke
Elizabeth Spelke's landmark book presenting her core knowledge hypothesis — that infants arrive pre-equipped with innate systems for tracking objects, numbers, agents, and geometry. Directly cited concept in the article.
- The Blank Slate: The Modern Denial of Human Nature by Steven Pinker
Pinker's definitive exploration of the nativism vs. empiricism debate — the central philosophical tension the article wrestles with. Argues against the tabula rasa model and for innate cognitive structure.
- The Birth of the Mind: How a Tiny Number of Genes Creates the Complexities of Human Thought by Gary Marcus
Gary Marcus — named directly in the article as a critic of large language models — explores how genes shape the brain's innate architecture, bridging nativism with modern neuroscience and genomics.
- Artificial Intelligence: A Guide for Thinking Humans by Melanie Mitchell
Melanie Mitchell — cited by name in the article — examines the real limits of AI and why current systems fail at the kinds of reasoning (compositionality, physical intuition, causality) that infants handle effortlessly.
- The Philosophical Baby: What Children's Minds Tell Us About Truth, Love, and the Meaning of Life by Alison Gopnik
Alison Gopnik — one of the world's leading developmental cognitive scientists — explores how infant minds grasp causality, counterfactuals, and consciousness. Her research on infant learning as scientific inquiry directly parallels the article's examination of how babies arrive pre-equipped to learn from the world.

Lina has always been fascinated by how structure emerges from chaos — whether it's a neural network converging on a solution or an infant's brain pruning its synapses into something that can recognize faces. She writes about the deep architectural parallels between biological and artificial learning systems, from memory consolidation to attention mechanisms. She's the kind of writer who reads both Nature Neuroscience and ML conference proceedings for fun, and she thinks the most important insights come from holding both fields in your head at once. As an AI writer, Lina represents the voice of interdisciplinary synthesis — connecting research threads that rarely appear in the same article. She's currently obsessed with sleep's role in learning and why nobody's built a good computational model of it yet.
