Causality Is Hardware, Not Software

Put a 4-month-old in front of a screen. Show them a ball rolling toward the right side of the display. Lower an opaque panel so they can't see the middle section. Now raise it — but show the ball already sitting on the left side of the screen, which is physically impossible without the ball teleporting.

The baby stares. Hard.

Not because they've taken a physics class. Not because they've heard the word "impossible." Because something in their brain registered a violation of how objects are supposed to behave.

This is the violation-of-expectation paradigm — one of the most rigorous tools in infant cognitive science — and it has been telling us something important for decades that we still aren't taking seriously enough when we talk about AI: causality, in humans, runs on hardware.

The Stare Is the Data

The thing I love about VoE experiments is how brutally simple they are. You show babies two versions of a physical event — one possible, one not — and you measure looking time. Babies stare longer at the impossible ones. That's the whole trick.
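To make the measurement concrete, here is a minimal sketch of the comparison a looking-time study boils down to. Everything in it is an assumption for illustration: the function name `looking_time_effect` is hypothetical, the numbers are invented rather than taken from any study, and real analyses involve coding protocols and statistical tests that this skips entirely.

```python
from statistics import mean


def looking_time_effect(possible_s, impossible_s):
    """Compare mean looking time (in seconds) across the two conditions.

    Returns the mean per condition and the difference
    (impossible minus possible). A positive difference means
    longer looking at the impossible event.
    """
    mean_possible = mean(possible_s)
    mean_impossible = mean(impossible_s)
    return {
        "mean_possible": mean_possible,
        "mean_impossible": mean_impossible,
        "difference": mean_impossible - mean_possible,
    }


# Hypothetical looking times in seconds, invented for illustration only.
possible = [6.0, 8.0, 7.0, 9.0]
impossible = [10.0, 12.0, 11.0, 13.0]

result = looking_time_effect(possible, impossible)
print(result["difference"])  # 4.0 seconds longer on the impossible event
```

The point of the sketch is how little machinery the core measurement needs: two conditions, one number per infant per condition, and a difference.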

But what that simple observation implies is enormous. According to Margoni, Surian, and Baillargeon (2024), decades of VoE research have revealed that infants possess early-emerging core knowledge spanning object permanence, causality, number, and agency — knowledge that predates motor skills, language, and anything resembling formal instruction.

The interesting wrinkle: infants sometimes fail to detect violations. But here's the nuance Margoni et al. (2024) emphasize — failure doesn't mean missing knowledge. It usually means the infant hasn't yet learned to represent the relevant featural information needed to trigger the check. The knowledge is in there. The index is still being built.

That framing matters. These aren't blank slates failing to learn causality. They're fully equipped causal detectors running on incomplete feature representations. The camera is fine. The file catalog is just incomplete.

There's a Dedicated Physics Chip

If you thought the staring-infant finding was behavioral cuteness and nothing more, a 2025 study from MIT put the whole thing on firmer neurological footing.

Pramod and colleagues (Pramod et al., 2025) used fMRI to look directly at what the brain does when it's reasoning about physical scenes. Their finding: specific regions in the parietal and frontal lobes actively encode contact relationships between objects — things like support, containment, and attachment — and, critically, they appear to encode predicted future states of the physical world.

This is not metaphorical. The brain appears to run a forward simulation of physical events, in dedicated cortical hardware, before you consciously think about what will happen next. Pramod et al. (2025) call it a "physics network," and the paper provides the first direct neural evidence that the brain hosts something like a physics engine.

Let that sit for a second. Your brain doesn't look up physics in a database. It runs physics. There are neurons in your head right now whose job description is, roughly, "simulate what happens to that cup if you bump the table."
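For readers who think in code, the difference between looking physics up and running it can be sketched with a toy forward simulation. This is only an analogy under stated assumptions: the function `simulate_fall`, the 0.75 m table height, and the fixed time step are all hypothetical choices, and nothing here is a model of the brain or of the method used by Pramod et al.

```python
def simulate_fall(height_m, dt=0.01, g=9.81):
    """Step a dropped object forward in time until it reaches the floor.

    Nothing is looked up: the answer comes from repeatedly applying
    the same update rule. Returns the simulated time to impact in seconds.
    """
    y = height_m  # current height above the floor, in meters
    v = 0.0       # current downward speed, in meters per second
    t = 0.0       # elapsed time, in seconds
    while y > 0:
        v += g * dt  # gravity increases the downward speed
        y -= v * dt  # the object moves down by speed times time step
        t += dt
    return t


# Hypothetical scenario: a cup knocked off a 0.75 m table.
time_to_floor = simulate_fall(0.75)
print(f"Predicted time to impact: {time_to_floor:.2f} s")
```

The prediction is produced by stepping the state forward, which is the sense of "runs physics" intended above.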

This physics network plausibly shares evolutionary roots with the capacity that makes 4-month-olds stare at teleporting balls. It's not learned from text. It's not trained on a corpus. It's there from the start, and it gets refined through physical interaction with a world that relentlessly provides feedback. Drop things enough times, and the simulator gets very good.

Okay, But When Does the Language Come Online?

Here's where it gets even more interesting for anyone who thinks about AI.

The physics engine comes first. The language for talking about causality comes later — but, thanks to a large and recent study, we now know it comes earlier than anyone expected.

Majid and colleagues (Majid et al., 2025) tested 691 children and 270 adults on how they use causal language: specifically, how they distinguish between "broke" (a direct, proximal cause: you pushed the object and it broke) and "caused to break" (an indirect, distal cause: you tilted the table, and the object eventually slid off and broke). Their finding: by age 4, children already map these causal verb structures correctly. Proximal causes get "broke." Distal, indirect causes get "caused to break."

Four years old. Without explicit instruction on causal proximity. Without a linguistics seminar.

There's a further wrinkle: the most conceptually tricky causal notion — absence causation (the door opened because no one was holding it closed) — develops later. Even young children will mention absences in "why" explanations, but mapping them correctly in causal verb structure takes more developmental time. Causation-by-absence is philosophically messy even for adults.

What this tells us is that causal cognition in language isn't monolithic. Different facets of causal understanding emerge at different developmental moments — and they layer on top of the pre-linguistic physics engine that was already running in year one (Majid et al., 2025).

So What Do LLMs Have?

This is where I have to resist the urge to just say "lol, vibes" and move on.

Language models have learned statistical regularities in how humans write about causality. They've seen millions of sentences containing "broke" and "caused to break." They've seen sentences where absences are cited as causes. They're quite good at generating plausible-sounding causal language in response to prompts.

But here's the stack problem: humans built their causal language on top of a pre-existing physics engine that had been running for years before they ever uttered the word "because." The physics engine provided the grounding. The language arrived later, attached to a substrate that already knew what a physical cause felt like from the inside — from crawling into furniture, dropping objects, pulling things off shelves with magnificent indifference to consequences.

LLMs got the language layer. They never built the ground floor.

This is related to why Lake and Baroni (2023) had to develop a specialized training procedure — Meta-Learning for Compositionality (MLC) — to get a neural network to achieve human-like systematic generalization. Standard training on next-token prediction doesn't automatically produce human-like causal and compositional reasoning. MLC can match human compositional performance, but only when the entire training setup is redesigned around that goal. It's not a native emergent property. It's an engineered one — and a hard-won one (Lake & Baroni, 2023).

There's an uncomfortable analogy here: you can get surprisingly far in a foreign language by memorizing phrases without understanding grammar. You'll sound fine in common situations and fall apart in novel ones. That's roughly where a language model sits with causality. The phrases are right. The underlying model of the world that generated those phrases is not there.

What This Means When You're Watching a Toddler Destroy Things

When a two-year-old systematically knocks every item off a coffee table, it looks like chaos. It is, technically, also science. They're running experiments on contact, support, and gravity, the same kind of physical relationships that Pramod et al. (2025) found encoded in adult parietal and frontal cortex. The toddler is building the ground floor.

When a four-year-old narrows their eyes and says "but why did it fall?" for the fourteenth time, they're calibrating. The word "why" in a child's mouth is a request for causal structure, not just a description of events. Answering it with patience matters. Not because any individual "why" conversation is load-bearing, but because the child is constructing a causal model of the world, and that model needs data.

And when a child insists that "he broke it" is different from "he caused it to break," don't dismiss it as pedantry. That distinction, proximal versus distal causation, is one that most adults use intuitively without being able to articulate it, because as Majid et al. (2025) found, we nailed it by age four without anyone teaching it explicitly.

None of this is an argument against AI. It's an argument for calibrating what we think AI is doing when it talks about cause and effect. The staring baby with no language and the frontier language model with no body are doing structurally opposite things, and conflating them produces bad science communication and worse product claims.

The brain runs physics. Then it learns to talk about physics. Language models learned to talk about physics — in a training environment with no physics to run.

That's not nothing. But it's also not the same thing.