Monochrome Mary, GPT, and Me

Frank Jackson's hypothetical "Mary", the color neuroscientist reared in a black-and-white room, can see, unlike ChatGPT (though only in black, white, and gray). So the vocabulary with which she sets up her explanation of color perception is nevertheless grounded. (Even Helen Keller's was: lacking sight and hearing from the age of 19 months, she still had touch and kinesthesia.) ChatGPT has no grounding at all.

So we can get from "horse, horn, vanish, eyes, observing instruments" to the unobservable "peekaboo unicorn" (a horned horse that vanishes whenever eyes or observing instruments are trained on it) purely verbally, using just those prior (grounded) words to define or describe it.

But ChatGPT can't get anywhere, because it has no grounded vocabulary at all: just words "in context" ("100 trillion parameters and 300 billion words…: 570 gigabytes of text data").

So much for ChatGPT's chances of the "phase transition" that some emergence enthusiasts are imagining: from just harvesting and organizing the words in ChatGPT's bloated 570-gigabyte belly to an "emergent" understanding of them. You, me, Helen Keller, or Monochrome Mary, in contrast, could understand ChatGPT's words. And we could mean them if we said them. So could the dead authors of the words in ChatGPT's belly, once. That's the difference between (intrinsic) understanding and word-crunching by a well-fed statistical parrot.

Two important, connected details: (1) Mary would be more than surprised if her screen went from B/W to color: she would become able to DO things that she could not do before (on her own), locked in her B/W room, such as telling red and green apart, just as any daltonian would if their trichromacy were repaired.

More important, (2) Mary, or Helen, or any daltonian could be told any number of words about what green is neurologically, and about which things are green and which are not. But they cannot be told what seeing green FEELS-LIKE (i.e., what green looks-like). And that's the point. If it weren't for the "hard problem" (the fact that it feels-like something to see, hear, touch [and understand and mean] something, none of which is something you DO), Mary, Helen, and a daltonian would not be missing anything about green. Grounding in transducer/effector know-how alone would be all there was to "meaning" and "understanding." But it's not.

Mary, Helen, and the daltonian can make do with (imperfect) verbal analogies and extrapolations from the senses they do have to the senses they lack. But ChatGPT can't, because all it has is a bellyful of ungrounded words (though words syntactically structured by their grounded authors, some of whom are perhaps already underground, long dead and buried…).

So meaning and understanding are not just sensorimotor know-how. They are based on sentience too (in fact, first and foremost, though so far it is an unsolved hard problem to explain how or why). ChatGPT (attention, "singularity" fans!) lacks that completely.

It FEELS-LIKE something to understand what red (or round, or anything) means. And that is not just transducer/effector know-how percolating up from the sensorimotor grounding. It's what it feels-like to see, hear, touch, taste, and manipulate all those things out there that we are talking about with our words.

ChatGPT is talking, but ABOUT nothing. The "aboutness" is supplied by the heads of the original authors and of the readers. ChatGPT is just producing recombinatory, fill-in-the-blanks echolalia.

[Image: "Monochrome Rainbow", by DALL-E]
