GPT definitely does not understand. It’s just a computer program plus an unimaginably huge database of the words that countless real thinking people have written, in books, articles, and online media. The software does increasingly sophisticated “filling in the blanks” when it answers questions, using that enormous database of things that other people have written (and spoken) about all kinds of other things. It is amazing how much software (which is what GPT is) can get out of such an enormous database.
GPT neither understands nor means anything by what it says. It is (as Emily Bender has aptly dubbed it) a “stochastic parrot.” It doesn’t parrot verbatim, echolalically, the words that are said to it, but it draws on the combinations of the words said to it, along with all those other words it has ingested, recombining them to fill in the blanks. “What is the opposite of down?” No one is surprised that GPT says “up.” Well, all those other verbal interactions are the same, only you need very powerful software and an enormous database to fill in the blanks there too.
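To make the “fill in the blanks” picture concrete, here is a deliberately tiny sketch. It is not how GPT is actually built (GPT is a large neural network, not a lookup table); the corpus, the function names `train` and `fill_blank`, and the two-word context window are all assumptions made up for illustration. It only shows that counting which word followed which words in other people's sentences is already enough to answer “What is the opposite of down?”

```python
from collections import Counter, defaultdict

# Hypothetical miniature "database" of things people have written.
CORPUS = [
    "the opposite of down is up",
    "the opposite of up is down",
    "the opposite of left is right",
]

def train(sentences, n=2):
    """Count which word follows each run of n consecutive words."""
    table = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i in range(n, len(words)):
            table[tuple(words[i - n:i])][words[i]] += 1
    return table

def fill_blank(table, prompt, n=2):
    """Return the most frequent continuation of the prompt's last n words."""
    context = tuple(prompt.split()[-n:])
    candidates = table.get(context)
    return candidates.most_common(1)[0][0] if candidates else None

table = train(CORPUS)
print(fill_blank(table, "the opposite of down is"))  # up
print(fill_blank(table, "the opposite of left is"))  # right
print(fill_blank(table, "the meaning of life is"))   # None (never seen)
```

The toy has no idea what “up” or “down” refers to; it only has the words, which is the point of the parrot analogy.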
One other thing (much more important than fussing about whether GPT thinks, understands or means anything; it doesn’t): The success of GPT teaches us something about the nature of language itself: Language can encode and transmit the thoughts of real, thinking people. The words (in all languages) are arbitrary in shape. (“Round” does not look round.) Single words could all just as well have been strings of dots and dashes in Morse code (of English or any other language; “round” = .-. --- ..- -. -..). The strings of words, though (rather than just sounds or letters), are not quite that arbitrary. That is not just because the patterns follow grammatical rules (those rules are arbitrary too, though systematic), but because, once people agree on a vocabulary and grammar, the things they say to one another preserve (in linguistic code) some of the structure of what speakers and hearers are thinking, and transmitting, in words. Their form (partly) resembles their meaning. They are the syntactic shadows of semantics.
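The arbitrariness of single words is easy to demonstrate. The sketch below re-spells “round” in International Morse code; the table holds only the five letters needed, and the helper name `to_morse` is made up for this example. The dots and dashes look no more round than the letters did.

```python
# International Morse code for just the letters in "round".
MORSE = {"r": ".-.", "o": "---", "u": "..-", "n": "-.", "d": "-.."}

def to_morse(word):
    """Spell a word letter by letter in Morse, separated by spaces."""
    return " ".join(MORSE[letter] for letter in word.lower())

print(to_morse("round"))  # .-. --- ..- -. -..
```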
Language is not video (where it’s form that preserves form). And video is not language. If I want you to know that the cat is on the mat (and the cat really is on the mat, and we are zooming), I can aim the camera at the cat, and if you have functioning eyes and have seen cats and mats before, you will see that the cat is on the mat. And if you can speak English, you can caption what you are seeing with the sentence “The cat is on the mat.” Now that sentence does not look anything like a cat on a mat. But, if you speak English, it preserves some of the scene’s structure, which is not the same structure as “the mat is on the cat” or “the chicken crossed the road.”
But if there were nothing in the world but mats and cats, with one on the other or vice versa, and chickens and roads, with the chicken crossing or not crossing the road, then any trivial scene-describing software could answer questions like: “Is there anything on the mat?” “What?” “Is anything crossing the road?” Trivial. It could also easily be done with just words, describing what’s where, in a toy conversational program.
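A toy program of that kind might look like the sketch below. Everything in it is hypothetical: the world is a handful of (subject, relation, object) facts written out in words, and the function names `what_is` and `is_anything` are invented for the example.

```python
# Hypothetical toy world, described purely in words: who stands in what
# relation to what.
FACTS = {
    ("cat", "on", "mat"),
    ("chicken", "crossing", "road"),
}

def what_is(relation, obj):
    """Answer "What is <relation> the <obj>?" with a sorted list of subjects."""
    return sorted(s for (s, r, o) in FACTS if r == relation and o == obj)

def is_anything(relation, obj):
    """Answer "Is anything <relation> the <obj>?" with True or False."""
    return bool(what_is(relation, obj))

print(is_anything("on", "mat"))         # True
print(what_is("on", "mat"))             # ['cat']
print(is_anything("on", "cat"))         # False: the mat is not on the cat
print(is_anything("crossing", "road"))  # True
```

The answers come entirely from the structure of the stored descriptions; nothing in the program has ever seen a cat or a mat.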
Now scale that up with what GPT can do with words, if it has an enormous sample of whatever people have said about whatever there is to say, as GPT has. “Who killed Julius Caesar?” “What is the proof of Fermat’s Last Theorem?” That’s all in GPT’s database, in words. And if you talk about more nonsensical things (“Is there a Supreme Being?” “Is there life after death?” “Will Trump stage a second coming?”) it’ll parrot back a gentrified synthesis of the same kinds of nonsense we parrot to one another about that stuff, and that’s all in the database too.
Every input to GPT is saying or asking something, in the same way. It just sometimes takes more work to fill in the blanks from the database in a way that makes sense. And of course GPT errs or invents where it cannot (yet) fill the blanks. But one of the reasons OpenAI is allowing people to use (a less advanced version of) GPT for free is that another thing GPT can do is learn. Your input to GPT can become part of its database. And that’s helpful to OpenAI.
It’s nevertheless remarkable that GPT can do what it does, guided by the formal structure inherent in the words in its database and exchanges. The stochastic parrot is using this property of words to say things with them that make sense to us. A lot of that, GPT can do without help, just filling in the blanks with the ghosts of meaning haunting the structure of the words. And what’s missing, we supply to GPT with our own words as feedback. But it’s doing it all with parrotry, computations, a huge database, and some deep learning.
So what do we have in our heads that GPT doesn’t have? Both GPT and we have words. But our words are connected, by our eyes, hands, doings, learnings and brains, to the things those words stand for in the world, their referents. “Cat” is connected to cats. We have learned to recognize a cat when we see one. And we can tell it apart from a chicken. And we know what to do with cats (hug them) and chickens (also hug them, but more gently). How we can do that is what I’ve dubbed the “symbol grounding problem.” And GPT can’t do it (because it doesn’t have a body, nor sensory and motor organs, nor a brain, with which it can learn what to do with what, including what to call it).
And we know where to go to get the cat, if someone tells us it’s on the mat. But the sentence “the cat is on the mat” is connected to the cat’s being on the mat in a way that is different from the way the word “cat” is connected to cats. It’s based on the structure of language, which includes Subject/Predicate propositions, with truth values (T & F), and negation.
GPT has only the words, and can only fill in the blanks. But the words have more than we give them credit for. They are the pale Platonic contours of their meanings: semantic epiphenomena in syntax.
And now, thanks to GPT, I can’t give any more take-home essay exams.
Here are some of my chats with GPT.
P.S. If anyone is worrying that I have fallen into an early-Wittgensteinian trap of thinking only about a toy world of referents of concrete, palpable objects and actions like cats, mats, roads, and crossing, not abstractions like “justice,” “truth” and “beauty,” relax. There’s already more than that in sentences expressing propositions (the predicative IS-ness of “the cat is on the mat” already spawns a more abstract category, “being-on-mat”-ness, whose name we could, if we cared to, add to our lexicon, as yet another referent). But there’s also the “peekaboo unicorn”: a horse with a single horn that vanishes without a trace the instant anyone trains eyes or a measuring instrument on it. And the property of being such a horse is also a referent, now available and eligible for filling in the blank for a predicate in a subject-predicate proposition. (The exercise of scaling this to justice, truth and beauty is left to the reader. There’s a lot of structure left in them there formal shadows.)