ChatGPT, Language and Symbol Grounding

Emily Bender (as well as Timnit Gebru and “Shmargaret Shmitchell”) are warning that AI is not really “thinking” by a “thinker,” that its development and use is contaminated by commercial and malign interests, and that it poses dangers. They are right. But they do not seem to have any new insights into why and how AI is not really thinking (nor do they, or anyone, have any solutions for the dangers it poses).

Bender’s point about the difference between form and meaning in language is actually about the symbol grounding problem (which she does cite, but as far as I can see, she does not have a proposal for solving it).

There is way too much chatter going on about chatGPT right now, so it’s very hard to see whether there is any signal in all the noise about it. It seems to be the same thing over and over about “take-overs” and “singularities”.

About chatGPT, my own tentative understanding is that its performance capacity shows how much latent structure there is in words and propositions, across all languages, even though the shape of words is arbitrary, and so is the shape of the syntactic conventions we adopt in our languages (with the possible exception of Chomsky’s Universal Grammar).

The reason is simple: Despite the obvious differences between images and words (or objects and their verbal descriptions) some of the structure of things in the world is systematically shared by the structure of the sentences describing them. And that’s what gives the ungrounded universe of words that chatGPT swallows, together with the algorithms it applies to them, both the real capabilities it has, and the illusion that it gives us, of talking to a real “thinker.”

A trivial example will illustrate this. Although a cat lying on a mat can be described in countless different ways (“a picture is worth more than 1000 words…”), within and across countless languages, even the simple arbitrary english proposition “The cat is on the mat” shares, systematically, some structural properties of the scene and object it is describing. That structure, encoded in the verbal description, is systematically present in all verbal descriptions, and it is extended systematically in bags of googols and googolplexes of words and propositions.

That structure is, in a way, a formal shadow of the thing described, in the description. It’s also what makes google’s database of “commented” images so searchable and navigable. (It does it doubly well for software code, if the code itself is also commented in English [or any language]).

This shadowed structure is an epiphenomenon; it is not meaning, or thinking. But it can still do a lot of things that look intelligent, because it is parasitic on the grounded meanings of the words in the heads of all the human speakers that spoke or wrote all the googols of words, mined by the algorithms underlying chatGPT and the like.

Apart from all the money and mischief to be made by mining and manipulating these shadows, they are also reflections of the revolutionary nature and power of language itself, the cognitive capacity it provides to real, grounded brains and bodies to encode the structure of their sensorimotor experience and learning into communicable and storable words and propositions.

All the rest is in the sensorimotor (robotic) grounding in the brains of the real thinkers who ground and use the words.

None of this is deep or rocket-science stuff. But it’s what makes us perceive that we are communicating with a thinker when we communicate with it in words. That in turn is driven by our “mirror neurons,” which trade on the neurological similarity of shape between what we do, and what we see others doing. That is what enables us to mimic and imitate, not just with mutual gestures, but also with words and propositions and their shared sensorimotor grounding. That is what underlies both our everyday mutual mind-reading and (robotic) Turing-Testing. It’s also what is completely missing in chatGPTs, which are just ungrounded, well-mined wordbags that Bender calls stochastic parrots,” parasitic on the shared structure between our words and the world they are about.

Word-Cloud Golem (Dall-E) 12/3/2023