Vector Grounding?

Anon: LLMs are not ungrounded. They are grounded indirectly through the experiences of other people when they speak, the way a blind person’s knowledge of the visual world is mediated by what they are told by (sighted) people. Blind people know a great deal about the visual world, even about color, which can only be directly experienced through vision.

SH: You’re perfectly right that the meanings of words can be grounded indirectly through language (i.e., through more words, whether from dictionaries, encyclopedias, textbooks, articles, lectures, chatting or texting, including texting with ChatGPT, the (sightless) statistical parrot with the immense bellyful of other people’s words, along with the computational means to crunch and integrate those words, partly by a kind of formal verbal figure-completion). Indirect grounding is what gives language (which, by the way, also includes symbolic logic, mathematics and computation as a purely syntactic subset) its immense (possibly omnipotent) communicative power.

But language cannot give words their direct grounding. Grounding, like dictionary look-up, cannot be indirect all the way down. Otherwise it is not bottom-up grounding at all, just circling endlessly from meaningless symbol to meaningless symbol.

Let’s recall what “grounding” is: It’s a connection between words and their referents. Between “apples” and apples (in the world). “Apples” is directly grounded (for me) if I can recognize and manipulate apples in the world. But not every word has to be directly grounded. Most aren’t, and needn’t be. Only enough words need to be grounded directly. The rest can be grounded indirectly, with language. That’s what we showed in the paper on the latent structure of dictionaries in the special issue of TICS edited by Gary Lupyan in 2016. We showed that with a “minimal grounding set” of around 1000 grounded words you could go on to ground all the rest of the words in the dictionary through definitions alone. But those 1000 grounding words have to have been directly grounded, in some other way, not just indirectly, in terms of other words and their verbal definitions. That would have been circular.
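As a toy illustration of that idea (a minimal sketch, not the analysis used in the dictionary paper), the snippet below treats a dictionary as a mapping from each word to the set of words used in its definition. The eight-word `toy_dictionary` and the function name `ground_indirectly` are invented for this example. A word counts as indirectly grounded once every word in its definition is already grounded, and the process repeats until nothing new can be added.

```python
from typing import Dict, Set


def ground_indirectly(definitions: Dict[str, Set[str]],
                      directly_grounded: Set[str]) -> Set[str]:
    """Return all words that end up grounded, given a directly grounded seed set.

    A word becomes (indirectly) grounded when every word used in its
    definition is already grounded.
    """
    grounded = set(directly_grounded)
    changed = True
    while changed:
        changed = False
        for word, defining_words in definitions.items():
            if word not in grounded and defining_words <= grounded:
                grounded.add(word)
                changed = True
    return grounded


# Hypothetical toy dictionary: every word is defined only by other words in it.
toy_dictionary = {
    "apple": {"red", "round", "fruit"},
    "fruit": {"food", "round"},
    "food": {"eat"},
    "eat": {"food"},
    "red": {"color"},
    "color": {"red"},
    "round": {"shape"},
    "shape": {"round"},
}

# With nothing grounded directly, definitions alone ground nothing.
nothing = ground_indirectly(toy_dictionary, set())

# With three words grounded directly, the rest follow from definitions.
everything = ground_indirectly(toy_dictionary, {"eat", "red", "round"})

# With too small a seed, part of the dictionary stays ungrounded.
partial = ground_indirectly(toy_dictionary, {"eat", "red"})
```

In this toy case `nothing` is empty, `everything` covers all eight words, and `partial` stops at "eat", "red", "food" and "color". The seed words are simply declared grounded here; the code says nothing about how they got that way, which is exactly the part that words alone cannot supply.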

All dictionaries are circular; indeed all of language is. All the words in a dictionary are parasitic on other words in the dictionary. Direct grounding is “parasitic” too, but not on words. It is parasitic on the sensorimotor capacity to recognize and manipulate their referents in the world. Not every word. But enough of them to ground all the rest indirectly.

You spoke about grounding indirectly “in the experiences of others.” Well, of course. That’s language again. But what is “experience”? It’s not just know-how. I can describe in words what an apple looks like, what to do with it, and how. But I can’t tell that to you (and you can’t understand it) unless enough of my words and yours are already grounded (directly or indirectly), for both you and me, in what you and I can each perceive and do, directly, not just verbally, in the world. We don’t have to have exactly the same minimal grounding set. And we probably don’t just ground the minimal number directly. But what is grounded directly has to be grounded directly, not indirectly, through words.

The reason that blind people (even congenitally blind people, or almost congenitally blind and deaf people like Helen Keller) can learn from what seeing-people tell them is not that they are grounding what they learn in the “experience” of the seeing-person. They ground it in their own direct experience, or at least the subset of it that was enough to ground their own understanding of words. That was what I was trying to explain with Monochrome Mary, GPT, and Me. Indirect grounding can be done vicariously through the words that describe the experience of others. But direct grounding cannot be done that way too; otherwise we are back in the ungrounded dictionary-go-round again.

About Mollo & Milliere’s “Vector Grounding Problem”: I’m afraid M&M miss the point too, about the difference between direct grounding and indirect (verbal or symbolic) grounding. Here are some comments on M&M’s abstract. (I skimmed the paper too, but it became evident that they were talking about something other than what I had meant by symbol grounding.)

M&M: The remarkable performance of Large Language Models (LLMs) on complex linguistic tasks has sparked a lively debate on the nature of their capabilities. Unlike humans, these models learn language exclusively from textual data, without direct interaction with the real world.

SH: “Learn language” is equivocal. LLMs learn to do what they can do. They can produce words (which they do not understand and which mean nothing to them, but those words mean something to us, because they are grounded for each of us, whether directly or indirectly). LLMs have far more capacities than Siri, but in this respect they are the same as Siri: their words are not grounded for them, just for us.

M&M: Nevertheless, [LLMs] can generate seemingly meaningful text about a wide range of topics. This impressive accomplishment has rekindled interest in the classical ‘Symbol Grounding Problem,’ which questioned whether the internal representations and outputs of classical symbolic AI systems could possess intrinsic meaning.

SH: I don’t really know what “intrinsic meaning” means. But for an LLM’s own words (or mine, or the LLM’s enormous stash of text) to mean something “to” an LLM (rather than just to the LLM’s interlocutors, or to the authors of its text stash), the LLM would have to be able to do what no pure wordbot can do, which is to ground at least a minimal grounding set of words, by being able to recognize and manipulate their referents in the world, directly.

An LLM that was also an autonomous sensorimotor robot — able to learn to recognize and manipulate at least the referents of its minimal grounding set in the world — would have a shot at it (provided it could scale up to, or near, robotic Turing Test scale); but ChatGPT (whether 4, 5 or N) certainly would not, as long as it was just a wordbot, trapped in the symbolic circle of the dictionary-go-round. (N.B., the problem is not that dictionary definitions can never be exhaustive, just approximate; it is that they are circular, which means ungrounded.)

M&M: Unlike these systems, modern LLMs are artificial neural networks that compute over vectors rather than symbols.

SH: The symbols of mathematics, including vector algebra, are symbols, whose shape is arbitrary. Maths and computation are purely syntactic subsets of language. Computation is the manipulation of those symbols. Understanding what (if anything) the symbols mean is not needed to execute the recipe (algorithm) for manipulating them, based on the symbols’ arbitrary shapes (which might as well have been 0’s and 1’s), not their meanings.
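A small sketch can make the point about arbitrary shapes concrete. It is only an illustration under stated assumptions: the rewriting procedure `run`, the helper `reshape`, and the one-rule unary-addition recipe are all invented for this example. The recipe is executed twice, once with the familiar shapes "1" and "+" and once with those shapes consistently swapped for "@" and "#". The manipulation goes through identically either way, because it depends only on matching shapes.

```python
from typing import Dict, List, Tuple

Rule = Tuple[str, str]  # (shape to find, shape to put in its place)


def run(tape: str, rules: List[Rule]) -> str:
    """Apply the first matching rule, one occurrence at a time, until none matches."""
    changed = True
    while changed:
        changed = False
        for pattern, replacement in rules:
            if pattern in tape:
                tape = tape.replace(pattern, replacement, 1)
                changed = True
                break
    return tape


def reshape(text: str, shapes: Dict[str, str]) -> str:
    """Consistently swap each symbol's shape for another arbitrary shape."""
    return text.translate(str.maketrans(shapes))


# Recipe for unary addition: erase the "+" between two runs of strokes.
unary_addition: List[Rule] = [("+", "")]
result = run("111+11", unary_addition)

# The same recipe with different, equally arbitrary shapes.
shapes = {"1": "@", "+": "#"}
reshaped_rules = [(reshape(p, shapes), reshape(r, shapes)) for p, r in unary_addition]
reshaped_result = run(reshape("111+11", shapes), reshaped_rules)
```

Here `result` is "11111" and `reshaped_result` is "@@@@@". That either string "means" five is an interpretation supplied by the reader; nothing in `run` uses it.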

M&M: However, an analogous problem arises for such systems, which we dub the Vector Grounding Problem. This paper has two primary objectives. First, we differentiate various ways in which internal representations can be grounded in biological or artificial systems…

SH: The notion of “internal representations” is equivocal, and usually refers to symbolic representations, which inherit the symbol grounding problem. Breaking out of the ungrounded symbol/symbol circle requires more than an enormous corpus of words (meaningless symbols), plus computations on them (which are just syntactic manipulations of symbols based on their shape, not their meaning). Breaking out of this circle of symbols requires a direct analog connection between the words in the speaker’s head and the things in the world that the symbols refer to.

M&M: identifying five distinct notions discussed in the literature: referential, sensorimotor, relational, communicative, and epistemic grounding. Unfortunately, these notions of grounding are often conflated. We clarify the differences between them, and argue that referential grounding is the one that lies at the heart of the Vector Grounding Problem.

SH: Yes, the symbol grounding problem is all about grounding symbols in the capacity to recognize and manipulate their referents in the real (analog, dynamic) world.

M&M: Second, drawing on theories of representational content in philosophy and cognitive science, we propose that certain LLMs, particularly those fine-tuned with Reinforcement Learning from Human Feedback (RLHF), possess the necessary features to overcome the Vector Grounding Problem, as they stand in the requisite causal-historical relations to the world that underpin intrinsic meaning.

SH: The requisite “causal-historical” relation between words and their referents in direct sensorimotor grounding is the capacity to recognize and manipulate the referents of the words. A TT-scale robot could do that, directly, but no LLM can. It lacks the requisite (analog) wherewithal.

M&M: We also argue that, perhaps unexpectedly, multimodality and embodiment are neither necessary nor sufficient conditions for referential grounding in artificial systems.

SH: It’s unclear how many sensory modalities and what kind of body are needed for direct grounding of the referents of words (TT-scale), but Darwinian evolution had a long time to figure that out before language itself evolved.

I’d be ready to believe that a radically different synthetic robot understands and means what it says (as long as it is autonomous and at life-long Turing-indistinguishable scale), but not if it’s just a symbol-cruncher plus a complicated verbal “interpretation,” supplied by me.
