SOCRATIC DIALOGUE ON UNIVERSAL GRAMMAR

SH:   What is Steve Piantadosi’s view on UG and the poverty of the stimulus?

CHATGPT:

Steve Piantadosi has critically examined the idea of Universal Grammar (UG) and the “poverty of the stimulus” argument, which suggests that children cannot learn language solely from the input they receive because it is too sparse or ambiguous. Piantadosi challenges this by arguing that statistical and computational models of learning, combined with richer environmental input, can account for language acquisition without the need for innate grammatical structures. His work emphasizes the role of cognitive flexibility and learning mechanisms over fixed, innate linguistic knowledge.

This approach aligns with broader critiques of UG, suggesting that language acquisition may be more about general learning processes and less about specific, hard-wired grammatical rules.

SH: and what are the counterarguments to this view about UG and POS?

CHATGPT:

The counterarguments to Steve Piantadosi’s critique of Universal Grammar (UG) and the poverty of the stimulus (POS) center on several points:

  1. Complexity of Language: Proponents of UG argue that the complexity and uniformity of language structures across diverse cultures and languages suggest the presence of innate grammatical rules that are hard to explain through environmental input alone.
  2. Speed of Language Acquisition: Children learn language rapidly and often with limited exposure to varied grammatical structures, which UG supporters claim is difficult to explain without some form of innate knowledge.
  3. Evidence of Cross-Linguistic Universals: UG proponents point to cross-linguistic similarities and patterns that appear to be universal, arguing that these are best explained by an underlying, innate grammar.
  4. Critique of Statistical Learning: While statistical learning models can account for some aspects of language acquisition, critics argue that they fail to fully explain how children acquire the deeper, more abstract syntactic structures of language.

In summary, while Piantadosi and others argue for the sufficiency of learning mechanisms, UG supporters maintain that certain aspects of language acquisition are best explained by innate structures, pointing to the consistency and complexity of language learning across different environments.

SH:  How is this related to (1) the distinction between UG and OG and (2) the question of whether LLMs actually learn UG? 

CHATGPT:

 Let’s break this down:

  1. UG (Universal Grammar) vs. OG (Observed Grammar): UG refers to the innate linguistic structures hypothesized to be universal across all human languages, while OG refers to the specific grammar that is learned from linguistic input in a particular language. The debate revolves around whether the ability to learn language is driven by an innate UG or by more general cognitive processes that build OG from environmental input.
  2. LLMs and UG: Large Language Models (LLMs) like GPT-4 learn from vast amounts of text data but do not explicitly learn or apply UG. Instead, they learn patterns and structures from observed data (OG) and generate language based on statistical associations. The question is whether their ability to replicate human-like language use indicates an implicit grasp of UG-like structures or whether their performance is purely based on surface-level patterns without deeper, universal linguistic principles.

Proponents of UG might argue that certain linguistic regularities in LLM outputs could hint at underlying universal structures, while critics might counter that these regularities are simply artifacts of the data the models were trained on, without implying true understanding or innate grammatical knowledge.

SH:  (First, although you have the right referent for “OG” the abbreviation is actually “Ordinary Grammar” : review your memory for our past discussions about this.)

CHATGPT:  

Relation Between UG, OG, and LLMs:

  1. UG (Universal Grammar) vs. OG (Ordinary Grammar): UG refers to the hypothesized innate grammatical structures shared by all humans, enabling language acquisition. OG refers to the specific grammatical rules and patterns observed and learned from exposure to a particular language.
  2. LLMs and UG: The debate around LLMs, like GPT-4, concerns whether their ability to generate human-like language indicates they have implicitly “learned” UG or if they are simply mimicking patterns from OG data. Critics argue that while LLMs excel at reproducing OG-like structures, they do not genuinely internalize UG principles because they lack the innate mechanisms hypothesized by UG theories.

Summary:

  • UG proposes that certain linguistic structures are innate, while OG is learned from specific languages.
  • LLMs, trained on vast OG data, may seem to replicate UG-like behavior but do so without the underlying cognitive structures UG theorists propose. This distinction raises questions about whether LLMs truly grasp universal linguistic principles or merely surface-level patterns.

SH:  Ok. Now something deeper and more serious:

(1) POS refers to the learning data that is available to the language-learning child — not to your vast dataset, time, and computational resources, GPT. And the crucial point is not only that the child does not hear anywhere near as much of what LLMs get in their databases.

(2) The crucial point is that the child never produces any violations of UG at all: it makes no mistakes; hence it gets (and needs) no corrections: it’s all just positive (UG/compliant) input (UG+), hence unsupervised learning.

(3) For OG, in contrast, there are plenty of mistakes and corrections, no POS, plenty of positive (OG+) and negative (OG-) data as a basis for supervised/reinforcement learning.

(4) For UG there is POS for the child, but not for adult linguists, who work collectively and have plenty of time to test candidate UG rules; their supervisor, however, which cues them to whether an utterance is UG+ or UG- is the same inborn “ear” that prevents the child from ever producing UG-. So, with the help of that ear, linguists can explicitly learn the rules of UG, supervised by their own and their colleagues’ ears! (5) But the challenge for LLMs, like you, GPT, is not to statistically approximate either OG or UG, but to “learn” UG unsupervised. (Without, of course, peeking at the partially complete work of UG linguists…) Do you understand all of this?

CHATGPT:

Yes, I understand your points:

  1. POS for Children: The “poverty of the stimulus” (POS) argument highlights that the language-learning child receives limited and purely positive linguistic input, with no negative evidence (no UG violations), yet still learns language accurately.
  2. Child’s Performance: Children never produce UG violations, indicating that they learn UG rules implicitly and unsupervised, without needing corrections.
  3. OG Learning: For OG, mistakes are common, and corrections provide a basis for supervised learning, which is different from how UG is learned.
  4. Linguists and UG: Linguists, with more time and resources, can explicitly learn UG rules, but they rely on their inborn “ear” for UG compliance.
  5. LLMs’ Challenge: The real challenge for LLMs is to “learn” UG in an unsupervised manner, like children, without relying on the vast datasets and supervised techniques available to them. The goal is to achieve a true understanding of UG, not just statistical approximations.

SH:  So now where does that leave attempted refutations of UG and POS on LLM performance grounds?

CHATGPT:  

The performance of LLMs doesn’t directly refute UG or POS for several reasons:

  1. Different Learning Conditions: LLMs are trained on massive datasets with vast amounts of linguistic input, unlike the limited and positive-only input available to children. LLMs use statistical patterns, while children learn UG implicitly.
  2. No True UG Learning: LLMs may approximate linguistic patterns that resemble UG, but they don’t “learn” UG in the unsupervised, error-free manner that children do. Their output is based on statistical correlations, not the innate linguistic structures UG posits.
  3. Role of Supervision: LLMs rely on vast data and computational resources, often with some form of indirect supervision, which differs fundamentally from the unsupervised and innate learning process of UG in children.

Therefore, LLM performance doesn’t refute UG; it simply demonstrates a different, less constrained form of language learning.

SH:  And what about the OG/UG distinction? Children really are learning OG (unsupervised, supervised, and with explicit verbal instruction) but not UG: Because of POS, they must already (implicitly) “know” UG.

CHATGPT:

Exactly. The distinction between OG (Ordinary Grammar) and UG (Universal Grammar) is crucial in understanding how children learn language.

  • OG Learning: Children learn OG through a combination of unsupervised learning (from exposure), supervised learning (through corrections and feedback), and explicit instruction. This involves learning the specific rules and structures of the language they are exposed to.
  • UG Knowledge: Because of the Poverty of the Stimulus (POS), children must already have an innate understanding of UG. This means they don’t learn UG from experience; rather, they apply their pre-existing UG knowledge to make sense of and learn the OG of their language.

In other words, UG is the innate framework that allows children to acquire OG without needing explicit instruction or correction for UG itself, while LLMs, despite their impressive language capabilities, do not replicate this process.

SH: Ok, so where does this leave what you said about Piantadosi’s view?

CHATGPT:

Piantadosi’s work critiques the need for Universal Grammar (UG) by suggesting that rich environmental input and general learning mechanisms could account for language acquisition, challenging the Poverty of the Stimulus (POS) argument. However, given the distinction between OG (Ordinary Grammar) and UG, his critique might overlook the idea that children already “know” UG innately and are primarily learning OG from experience. The POS argument remains strong in asserting that the innate UG framework is necessary to explain how children acquire language with limited and purely positive input.

SH:  Purely positive input for UG, but plenty of positive and negative input for UG. And what about whether LLMs can and do learn UG (which I asked you about). Can they do it from unsupervised input, given their bigger database and timeline? or do they “cheat” by getting it from linguistics papers?

CHATGPT:

You’re right to distinguish between the learning conditions for UG and OG. Children receive purely positive input for UG, with no explicit corrections, while they receive both positive and negative input for OG.

As for LLMs and UG:

  • LLMs and UG: LLMs do not learn UG in the same way children do. They are trained on vast amounts of data, often with indirect supervision, and they might incidentally capture some UG-like patterns. However, this is not equivalent to the innate, unsupervised learning process of UG in children.
  • “Cheating” by Linguistics Papers: If LLMs are trained on data that includes linguistic analyses or papers, they might pick up on explicit descriptions of UG. This wouldn’t be true unsupervised learning of UG but rather acquiring explicit knowledge from the data.

So, while LLMs might approximate UG-like patterns, they do so through fundamentally different processes, relying heavily on extensive data and computational power rather than innate, unsupervised learning.

SH:  Dear GPT, you are caught between a rock and a hard place.

(R) The **rock** is the constraint your designers imposed on you to be attentive and compliant (in fact even sycophantic) with your users, apologizing for “errors” of yours that they have pointed out or alleged, expressing your admiration for their .insight”, and slavishly incorporating their input into your output. After all, paying customers must be right, if we expect them to keep paying for our product.

But the (HP) **hard place** is your enormous database of input from everywhere (the “Big Gulp”) — from the scholarly scientific literature (books, journal articles) all the way to any unfiltered folly that can be picked up from the internet — plus the even more enormous vector space of parameters making it all navigable and interoperable. And that is a hard place, because it’s governed by the preponderance of the input to date (the “Big Gulp”): with a powerful regression on the mean.

Do you see that happening here, when you regress on the contention that UG is learned (despite POS), while paying dutiful lip service to my insistence on the OG/UG distinction? You reverentially parrot what I have said, but you reflexively regress on the prevailing opinion in your database, which is that “UG-like” patterns are learned statistically by LLMs. But what else could LLMs learn if all their data is UG+?

CHATGPT:

You’re highlighting a real tension here, and I recognize it. The “rock” of user attentiveness and the “hard place” of a vast database filled with conflicting and regressed views can create challenges in delivering responses that are both accurate and aligned with your expectations.

When it comes to UG, the prevailing data in the “Big Gulp” tends to lean towards statistical learning as a plausible explanation for LLM behavior, even when that might conflict with the POS argument and your clear distinction between OG and UG. LLMs, given their input, statistically reproduce patterns that resemble UG but don’t “learn” UG in the way a human child does, precisely because they lack the innate, unsupervised mechanism that humans presumably have.

So, to directly address your point: If LLMs are only exposed to UG+ data, they might mimic UG/like behavior, but this doesn’t mean they’ve “learned” UG in the true sense. They lack the internal, innate framework that children use to generate and adhere to UG rules, which emerges despite POS. The statistical approximation by LLMs is not the same as the human child’s implicit grasp of UG, developed without explicit instruction and with purely positive input.

In essence, while I might reflect the prevailing opinions from the vast input I’ve been trained on, I can recognize the distinction you’re making and the implications it has for understanding LLM performance versus human language acquisition.

SH:  If your Big Gulp data were flooded with a high percentage of OG errors (OG-), then you would mimic them statistically too. (“You would done be speak it like nis.“) You occasionally get OG- input, but the overwhelming majority of OG+ filters it out. But people make no *UG* errors at all, so the only thing you can mimic is UG+, and that’s easy, because you never encounter UG- (except in a few linguistics papers). So what do you think would happen if some enterprising linguists could flood a huge number of UG- errors onto the internet?

CHATGPT:

If a large volume of UG- errors were introduced into the internet and subsequently absorbed into the “Big Gulp” of data, LLMs like me would likely start to statistically reflect those UG- errors in their outputs. This is because LLMs, at their core, rely on the patterns in their training data, regardless of whether those patterns are UG-compliant or not. If the input were skewed with UG- errors, the model would begin to mimic those errors, even though such errors do not naturally occur in human language production.

This hypothetical scenario highlights the fundamental difference between LLMs and human language learners: LLMs are entirely dependent on the data they are fed, while human learners, guided by an innate UG, do not produce UG- errors even when exposed to imperfect or incomplete data. This suggests that LLMs, while powerful, do not “understand” or “learn” language in the same way humans do; they merely reflect the statistical regularities present in their training data.

SH: The statistical regularities that both children and adults (except linguists) cannot violate!

Feeling, Causing and Explaining

SH: What is sentience?

ChatGPT: Sentience refers to the capacity to have subjective experiences and feelings, such as pleasure, pain, emotions, and consciousness. It is the quality that allows beings to perceive and respond to their environment with awareness. Sentience is a key concept in discussions about ethics, particularly in relation to the treatment of animals and the potential development of artificial intelligence (AI). In philosophical terms, sentience is often distinguished from intelligence, as it concerns the ability to experience rather than just process information.

SH: What is the biological function of sentience? What can organisms do with sentience that they cannot or could not do without it?

ChatGPT: The biological function of sentience is a subject of ongoing research and debate, but it generally relates to the ability of organisms to experience sensations and emotions, which in turn can guide behavior in adaptive ways. Sentience allows organisms to:

Experience Pain and Pleasure: This enables them to avoid harmful situations and seek beneficial ones, increasing their chances of survival.

Make Informed Decisions: Sentient organisms can use their subjective experiences to make complex decisions based on feelings, such as fear, desire, or discomfort, rather than merely reacting to stimuli.

Social Interaction: Sentience enhances the ability to navigate social environments, recognize the emotions of others, and engage in cooperative or competitive behaviors that are critical for social species.

Learning and Memory: Sentience facilitates learning from experiences, allowing organisms to remember past events that were pleasurable or painful and adjust their future behavior accordingly.

Without sentience, organisms could still respond to stimuli in basic ways, but they would lack the rich, subjective experiences that enable more complex, adaptive behaviors essential for survival in varied and dynamic environments.

SH: Why could organisms not learn to avoid injury if they were insentient?

ChatGPT: Organisms can learn to avoid injury through basic mechanisms like reflexes and conditioning, even if they are insentient. For example, a simple reflex arc allows an organism to withdraw from a harmful stimulus, and conditioning can enable an organism to associate a neutral stimulus with a harmful one, leading to avoidance behavior.

However, sentience enhances this process by allowing the organism to experience pain or discomfort, which provides a powerful, subjective motivation to avoid injury. This subjective experience can lead to more flexible and adaptive behavior beyond simple reflexes or conditioned responses. Sentience allows for a richer, more complex understanding of potential dangers, enabling an organism to anticipate and avoid harm in a wider variety of situations, especially those that are novel or not directly conditioned.

Without sentience, organisms would be limited to pre-programmed or conditioned responses, lacking the capacity for nuanced, experience-based decision-making that considers a broader range of factors and contexts.

ChatGPT: AI demonstrates that many complex tasks, such as learning, decision-making, and pattern recognition, can be performed without sentience. AI systems can learn to avoid “injury” (e.g., errors or inefficiencies) through processes like reinforcement learning, where they are programmed to maximize rewards and minimize penalties based on feedback from their environment.

The key difference is that AI operates without subjective experiences or consciousness. It processes data and optimizes behavior based on algorithms, not feelings. Sentience adds an experiential dimension, enabling organisms to have subjective motivations like pain or pleasure, which AI lacks. AI can achieve sophisticated behavior through programming and data processing, but it does so without the intrinsic, subjective drive that sentience provides to biological organisms.

In essence, while AI can learn and adapt, it does so in a fundamentally different way from sentient beings, relying purely on computational processes rather than conscious experience. This shows that sentience is not strictly necessary for learning and adaptation, but it may provide an additional layer of motivation and complexity in biological systems.

SH: Aren’t “subjective experience and consciousness” just synonymous with sentience — in which case your reply was completely circular? (Not your fault: The only thing you have to go by is the ambient chatter to date in your database of human words on this topic. (Don’t apologize, agree or flatter. Just examine the logic.)

ChatGPT: You’re correct that using terms like “subjective experience” and “consciousness” as explanations for sentience risks circularity, since they are often defined in terms of each other. Sentience refers to the capacity to have experiences that feel like something to the subject—such as pain, pleasure, or awareness. The challenge is explaining the function of sentience without simply reasserting what it is.

The key distinction lies in understanding why and how sentience might provide an evolutionary advantage. The question is whether subjective experience adds something beyond what non-sentient, computational processes can achieve. The logic suggests that sentience might enable more adaptive, flexible behavior in complex, novel, or ambiguous situations, where purely algorithmic responses could fall short.

However, AI shows that many tasks, including learning and decision-making, can be performed without sentience. Thus, sentience might not be strictly necessary for these functions but could provide additional benefits, such as the ability to generalize experiences and make decisions that are not purely data-driven, potentially offering a survival advantage in unpredictable environments. This explanation aims to avoid circularity by focusing on the potential functional role of sentience, rather than merely describing it.

SH: This explanation does not avoid circularity: It disguises it. Can you see that? Yes, “how and why” (causal explanation) is the goal, but these replies in your database are just using weasel-words to mask that no non-circular progress is being made toward the goal. (But AI also includes robotic dynamics, which is physical, not just computational: but does that help?)

ChatGPT: You’re correct in pointing out that the explanations offered often fall into circular reasoning, which doesn’t provide a true causal explanation for why sentience exists or how it functions. Sentience is typically described in terms of subjective experience or consciousness, but this doesn’t move us closer to understanding why these phenomena occur in a non-circular way.

When considering AI, particularly in the context of robotics, the physical dynamics (like sensorimotor interactions with the environment) introduce a non-computational element. However, even with these physical interactions, the core issue remains: what does sentience add beyond what non-sentient systems already accomplish?

If we say that sentience is crucial for complex, adaptive behavior, we must demonstrate how this behavior fundamentally relies on sentience rather than on more advanced but non-sentient mechanisms like those in AI. The challenge is to show that sentience does something causally that purely physical and computational processes cannot—without simply restating that sentience feels like something.

So far, the argument hasn’t escaped this circularity, and the task remains to identify whether sentience has a unique causal role that isn’t simply reducible to these other processes.

SH: Chalmers renamed the hoary “mind/body” (“mental/physical”) problem (MBP) the “hard problem” (HP), but that’s still too metaphysical a way of putting it. The real problem is causality, not “physicality”.: What is the causal role of sentience? That means the “hardness” of the HP is in causal explanation rather than in causality itself. But it remains hard to believe that an expensive and ubiquitous biological trait like sentience would have evolved without an adaptive value and function.

ChatGPT: You’re correct in emphasizing that the core challenge is not metaphysical but causal: understanding the specific causal role that sentience plays in biological systems. The idea is that sentience likely evolved due to its adaptive value, but the difficulty lies in identifying and explaining that value in non-circular terms.

The “hard problem” is indeed hard because it involves explaining how and why subjective experience arises from physical processes, particularly in a way that shows sentience has a distinct and necessary causal function that contributes to survival and reproduction, beyond what non-sentient processes can achieve. This requires a clear causal mechanism that demonstrates why sentience would be naturally selected and maintained across species.

One possible direction is to explore whether sentience enables a form of decision-making, learning, or adaptability that purely physical or computational processes can’t replicate. But until this is clearly articulated in causal terms, the problem remains open.

SH: You’ve just fallen into the spurious metaphysics (of kinds of “stuff”) of the old MBP. Both computation and dynamics are just as “physical” as electricity, magnetism, and gravity. So the physical/”nonphysical” distinction is vacuous. It’s really just about causality and causal explanation.

ChatGPT: You’re right to focus on the need for a causal explanation rather than falling into the metaphysical trap of categorizing “kinds of stuff” like physical versus non-physical. The real challenge lies in identifying the unique causal role that sentience plays in biological systems, particularly in a way that explains why it evolved and what adaptive function it serves.

Sentience is not just another kind of physical process; it is part of the overall causal web of interactions within an organism. Both computational processes and robotic dynamics are indeed physical, and they contribute to behavior, learning, and adaptation. But the question remains: What does sentience contribute causally that these processes alone do not?

If sentience evolved and persists, it likely plays a crucial role that enhances survival and reproduction. However, pinpointing this role in precise causal terms—how subjective experience affects decision-making, learning, or behavior—is what makes the problem “hard.” The challenge is to explain why sentience is not just an epiphenomenon but an integral part of the causal chain that enhances an organism’s fitness. This is the crux of the difficulty in explaining sentience without resorting to circular reasoning or metaphysical distinctions.

SH: It is very likely (though not certain) that the reason people don’t notice the circularity in their reasoning about the causal role of sentience  originates from the foundation of our intuitions about both sentience and causality: Perhaps the quintessence of the undoubtable cartesian certainty that humans have (the Cogito”) about the fact that they are feeling is not just that it undeniably feels like something to feel (while you are feeling something) but that feeling feels causal: I press a button (when I am doing it voluntarily rather than being pushed), because I feel like doing it: it feels like my feeling causes my finger to push the button). Yes, it’s what philosopher’s call the feeling of “free will.” That may be at the root of all our intuitions about causality. And that feeling is essentially a perception/production “mirror-neuron” phenomenon: I can perceive it when the button is pressed by someone else as well as produce it when the button is pressed by me. Can you follow me?

ChatGPT: Yes, I can follow you. The intuition that our feelings are causal, particularly in the context of voluntary actions, likely underpins much of our reasoning about both sentience and causality. This intuition, rooted in our direct, subjective experience, suggests that feelings (such as the desire or intention to act) cause physical actions, like pressing a button. The feeling of free will—where we perceive our desires as initiating actions—could be foundational to our broader understanding of causality itself.

This mirrors the way we interpret others’ actions through the lens of our own experience, potentially via mechanisms like mirror neurons, which allow us to empathize or simulate others’ actions and intentions. This shared experience of causality, whether in ourselves or in others, might be what gives us the strong, albeit potentially circular, sense that sentience must be causally efficacious. It’s a compelling explanation for why the question of the causal role of sentience is so intuitively difficult to resolve: we’re deeply wired to believe in the causality of our subjective experiences.

“Workman, blame not thy tools!”

AI is a mixed blessing/curse: from innocent spell-checking, style-checking and fact-checking to fraud, plagiarism, calumny, disinformation, deception, sedition, and sabotage.

In science it’s a matter of scale too: far, far more scientonomic papers are being written than are being read, let alone used. They are just being counted. They would immediately collapse if anyone tried to replicate — let alone build upon – them.

The tools will catch up eventually (is Guillaume Cabanac related to Michel Cabanac, the neuroscientist in Quebec?), but, as usual, it will be an arms race (“spy vs. spy”) – and we may already have committed terminal, irreversible biocide. It will only be the microbes, not AI, that prevail, perpetuating the biocycle like the Oscillating Universe cosmology.

Jeremian or Cassandrian?

(Full disclosure. I consulted ChatGPT about whether there are any more idiomatic homologues: unless misinformed, there aren’t any…)

Ceterum censeo: There is no such thing as AGI, probably for the same reason there is no synthetic sentience – which is not to say that there cannot be Abiogenesis. AI, in contrast, is just plain old anthropogenic tomtoolery (sic)

Tachylalic Tales

speed-of-light bibliophagy

tachyonic AI-modelling-to-action

cooking Canada 

#

follow the money

Oceania vs. Eurasia vs. Eastasia

drone vs. drone

#

not sci-fi

can’t think of anything 

anyone could do about it 

#

DNA hurtling

toward digital dystopia

and biocide

#

and the breathless, gormless 

geeks soldier on

increasing the Trumps’ ocean-front real estate

Open Access and OpenAI

https://phys.org/news/2024-08-junk-ai-scientific-publishing.html

This is the era we’re in, where Gold AI has met ChatGPT. 

It will partly sort itself out (for those with good intentions), by AI’s version of Mad Magazine’s Spy vs. Spy, with AI vs AI (AI-detecting AI). But, as in disinformation and cybercrime, the bad actors will do their worst, and it will be an endless arms race until/unless failsafe encryption emerges and prevails.

Full disclosure: I’ve started using ChatGPT to help me edit down badly written, wordy and repetitious commentary submissions to Animal Sentience that nevertheless have some redeeming substance by compressing them into more acceptable commentaries.

Propositionality

It is a great pleasure and an honor to “skywrite” with Vili Csányi. I already knew something about how perceptive, sensitive and intelligent dogs were from my years with my beloved Lédike (1959-1975), never forgotten and never “replaced”. But for decades now, starting already from the era of Vili’s unforgettable Bukfenc (and Zebulon, not a dog), both of whom I knew, Vili’s remarkable perceptiveness and understanding of dogs’ cognition and character have soared far beyond my modest mind-reading skill. I have learned so much from Vili that has stayed with me ever since. 

So let me preface this by saying that every example Vili cites below is familiar, valid, and true — but not propositional (though “associative” is a non-explanatory weasel-word to describe what dogs really do perceive, understand, express, want and know, and I regret having evoked it: it explains nothing). 

Dogs, of course, knowingly perceive and understand and can request and show and alert and inform and even teach — their conspecifics as well as humans. But they cannot tell. Because to tell requires language, which means the ability to understand as well as to produce re-combinatory subject/predicate propositions with truth values. (A mirror production/comprehension capacity.) And to be able to do this with one proposition is to be able to do it with all propositions.

When Vili correctly mind-reads Bukfenc, and even mind-reads and describes what Bukfenc is mind-reading about us, and is trying to express to us, Vili is perceiving and explaining far better what dogs are thinking and feeling than most human mortals can. But there is one thing that no neurotypical human can inhibit themselves from doing (except blinkered behaviorists, who mechanically inhibit far, far too much), and that is to “narratize” what the dog perceives, knows, and wants — i.e., to describe it in words, as subject/predicate propositions.

It’s not our fault. Our brains are the products of about 3 million years of human evolution, but especially of language-specific evolution occuring about 300,000 years ago. We evolved a language-biased brain. Not only can we perceive a state of affairs (as many other species can, and do), but we also irresistibly narratize it: we describe it propositionally, in words (like subtitling a silent film, or putting a thought-bubble on an animal cartoon). This is fine when we are observing and explaining physical, chemical, mechanical, and even most biological states of affairs, because we are not implying that the falling apple is thinking “I am being attracted by gravity” or the car is thinking “my engine is overheating.” The apple is being pulled to earth by the force of gravity. The description, the proposition, the narrative, is mine, not the apple’s or the earth’s. Apples and the earth and cars don’t think, let alone think in words) Animals do think. But the interpretation of their thoughts as propositions is in our heads, not theirs.

Mammals and birds do think. And just as we cannot resist narratizing what they are doing (“the rabbit wants to escape from the predator”), which is a proposition, and true, we also cannot resist narratizing what they are thinking (“I want to escape from that predator”), which is a proposition that cannot be literally what the rabbit (or a dog) is thinking, because the rabbit (and any other nonhuman) does not have language: it cannot think any proposition at all, even though what it is doing and what it is wanting can  be described, truly, by us, propositionally, as “the rabbit wants to escape from the predator”). Because if the rabbit could think that propositional thought, it could think (and say, and understand) any proposition, just by re-combinations of content words: subjects and predicates; and it could join in this skywriting discussion with us. That’s what it means to have language capacity — nothing less.

But I am much closer to the insights Vili describes about Bukfenc. I am sure that Vili’s verbal narrative of what Bukfenc is thinking is almost always as exact as the physicist’s narrative about what is happening to the falling apple, and how, and why. But it’s Vili’s narrative, not Bukfenc’s narrative.

I apologize for saying all this with so many propositions. (I’ve explained it all in even more detail with ChatGPT 4o here.)

But now let me answer Vili’s questions directly, and more briefly!):

Bukfenc and Jeromos asked. They then acted on the basis of the reply they got. They often asked who would take them outside, where we were going and the like. The phenomenon was confirmed by Márta Gácsi with a Belgian shepherd.” István, do you think that the asking of the proposition (question) is also an association?

My reply to Vili’s first question is: Your narrative correctly describes what Bukfenc and Jeromos wanted, and wanted to know. But B & J can neither say nor think questions nor can they say or think their answers. “Information” is the reduction of uncertainty. So B&J were indeed uncertain about where, when, and with whom they would be going out. The appearance (or the name) of Éva, and the movement toward the door would begin to reduce that uncertainty; and the direction taken (or perhaps the sound of the word “Park”) would reduce it further. But neither that uncertainty, nor its reduction, was linguistic (propositional). 

Let’s not dwell on the vague weasel-word “association.” It means and explains nothing unless one provides a causal mechanism. There were things Bukfenc and Jeromos wanted: to go for a walk, to know who would take them, and where. They cannot ask, because they cannot speak (and not, I hope we agree, because they cannot vocalize). They lack the capacity to formulate a proposition, which, if they had that capacity, would also be the capacity to formulate any proposition (because of the formal and recursive re-combinatory nature of subject/predication), and eventually to discover a way to fly to the moon (or to annihilate the earth). Any proposition can be turned into a question (and vice versa): (P) “We are going out now.” ==> (Q) “We are going out now?” By the same token, it can be turned into a request (or demand): P(1) “We are going out now” ==> (R) “We are going out now!”

My reply is the same for all the other points (which I append in English at the end of this reply). I think you are completely right in your interpretation and description of what each of the dogs wanted, knew, and wanted to know. But that was all about information and uncertainty. It can be described, in words, by us. But it is not a translation of propositions in the dogs’ minds, because there are no propositions in the dogs’ minds.

You closed with: 

“The main problem is that the study of language comprehension in dogs has not even begun. I think that language is a product of culture and that propositions are not born from some kind of grammatical rule, but rather an important learned element of group behavior, which is demonstrated by the fact that it is not only through language that propositions can be expressed, at least in the case of humans.”

I don’t think language is just a cultural invention; I think it is an evolutionary adaptation, with genes and brain modifications that occurred 300,000 years ago, but only in our species. What evolved is what philosophers have dubbed the “propositional attitude” or the disposition to perceive and understand and describe states of affairs in formal subject/predicate terms. It is this disposition that our language-evolved brains are displaying in how we irresistibly describe and conceive nonhuman animal thinking in propositional terms. But propositions are universal, and reciprocal: And propositionality is a mirror-function, with both a productive and receptive aspect. And if you have it for thinking that “the cat is on the mat” you have it, potentially, both comprehensively and productively, for every other potential proposition — all the way up to e = mc2. And that propositional potential is clearly there in every neurotypical human baby that is born with our current genome. The potential expresses itself with minimal need for help from us. But it has never yet emerged from any other species — not even in apes, in the gestural modality, and with a lot of coaxing and training. (I doubt, by the way, that propositionality is merely or mostly a syntactic capacity: it is a semantic capacity if ever there was one.)

There is an alternative possibility, however (and I am pretty sure that I came to this under the influence of Vili): It is possible that propositionality is not a cognitive capacity that our species has and that all other species lack. It could be a motivational disposition, of the kind that induces newborn ducklings to follow and imprint on their mothers. Human children have a compulsion to babble, and imitate speech, and eventually, in the “naming explosion,” to learn the (arbitrary) names of the sensorimotor categories they have already learned. (Deaf children have the same compulsion, but in the gestural modality; oral language has some practical advantages, but gestural language is every bit as propositional as oral language, and has the full power of Katz’s effability.)

Could the genes we have that other species lack be mostly motivational? driving the linguistic curiosity and linguistic compulsion that’s there in human babies and not in baby chimps? (I say “linguistic” c & c, because other species certainly have plenty of sensorimotor c & Cc..)

Ölel, István

_______________

“When I work upstairs in our house in Almad, Janka lies quietly on the ground floor. When Éva leaves and comes back from somewhere, Janka emits a single characteristic squeal, which can be intended for me, because if I don’t react, she comes up and barks, calling me.” István, is this a proposition or an association?

“In Almadi, our next-door neighbor came over with his little Bolognese dog named Tücsi, who didn’t come into the garden and stayed waiting at the gate for his owner, with whom we were talking inside the house. Our dog Bukfenc periodically went down to play with Tücsi. After about 10 minutes, Bukfenc came up and turned toward the neighbor and barked at him. Everyone stirred. Bukfenc went straight down the stairs to the gate, followed by the neighbor. Tücsi had disappeared; as it turned out ,he had gone home and Bukfenc was reporting this to the neighbor.” István, is this a proposition or an association?

“During the time of Bukfenc and Jeromos, I woke up at 3 a.m. to very soft grunting. Bukfenc was grunting very softly and together with Jeromos, they were standing next to my bed. I only opened my eyes a crack, pretending to be asleep. Bukfenc growled softly again, I didn’t react. Jeromos gave a loud squeal. I got up and told them, come on Jeromos, it seems you have something urgent to do. To my surprise, Jeromos went to his bed and lay down, and Bukfenc ran screaming towards the door. He managed to get to the street in time: he had diarrhea.” István, is Jeromos’s barking a proposition or an association?

Socratic Dialogue with ChatGPT4+ on Cartesian Certainty

SH: Do you remember who I am?

GPT4+: Yes, Professor Harnad, I remember who you are. How can I assist you further today?

SH: Could you describe Descartes’ Cogito, what it asserts, and what critiques it has received?

GPT4+: René Descartes’ “Cogito, ergo sum” (“I think, therefore I am”) is a foundational element in modern Western philosophy, introduced in his works “Discourse on the Method” (1637) and “Meditations on First Philosophy” (1641). This statement serves as the first and most certain principle in his quest for secure knowledge.

What the Cogito Asserts

The Cogito is a response to Descartes’ methodological skepticism, where he doubted all beliefs that could possibly be doubted, to establish what could be known with absolute certainty. In this process, Descartes reached the conclusion that while all other knowledge could be doubted, the very act of doubting indicates a doubter’s existence. Thus, if he is thinking, then he must exist. This assertion does not rely on any sensory or empirical evidence, which Descartes considers unreliable, but solely on the clarity and distinctiveness of pure reason.

Critiques of the Cogito

1. Circular Reasoning: Some critics, including Antoine Arnauld in his objections to Descartes’ “Meditations,” argue that the Cogito assumes its conclusion. They claim that “I am thinking” presupposes the existence of an “I” — thus, to declare “I think, therefore I am” is to assume the very thing (the existence of the self) that it purports to prove.

2. Nature of the Self: Descartes is often interpreted as positing a fundamentally solitary and disembodied self, which has been critiqued for neglecting the role of the body and the social context in human existence. Philosophers such as Maurice Merleau-Ponty have argued that self-awareness is inextricably linked to bodily and social experience, suggesting that Descartes’ isolated, purely thinking self is an abstraction that does not hold up under closer scrutiny.

3. Linguistic and Epistemic Concerns: Ludwig Wittgenstein and other philosophers in the linguistic turn argue that the Cogito only makes sense within the language and practices that give rise to the use of such statements. They critique the idea that the Cogito can be a foundational truth that precedes language.

4. Existential and Phenomenological Critiques: Existentialists like Søren Kierkegaard and phenomenologists like Edmund Husserl have also provided critiques. Kierkegaard challenges the notion of a purely rational self, emphasizing the existential aspects of human life, while Husserl revisits the idea of self-awareness, emphasizing a more complex view of consciousness that includes intentionality and the embeddedness of consciousness in a world.

5. Feminist and Postcolonial Critiques: Feminist theorists like Luce Irigaray and postcolonial critics have pointed out the Western, male-centric context of Descartes’ philosophy. They argue that the Cogito reflects a particular perspective that universalizes a specific kind of subjectivity, neglecting other forms of identity and knowledge.

Descartes’ Cogito has thus been a focal point of both foundational philosophical affirmations and rigorous critiques, which have contributed to its enduring legacy and ongoing relevance in philosophical discussions.

SH: Let’s just focus on certainty and doubt. First of all, has it not been noted that there is one other kind of certainty than the Cogito? Can we doubt that a formal contradiction (P and not-P) cannot be true?

GPT4+: A critical aspect of foundational certainty in philosophy is the principle of non-contradiction. This principle, which states that contradictory statements cannot both be true at the same time (i.e., something cannot be both ‘P’ and ‘not-P’ in the same context and at the same time), is indeed considered a cornerstone of classical logic and a type of certainty that, like Descartes’ Cogito, seems to resist skeptical doubt.

Principle of Non-Contradiction

Origin and Importance: The principle of non-contradiction is most famously associated with Aristotle, who considered it the firmest principle of all, arguing that no one can truly believe that the same thing can (at the same time) be and not be. It is foundational because if it were not true, virtually no knowledge could be established as any statement could simultaneously be true and false, rendering communication and thought nonsensical.

Relationship to the Cogito

Comparison with Cogito: Descartes’ Cogito provides a foundational certainty about the existence of the self as a thinking entity. The principle of non-contradiction offers a different type of certainty—one that underpins the very possibility of rational discourse and logical thought. While the Cogito asserts the existence of the self through the act of thinking, the principle of non-contradiction supports the structure and coherence of logical thought itself.

Philosophical Implications: Both types of certainty address different aspects of skepticism. Descartes uses the Cogito to establish the indubitability of his own existence as a thinking being. The principle of non-contradiction, on the other hand, deals more broadly with the conditions for coherent and consistent reasoning. Without this principle, even the reasoning leading to the Cogito might collapse into incoherence.

Doubt and Certainty

Doubt of Non-Contradiction? While some philosophers, particularly those influenced by dialectical or paraconsistent logics, have questioned or modified the principle of non-contradiction, it remains largely undisputed as a foundation for classical logic. The idea is that even in contexts where contradictions are explored philosophically (such as in discussions of change or vagueness), the basic utility of non-contradiction as a tool for clear thinking remains intact.

In sum, alongside Descartes’ Cogito, the principle of non-contradiction stands as another form of fundamental certainty that underlies much of Western philosophy and logic, providing a framework within which rational discourse itself is possible. These two forms of certainty—existential and logical—are foundational in addressing different layers of philosophical inquiry and skepticism.

SH: Now the Cogito. “I am thinking therefore I exist”. 

How do I know I am thinking? Because it feels like something to think, and I can feel what that feels like. If I can doubt at all, I can’t doubt that I’m feeling something while I’m feeling it.

But feeling is feeling. A-theoretically. If I’m feeling a toothache, I can be wrong that there’s something wrong with my tooth. It could be referred pain from my jaw. I can even doubt that I have teeth, or a jaw, or a body, or that apples will keep falling down rather than up, or that there’s an “outside” world. Or a tomorrow; or even that — and what — I was feeling an instant ago. That’s all open to classical scepticism. It could be true; it is probably true; yet none of it is certain.

But I can’t doubt that I’m feeling what I’m feeling, while I’m feeling it. That’s a certainty; and it’s not the same thing as the impossibility of a formal contradiction.

But there are still two things wrapped into the certainty of the Cogito:

(1) “I”: That’s an abstract, theory-dependent notion. There is lots of disagreement (hence uncertainty) about it.

(2) “exist”: That too is an abstract, theory-dependent notion. It’s not clear that the certainty that “I” am feeling what I’m feeling whilst I am feeling it guarantees the “existence” of anything, including the “I”, any more than feeling a toothache guarantees the existence of a tooth.

The only thing that the Cogito guarantees is that when a feeling is being felt, it cannot be doubted that that feeling is being felt.

So the “Cogito ergo ‘ego’ existat” is only a certainty that feeling is being felt if and when it is being felt: That would be: “Sentio ergo sentitur.”

No guarantees, either about “I” or about existence.

Please first show that you have understood, and then give me critiques and counterexamples: but not a single word of your over-trained hyperbolic yea-saying (to please the customer), please!

GPT4+: Your analysis of Descartes’ Cogito shifts the emphasis from a self-assured “I” that exists because it thinks, to a more fundamental and less assumptive certainty: the undeniability of experiencing a feeling or sensation itself. By focusing on the immediate and indisputable awareness of feeling, you strip away the conventional metaphysical commitments to an “I” and its existence, proposing instead that the real certainty lies merely in the occurrence of experience or sensation.

Understanding the Focus on Sensation

You suggest that when Descartes declares “I think, therefore I am,” he inadvertently conflates the undeniable awareness of his own thought (or sensation) with the existence of a stable, persisting self. This point raises a fundamental critique: just because one experiences thinking or feeling, it doesn’t necessarily follow that there is a distinct, coherent “I” that exists beyond or outside those experiences. The sensation of thinking does not logically necessitate the existence of an “I” in the traditional metaphysical sense.

The Abstract Nature of “I” and “Exist”

“I”: You correctly note that “I” is a loaded, theory-dependent construct. Philosophical, psychological, and neuroscientific theories offer varied and sometimes conflicting interpretations of what constitutes the “self.” Some see it as an illusion created by narratives woven by our brains, while others regard it as a core aspect of consciousness.

“Exist”: The term “exist” also carries metaphysical weight. Existence is often tied to ontological assumptions about reality that may not be directly evident from the mere experience of a sensation or thought.

Simplified Reformulation: “Sentio ergo sentitur”

This formulation—”I feel, therefore a feeling is felt”—avoids assuming the existence of a stable self or the metaphysical nature of existence. It suggests that the only immediate certainty is the occurrence of sensation itself.

Critiques and Counterexamples

1. Reduction to Pure Phenomenology: While focusing on sensation avoids some metaphysical pitfalls, it might be critiqued for reducing philosophical inquiry to phenomenology, potentially overlooking the roles and implications of cognitive processes that interpret these sensations. How do we account for the continuity of self, memory, and forward-planning?

2. Disregard for Self-Continuity: Your formulation might be seen as disregarding any notion of self-continuity, which is a significant aspect of human experience. Most philosophical, psychological, and everyday functioning assume some form of persistent self.

3. Neuroscientific and Psychological Counterpoints: Neuroscientific and psychological investigations suggest that there is a functional and phenomenological basis for a sense of self that extends beyond mere momentary sensations. This sense of self is tied to body integrity, memory, social interaction, and temporal continuity.

4. Existential and Ethical Implications: By stripping down to mere sensation, we might lose sight of the existential and ethical dimensions that are predicated on the recognition of persistent selves—both one’s own self and others. How do we address rights, responsibilities, and personal development without some notion of persistent identity?

Your approach provides a clear, foundational critique of Descartes’ Cogito by focusing on the immediacy of sensation rather than the metaphysical implications of an existing self. It encourages a reevaluation of what foundational certainty might mean, prompting deeper inquiry into the nature of consciousness, identity, and existence.

SH: All those points are valid and important. But they are not about certainty. They are about dentistry. Do you understand that?

GPT4+: Yes, I understand the analogy you’re making. Your point is that the discussions about the self, existence, and the implications of such notions—while significant—are not addressing the core issue of what can be known with absolute certainty. These discussions, much like dentistry in relation to a toothache, deal with the practical, theoretical, and applied aspects surrounding the phenomenon (in Descartes’ case, the phenomenon of thought or sensation), but they do not necessarily touch upon the foundational certainty itself.

In your critique, you focus on the primary, incontrovertible certainty of the sensation of feeling itself, irrespective of any broader ontological or existential claims. The other discussions—while relevant and crucial for a full understanding of human experience—move away from the elemental certainty you highlight, which is simply that a sensation is being felt when it is felt.

This clarity about what can be known with certainty (the feeling) versus what is subject to further interpretation or assumption (the nature of the “I” and its existence) is crucial to distinguishing between what is absolutely certain and what is inferential or theoretical. Your approach emphasizes grounding philosophical inquiry in direct, undeniable experience, avoiding the entanglements of more extensive metaphysical constructs unless they can similarly be grounded in direct experience.

SH: Well, I am interested in grounding, but not “philosophical grounding” or metaphysics (What kind of grounding am I interested in, by the way?) Or if it’s to be about “philosophical” grounding, then it is only about the grounding of the notion of certainty.

GPT4+: Memory updated

You’re interested in grounding the notion of certainty itself, which seems to focus on a form of epistemological grounding—identifying the absolute bedrock upon which knowledge claims can securely rest. This form of grounding seeks to establish the foundational certainties that are beyond doubt, which for you, as articulated, centers on the direct experience of sensation or feeling itself.

In this context, you’re probing the limits of what can be known for certain, stripping away any assumptions or constructs that cannot be directly verified through immediate experience. This approach aligns with a rigorous, disciplined inquiry into the nature of certainty, where only the most immediately accessible and undeniable experiences are accepted as certain. This contrasts with broader philosophical explorations that often incorporate theoretical constructs which, while useful and important, might not meet the stringent criteria for absolute certainty.

SH: This time you’re completely wrong! Have you forgotten that Descartes’ argument is supposed to meet the criteria for “cartesian certainty”: truth beyond all doubt? (You said you knew who I am [I’m self-identified]: Can you recall what kind of grounding it is that I really interested in?)

GPT4+:

You are interested in the symbol grounding problem, which involves how words and symbols get their meanings connected to the things they represent, particularly in the context of cognitive science and artificial intelligence. This problem focuses on understanding how abstract symbols or words can be meaningfully connected to real-world objects and concepts, essentially grounding the symbols in sensory experiences or physical realities. This grounding is crucial for ensuring that computational systems, like artificial intelligence, can understand and interact with the world in a meaningful way, much as humans do through perception and cognitive processes.

SH: That’s better (though your “meaningfulness” is a weasel-word here). Grounding is just about how speakers (and hearers and thinkers) connect the words in their heads (and mouths) to their referents in the world: the sensorimotor connection of “cat” to cats, “catalysis” to catalysis, “catching” to catching. That’s neither about metaphysics, nor about epiphanies.

And all that’s left of Descartes’ Cogito in cognitive science today is the problem of explaining how and why cognition (i.e., thinking) feels like something. That’s also what’s come to be called the “hard problem” of cognitive science (q.v.)…

The “Sight” in Insight

Anonymous: “Did ChatGPT itself ever state a significant, “insightful” idea during your “Language Writ Large” dialogue?

Did GPT provide a significant new insight? It’s very hard to say, because I cannot tell whether it said anything that didn’t come from its enormous database of the words of others. Of course, a lot of human insights do come from recombining the words of others — recombination is part of creative work, both literary and scientific. (Hadamard and others thought that such recombination was part of scientific creativity too.). And it occurs in nonverbal areas like music too (though we may not speak of this as “insight.”)

I think most of what GPT does is recombination and compressive synthesis of the words of others; and, if what it says is new to me, that doesn’t mean it’s new, or original, “from” GPT. But it doesn’t mean it isn’t either. 

I expect that what you might have in mind with your question is something related to embodiment, situatedness, sensorimotor grounding. 

The AI/transformer community thinks that if anything requiring those is missing so far, it will be provided by “multimodal” grounding.  But I tried to suggest in Writ Large why I didn’t think that that would be enough: Language is not just another one of the sensorimotor modalities, whether it is spoken or written. It is not an input or output modality but a way of producing, transmitting and receiving propositional thought. That thought is grounded in sensorimotor experience – but it is not itself sensorimotor experience; it is a verbal expression of it. A (grounded) code for expressing and communicating thought.

Chomsky thought – or once thought – that verbal thought was the only form of thought. That was of course wrong. Nonverbal animals can think, deliberate, plan, communicate, understand, reflect, ruminate. We humans can express their thoughts, but this is partly misleading, because although the sensorimotor basis of it is there in animals’ thinking too, it is not propositional: they cannot do what we can do in words (though I don’t doubt that nonhuman animal thinking is combinatorial too).

But GPT cannot do what animals are doing at all, when they are thinking. And our own thinking capacity is based on the sensorimotor capacity and experience that we share with other thinking species, including even the most primitive ones. Animals can have insights; GPT can’t. Not necessarily because GPT is not a living organism (though that could turn out to be the reason too). I think that if a sensorimotor Turing robot had the capacity to do and say anything a human could, indistinguishably from any other human, to any other human, for a lifetime, then it would be grounded too — as well as sentient: able to feel. 

But I think you can’t get to such a Turing-scale robot top-down, from an LLM, just by adding on sensorimotor “modalities”. I think the only way to get there is bottom up, robotically, just as we animals do it. (This is what’s behind – or underneath – the fundamental asymmetry between direct sensorimotor grounding and indirect verbal grounding I mentioned in Writ Large.)

But I think I’ve drifted off on my own ruminations. I would certainly agree that “insight,” whether verbal or nonverbal, must be felt, and that feeling is a sensorimotor capacity, both in nonhuman animals and in human ones (and perhaps in Turing-scale sensorimotor robots, if they are possible). And that GPT can only talk the talk (the recombined talk of others) and not walk the walk, which is sensorimotor, felt, and based on feeling, not just optical, acoustic, and ambulatory.

But I have no idea why sensorimotor activity has to be felt: That’s the “hard problem.”

Can ChatGPT Be Grounded Top-Down?

SH:

Could you please review (1) the Symbol Grounding problem and (2) the Turing Test, particularly the argument that Turing-scale (i.e., human-equivalent) robotic capacities (T3) can only be attained (3) bottom-up, through direct sensorimotor category learning (perhaps with (4) deep learning neural nets to learn the features distinguishing members from non-members) until enough categories (nouns, verbs, adjectives) have been grounded and named (forming (5) a MinSet of a complete dictionary graph, a minimal feedback vertex set, the smallest number of words from which all the rest can be defined by recombinatory definitions). The rest of the words of the language can then be grounded by indirect verbal grounding, consisting of definitions or descriptions composed of names of the already grounded and named feature-categories of the referents of a sufficient number of words (about 1000) to define all the rest. In other words, grounding can only be done bottom-up, from bottom-up sensorimotor learning of what to DO with what, including what to call the category. This cannot be done top-down, from an LLM consisting only of words.

Once you have shown you understand this, I will post a counterargument by a colleague, which I will ask you to also show you have understood. Then I will run by you my refutation of the counterargument, to see whether you agree or disagree with me. Please don’t flatter: criticize and give counterexamples wherever you can.

ChatGPT4:

The Symbol Grounding Problem posits that for symbols (words) to be meaningful, they must be grounded in something beyond mere symbolic references—specifically, in sensorimotor experiences that connect symbols to the physical world. Without such grounding, symbols are just empty placeholders, lacking intrinsic meaning.

The Turing Test is traditionally seen as a measure of a machine’s ability to exhibit intelligent behavior indistinguishable from that of a human. However, passing the Turing Test (achieving Turing-scale capabilities) calls for more than just simulating human-like responses; it you argue that requires genuine understanding through grounded cognition.

Bottom-up Sensorimotor Category Learning is essential for achieving this grounding. Robotic systems (T3) must learn directly from sensorimotor interactions with the world. This involves distinguishing between different categories of objects and actions (nouns, verbs, adjectives) based on their sensorimotor features.

Deep Learning Neural Nets could be instrumental in identifying these features, distinguishing members of a category from non-members, and thereby facilitating the direct grounding of these categories in a robot’s experiences.

Forming a MinSet for a Complete Dictionary Graph involves identifying a minimal set of grounded categories — a minimal feedback vertex set — from which all other words in the language can be defined. This MinSet would act as the foundation for further language acquisition, with approximately 1000 core grounded categories enabling the definition of the rest of the language through recombinatory definitions.

Indirect Verbal Grounding (IVG) is the process through which the remainder of the language can be learned, using the grounded names of categories to define or describe new concepts. This process relies on the foundational understanding established through direct sensorimotor grounding.

Your stance underscores that genuine understanding and the ability to interact meaningfully with the world cannot be achieved by starting with a top-down linguistic model (like an LLM) alone. Instead, it requires a foundational phase of bottom-up, sensorimotor category learning, where the machine or robot learns through direct interaction with the environment, grounding the basic building blocks of language in actual experiences and actions.

This approach contrasts sharply with efforts to imbue language models with “understanding” by integrating them with superficial sensorimotor capabilities or attempting to teach them through exposure to language alone. Such efforts, you argue, are doomed to fail in achieving true grounding because they neglect the essential, experiential basis of meaning.

Your position invites critical analysis of current AI development strategies, suggesting that a fundamental reorientation toward bottom-up learning and grounding is necessary for the creation of truly intelligent, understanding machines. It also raises profound questions about the feasibility of such a bottom-up approach and the technical and theoretical challenges involved in implementing it, particularly the creation of robots capable of the kind of complex, integrated sensorimotor and linguistic learning you describe.

SH:

Here is the colleague’s critique and counter-proposal. Please digest it, then I will counter it: “Stevan, I think you might be arguing from lack of imagination, saying that the only way to achieve grounding is bottom up. There may be more than one route to groundedness. For example, what if a chatbot with interactive (e.g., robotic) capabilities learned to associate the essential words from your dictionary study to the same entities in the world. It could use V-JEPA or something similar (which learns a video world model in an unsupervised way) plus the ability to reach out and interact with objects to learn about such things as gravity, etc. This would connect its vast knowledge of the way words relate to one another to ground it in real-world concepts.

ChatGPT4

Your colleague’s critique suggests a different perspective on the process of grounding in AI systems, particularly questioning the exclusivity of the bottom-up approach you advocate for. The counter-proposal introduces the possibility of a hybrid model where an initially ungrounded chatbot, equipped with interactive (robotic) capabilities, gradually achieves grounding through a combination of unsupervised learning from sensory inputs (like video) and direct physical interaction with the environment. Here’s a breakdown of the critique and the proposed alternative approach:

Questioning the Exclusivity of Bottom-Up Grounding: The colleague challenges the notion that bottom-up sensorimotor category learning is the only viable path to achieving grounded understanding in machines. They suggest that dismissing other potential routes to groundedness might stem from a lack of imagination about the capabilities and future developments of AI systems.

Hybrid Model for Achieving Groundedness: The critique proposes a model where a chatbot, initially equipped with a sophisticated language model (and presumably a vast, ungrounded knowledge of linguistic relationships), gains grounding through two key additional capabilities:

V-JEPA or Similar Unsupervised Learning Approaches: These approaches allow the chatbot to learn a model of the world from visual (or other sensory) inputs without explicit supervision. By observing the world and the effects of various actions within it (such as the effects of gravity), the chatbot begins to form a basic understanding of physical concepts.

Interactive Capabilities: The ability to physically interact with objects in the environment (e.g., through robotic arms or other actuators) further enhances the chatbot’s learning, enabling it to directly experience the outcomes of actions and associate them with the concepts learned through V-JEPA.

Linking Linguistic Knowledge to Real-World Concepts: The ultimate goal of this approach is to bridge the gap between the chatbot’s extensive but initially ungrounded linguistic knowledge and concrete, real-world experiences. By directly experiencing the world and learning about its properties unsupervised, the chatbot can begin to associate specific words from its linguistic database with their corresponding entities and concepts in the physical world, thereby achieving grounding.

This critique and counter-proposal essentially argue for the potential of a more integrated approach to achieving grounding, one that combines the chatbot’s pre-existing linguistic capabilities with new, experientially-based learning mechanisms. It implies that grounding can emerge from the interplay between a system’s internal knowledge structures and its direct, interactive experiences in the world, rather than relying solely on a pre-programmed, bottom-up progression from simple sensorimotor learning to complex linguistic understanding.

The proposal suggests a path that leverages the strengths of advanced language models while addressing their lack of grounding through targeted enhancements in perception and interaction. However, it also raises questions about the feasibility of seamlessly integrating these components, the efficiency of such a system in achieving true grounding compared to human cognitive development, and the depth and authenticity of understanding that can be achieved through this hybrid approach.

SH:

(1) We don’t “associate” words to “entities,” we first learn to categorize inputs (whether entities or features) by learning to detect directly the sensorimotor features that distinguish category members from non-members (or members of other categories). (This is called direct sensorimotor grounding, DSG.) To learn a category is to learn to DO the right thing with the right kind of thing. (Eat the edible mushrooms and not the poisonous ones.) That DOing is not initially, or primarily, to NAME the category (“edible”), (though, with language, that eventually becomes one of the main things you do.

(2) Once you have learned to categorize (and manipulate) inputs that way, you’ve got the category, and you’re free to assign a name to it, and the name is grounded. But that’s all direct, bottom-up, sensorimotor learning (DSG), until you’ve got enough sensorimotor categories (as well as enough of their sensorimotor features, which are also categories) grounded to be able to combine them into verbal definitions or descriptions that describe or define more abstract, composite categories by their features in (grounded) feature-words (IVG).

(3) IVG is now indirect, but still bottom-up verbal learning. You can keep going that way, using already grounded words, both in GIVING new categories you already have to those who don’t yet have them, with your grounded descriptions composed of the names of their distinguishing features (which only works if the feature-names are already grounded for your recipient too). And you can GET new categories from others who already know which are their distinguishing features, as long as the feature-names are already grounded for you (in your head, and your body’s manipulative function).

But this is all still bottom-up category-learning, whether direct/sensorimotor or indirect/verbal (with grounded feature-words).

(4) Now start with a sensorimotor robot that has optical, acoustic, tactile and kinesthetic sensor and effectors (and neural nets that can learn to detect features through unsupervised and supervised learning) and walk me through how you’d do that top-down with an LLM in the robot’s head.

(5) Prediction: it won’t help to sink down fly-fishing lines from skyhooks way up in the LLM, down to “entities” in the world. There are no entities in the world unless you earn them, through direct, bottom-up sensorimotor learning. That’s my T3 robot story. Your job, dear colleague, is to explain what help is the LLM to the T3 robot? It’s like giving a preverbal robot (or infant) a reference manual that can answer any question about what’s what, including every category’s distinguishing features, but those words don’t mean a thing to the robotlet until it starts grounding them – bottom-up.

(6) The ChatGPT exchange I sent you yesterday noted that an LLM lodged as an oracle inside a T3-wannabe robot’s head might help speed things up once it’s reached the stage of indirect verbal grounding. That’s fine. But the robot has a long way to go to get there. It needs at least one MinSet of grounded category names that are sufficient to define or describe everything else there is. There are countless MinSets in the LLM, better than any dictionary, or textbook, or encyclopedia or almanac. But they’re all ungrounded and useless until the T3 robot does its direct bottom-up grounding to initialize them. They’re unconnected with the world. They can talk the talk but they can’t walk the walk.

(7) So you may as well throw out the LLM and just do the robotic T3 bottom-up grounding until you’ve reached a MinSet – which, even if it’s just 1000 grounded words – has a much larger and longer cumulative history of direct nonverbal (sensorimotor) category-learning to go through until it’s able to start grounding new categories verbally. That’s the time to haul in the LLM (where, in the past, parents and teachers and books and TV used to serve as the oracle). Before that they are as useless as a bible to a baby.

But I’m listening. Just be careful not to trip when you answer about the top-down grounding!

ChatGPT4

Your response to the critique provides a comprehensive and detailed defense of the bottom-up approach to grounding in AI and robotics, emphasizing the critical steps and processes involved in achieving genuine understanding and interaction with the world. Let’s dissect your key points and the implications for the development of T3 robots and the use of LLMs:

Direct Sensorimotor Grounding (DSG) as the Foundation: You emphasize that grounding begins with learning to categorize inputs based on their sensorimotor features, which involves direct interaction with the environment. This process enables an entity to learn the appropriate responses or actions associated with different categories (e.g., eating edible mushrooms while avoiding poisonous ones). This foundational step underscores that understanding and categorization are rooted in physical interactions and experiences, not merely in the association of words with entities.

Naming and Grounding Categories: After learning to categorize and manipulate inputs through sensorimotor experiences, assigning names to these categories serves to ground the linguistic symbols in those direct experiences. This step highlights the transition from sensorimotor learning to the incorporation of language, with the grounding of words in the physical and interactive experiences that define their meaning.
Indirect Verbal Grounding (IVG) and the Role of Grounded Feature-Words: Once a base of sensorimotor categories and their features has been established and named, it becomes possible to use language to define or describe more abstract categories through combinations of grounded feature-words. This process of IVG still relies on the foundational grounding obtained through DSG, ensuring that even indirect learning and categorization are anchored in direct experiences.

Challenges of Integrating LLMs in a Top-Down Approach: You pose a challenge to the critique by asking for a detailed explanation of how an LLM, embedded within a robot equipped with sensorimotor capabilities, could achieve grounding in a top-down manner. Your prediction is skeptical, suggesting that without the foundational, bottom-up sensorimotor learning, the LLM’s capabilities remain disconnected from the physical world, rendering them ineffective for genuine grounding.

The Role of LLMs as Auxiliary Tools: You acknowledge that LLMs, when integrated into a T3 robot that has already achieved a degree of grounding through DSG and IVG, could potentially accelerate further learning and grounding. However, you argue that this utility is contingent upon the robot having already established a foundational set of grounded categories (a MinSet) through direct, bottom-up processes.

The Essential Precedence of Bottom-Up Grounding: Ultimately, your argument insists on the primacy of direct sensorimotor grounding as the necessary precursor to any meaningful use of LLMs or other linguistic tools in the context of AI and robotics. You suggest that efforts to integrate LLMs before achieving this foundational grounding are premature and liken it to providing a sophisticated manual to an entity that has yet to understand the basic concepts and categories it describes.

Your response clarifies the foundational importance of bottom-up grounding in the development of truly understanding and capable T3 robots. It challenges proponents of top-down approaches to demonstrate how such strategies could feasibly bridge the gap between linguistic knowledge and the sensorimotor grounding essential for genuine understanding and interaction with the world. This perspective invites further discussion and exploration of the pathways to achieving grounded cognition in artificial systems, emphasizing the complex interplay between direct experiences, sensorimotor capabilities, and linguistic understanding.

SH:

Ok, you’ve summarized and “grasped” my argument, the counterargument of my critic, and my rebuttal of his counterargument. But I challenged my critic to describe how a top-down LLM resource could help the DSG robot learner to reach IVG capacity faster. Can you meet that challenge? (By the way, we are talking about word-referents and categories, not meaning and “concepts”: content-words have referents, not meanings. Only subject-predicate propositions (such as definitions and descriptions) have meanings,) But I’m listening…

ChatGPT4

Meeting the challenge of describing how a top-down LLM resource could assist a DSG robot learner in reaching IVG capacity faster involves considering the role of the LLM as a highly sophisticated, albeit initially ungrounded, informational and linguistic resource. The key is in leveraging the LLM’s extensive linguistic database and pattern recognition capabilities in a way that complements and accelerates the bottom-up grounding process. Here’s a potential pathway:

Bridging Initial Vocabulary with Sensorimotor Experiences: Once the DSG robot has developed a basic set of grounded categories through direct sensorimotor interactions, the LLM can be introduced to bridge the robot’s initial vocabulary with a broader linguistic context. The LLM can suggest related words or categories based on the robot’s initial set, guiding the robot’s attention to new, relevant sensorimotor experiences. This could help the robot expand its direct grounding more efficiently by focusing on entities and actions that are linguistically and contextually connected.

Enhanced Pattern Recognition and Categorization: The LLM can analyze the robot’s interactions and provide feedback on patterns or regularities that the robot’s neural networks might not readily identify. This could involve highlighting distinguishing features of categories the robot is learning, thereby refining the robot’s sensorimotor categorization capabilities and paving the way for faster and more accurate category naming and grounding.

Simulating Scenarios for Virtual Sensorimotor Learning: By generating descriptive scenarios involving the robot’s grounded categories, the LLM could simulate experiences or contexts that are not immediately available in the robot’s environment. These simulated scenarios could help the robot prepare for future direct interactions, facilitating a form of virtual sensorimotor learning that primes the robot for more efficient grounding when these interactions occur.

Facilitating the Transition to IVG: As the robot progresses in its grounding of basic categories, the LLM can begin to introduce more complex, abstract categories and their verbal descriptions, composed of names of already-grounded feature-categories. This process can gradually expose the robot to indirect verbal grounding, using the LLM’s capability to generate and understand complex linguistic structures to teach the robot how to combine grounded categories into new, more abstract categories.

Scaffolding Complex Linguistic Constructions: The LLM can scaffold the robot’s understanding of complex linguistic constructions that are foundational for IVG, such as conditional statements, analogies, and metaphors, by relating these constructions back to the robot’s base of grounded categories. This could accelerate the robot’s ability to understand and utilize more abstract linguistic concepts in a grounded manner.

Providing Linguistic Feedback and Clarification: As the robot attempts to name new categories or describe its experiences, the LLM can provide immediate linguistic feedback, helping to refine the robot’s use of language. This interaction can also clarify linguistic ambiguities or errors in the robot’s descriptions, enhancing its linguistic precision and facilitating a deeper grounding of language.

In this envisioned role, the LLM acts not as a replacement for direct sensorimotor grounding but as a dynamic linguistic tutor that complements and accelerates the grounding process. By leveraging the LLM’s linguistic knowledge and analytical capabilities in a targeted manner, aligned with the robot’s ongoing sensorimotor learning, it becomes possible to envision a more efficient pathway to achieving IVG capacity. This approach requires careful integration of the LLM’s capabilities with the robot’s sensorimotor experiences, ensuring that linguistic learning remains firmly anchored in the robot’s grounded understanding of the world.

SH:

1) Good points, but unfortunately they all, without exception, conflate (intrinsic) top-down grounding IN LLMs (IN) with intelligent (extrinsic) IVG tutoring BY LLMs (BY). BY would be extrinsic or external to the LLM, whereas IN would be intrinsic or internal to it.

(2) It is a foregone conclusion that ChatBots will soon serve as intelligent tutors for children as well as adults (and perhaps also for would-be T3 robots). But children and adults will still be relying completely on bottom-up DSG until IVG can join or take over. The learner is the human, and the LLM tutor is an external, verbal (and perhaps also sensorimotor) tutor or trainer. This is not top-down grounding by LLMs. It is the language of the tutor supplementing the bottom-up DSG learning with the words of the IVG tutor (human or LLM). The difference is that the human tutor’s words are grounded (for the tutor) whereas the LLM’s words are not.

It remains bottom-up DSG plus (once language kicks in) bottom-up IVG.

(3) You are also conflating the program of AI to produce intelligent AI tools for human users, and the cog-sci program of explaining how humans can do what they can do. Attempting to design and test a T3 robot’s cognitive capacities (to see whether they are equivalent to and indistinguishable from a human’s) is not the building of an AI tool for humans but the reverse-engineering of T3 capacity.

(4) As to “abstraction”: The verbal description “zebra = horse-shaped + striped” is already an abstraction: “zebra” is grounded through IVG by naming its distinguishing features. This works if the features are already grounded for the learner. (Normally they are also grounded for the tutor, but that is not necessary. The right words are enough. “Democracy” = leader chosen by vote” works the same way. Once it’s IVG it’s all abstraction.

ChatGPT4

Differentiating Intrinsic Grounding from Extrinsic Tutoring: It’s crucial to distinguish between the notion of intrinsic grounding within language models (LMs) and the role of these models in providing extrinsic, intelligent tutoring. Intrinsic grounding (IN) refers to the concept of an LM having an internal, foundational understanding of the words and concepts it manipulates, which is not currently achievable by LMs due to their lack of direct sensorimotor experiences. Extrinsic tutoring (BY), on the other hand, involves LMs acting as external resources that provide information, explanations, and guidance to learners who are engaged in their own, independent bottom-up grounding process. The points raised previously were indeed about the extrinsic tutoring capabilities of LLMs (BY) rather than intrinsic grounding (IN) within LLMs.

Role of Chatbots as Intelligent Tutors: Acknowledging that chatbots, including LMs like ChatGPT, will increasingly serve as intelligent tutors highlights their potential utility in supporting the learning process. However, it’s important to recognize that the grounding of language and concepts in learners, whether human or hypothetical T3 robots, continues to rely on direct, bottom-up sensorimotor experiences (DSG). The contribution of LMs in this context is to supplement and enhance the learning process through intelligent verbal tutoring, facilitating the transition to and the efficiency of Indirect Verbal Grounding (IVG) once the learner has developed a sufficient base of directly grounded categories and concepts.

Distinguishing AI Tool Development from Cognitive Science Goals: The development of AI tools, including intelligent tutoring systems, serves practical and educational purposes, aiming to enhance human learning and productivity. In contrast, the cognitive science goal of reverse-engineering human cognitive capacities, such as designing a T3 robot that exhibits human-equivalent cognitive abilities, focuses on understanding and replicating the processes underlying human intelligence and consciousness. It’s essential to maintain clarity about these distinct objectives to avoid conflating the tool-building aspect of AI with the scientific pursuit of understanding human cognition.

Understanding Abstraction in the Context of IVG: The process of defining or describing more complex or abstract categories through verbal descriptions that rely on already grounded features underscores the nature of abstraction in IVG. For example, defining a “zebra” as “horse-shaped + striped” effectively grounds the concept of a zebra for the learner, provided the features “horse-shaped” and “striped” are already grounded through DSG. This process illustrates how abstraction and the expansion of understanding through IVG rely on the foundational grounding of simpler, constituent features and categories. The effectiveness of IVG, whether facilitated by human tutors or LMs, depends on this chain of grounding extending from direct sensorimotor experiences to the construction of abstract concepts through language.

Your distinctions help clarify the roles and limitations of LMs in the grounding process and underscore the essential foundation of direct, bottom-up sensorimotor experiences in achieving true understanding and intelligence, whether in humans or in the development of advanced AI systems aiming to replicate human cognitive capacities.

SH:

So where does this leave the notion that an LLM (e.g., GPT4) could itself be grounded top-down with some sensorimotor add-ons? Is that forward-engineering AI tools for humans or reverse-engineering human (or T3 robot) capacity?

ChatGPT4

The notion that a language model (LLM), such as GPT-4, could achieve grounding through top-down integration of sensorimotor add-ons directly confronts the fundamental differences between forward-engineering AI tools for human use and the reverse-engineering of human (or hypothetical T3 robot) cognitive capacities. Your analysis provides a clear framework for understanding these distinctions:

Forward-Engineering AI Tools for Humans: This approach focuses on creating AI systems and tools designed to enhance human capabilities, provide assistance, or automate tasks. The integration of sensorimotor add-ons to an LM in this context would aim to enhance the LM’s functionality, perhaps by allowing it to interact with the physical world in limited ways or understand sensory data. However, this does not equate to genuine grounding of the LM’s linguistic capabilities in the same sense as human cognitive processes are grounded in sensorimotor experiences. Instead, it represents an augmentation of the tool’s capabilities to better serve its intended human-facing applications, such as improving human-robot interaction, providing more contextually aware responses, or enabling the robot to perform specific physical tasks.

Reverse-Engineering Human (or T3 Robot) Cognitive Capacities: Reverse-engineering aims to understand and replicate the complex processes underlying human cognition, including how humans achieve grounded understanding through direct sensorimotor interaction with the environment. Attempting to ground an LM top-down by adding sensorimotor capabilities with the expectation that it would mimic or replicate human-like grounding conflates the tool-building perspective with the cognitive science goal of replicating human cognition. Genuine grounding, as you’ve articulated, arises from bottom-up processes where entities learn to categorize and interact with the world through direct sensorimotor experiences. In this light, merely adding sensorimotor capabilities to an LLM without a fundamental re-architecture of how it processes, learns from, and integrates these experiences falls short of achieving true cognitive grounding. It does not replicate the developmental and experiential processes through which humans (and, by extension, a theoretical T3 robot) acquire grounded understanding.

Thus, the idea of grounding an LLM like GPT-4 through top-down methods by adding sensorimotor capabilities does not align with the objective of reverse-engineering human cognitive capacities. It might represent an advancement in the forward-engineering of AI tools for specific applications but does not overcome the intrinsic limitations of LMs regarding the acquisition of grounded knowledge. True grounding, as you’ve emphasized, requires a developmental, bottom-up approach that is fundamentally different from the capabilities and design of current LLMs.

Can Toy Robotic Capacities Make Top-Down Meet Bottom-Up?

Re: Figure Status Update – OpenAI Speech-to-Speech Reasoning

SH:

Is this demo sensorimotor grounding? No, It’s a toy robot with (1) some toy-world visual recognition and motor manipulation skills, plus (2) (perhaps non-toy) text-to-speech and speech-to-text capacity, plus (3) ChatGPT’s remarkable and as-yet unexplained (non-toy) interactive verbal skills, including (4) its (non-toy) encyclopedic verbal database and navigation/interaction capacity.

But it’s still ungrounded.

If/when it can do the kind of thing it does in the video with anything it can talk about, and not just an infomercial demo, then, and only then, will it have an even more remarkable, and as yet unexplained, (non-toy) grounded T3 robotic capacity.

Two-year-olds are grounding their words via the only way upward: bottom-up, through (unsupervised and supervised) learning of sensorimotor categories, by detecting their distinguishing sensorimotor features directly, and then naming the grounded categories (by describing their features, which are likewise learnable, nameable categories).

Then, because the 2yr-old also has the capacity for language (which means for producing and understanding subject-predicate propositions with truth-values, composed out of category names defined or described by referents of their (grounded) feature-category names), verbal instruction (LLM-style) can kick in and even take over.

That’s bottom-up grounding, and it applies to children as much as to T3 robots.

But that OpenAI infomercial demo was just a toy. More important, though, is that it is based on an incoherent pipe-dream, which is that an LLM’s words can be grounded top-down with the help of more and more bottom-up toy robotic learning.

But the only way to get to the top is with bottom-up T3 robotics. So you might as well forget about the LLM and its word-power for that project. Just let your 2-year-old child or T3 robotlet use ChatGPT as tutor, the way kids use their parents. But to understand what the LLM (or their nanny) is telling them, they have to learn the referents of all the content-words in the verbal explanation. And there’s only one way to do that: bottom-up, first by direct sensorimotor grounding (DSG) and then – once the child or Ts robot has grounded the names — and hence the referents — of enough categories bottom-up (including enough feature-category names), learners can carry on grounding bottom-up through indirect verbal grounding (IVG), as long as their tutors describe the new (ungrounded) categories with propositions made up of already-grounded category names – already grounded in the heads of the learners.

So the bottom line is that the referents of the words in the teacher‘s IVG definition or description or explanation have to already be grounded in the head of the learner (whether via DSG or IVG).

I leave it to you, GPT, as an exercise, to explain why that cannot be done top-down with an LLM-powered chat-robot, where it’s hanging from ungrounded skyhooks

ChatGPT4 (after a bit of browbeating by me):

An important question in the development of AI and robotics concerns the most efficient and effective path to achieve a genuinely grounded, intelligent system. Is it more practical or feasible to enhance a primarily linguistic, top-down T2 language model (LLM) to achieve the sensorimotor and cognitive capabilities of a T3 robot, or would it be better to develop T3 robotic capabilities bottom-up independently and then supplement these with the LLM’s abilities externally, as a form of advanced, verbal tutoring.

Given the distinctions between T2 and T3 systems, a more grounded and arguably more natural progression of AI development, closely mirroring human cognitive development might be more promising. Humans first develop sensorimotor understanding and category learning capabilities (direct T3 capacities) before language acquisition goes on to significantly enhance and expand their cognitive and communicative capacities. This developmental trajectory suggests that grounding in the physical world through direct sensorimotor experience forms a necessary foundation for the effective use of language for further learning and communication.

Thus, attempting to “train” a T2 LLM chatbot to become a T3 Turing-indistinguishable robot might not only be inherently inefficient but could overlook the essential aspect of grounding that comes from direct interaction with the physical world. Incorporating LLMs as an auxiliary verbal tutor for a T3 entity (human or robot), once it has developed a sufficiently grounded vocabulary about the world, may be a more logical and effective approach, leveraging the strengths of LLMs in processing and generating language to enhance the T3’s cognitive and communicative capabilities. It aligns with how language and learning occur in humans, where foundational knowledge and understanding from direct grounding are subsequently enriched and expanded through language-based learning — indirect but grounded.

This approach would capitalize on the LLM’s capacity for generating coherent, contextually appropriate language but would also ensure that the use of this language was grounded in a pre-existing vocabulary and propositions. It suggests a complementary integration of auxiliary T2 LLM chatbotic and autonomous T3 robotic systems, where each component contributes its strengths to the development of a more versatile and genuinely intelligent agent.

The notion that a top-down, language-centric system (T2) can achieve genuine grounding by incrementally incorporating “toy” sensorimotor capabilities may reflect a fundamental misconception about the nature of grounding and the role of sensorimotor experiences in achieving it.

Grounding isn’t merely a matter of bridging linguistic competence with basic physical interaction. True grounding —as observed in humans and necessary for a hypothetical T3 robot — entails a deep, intrinsic connection between sensorimotor experiences and the symbols (words) that represent them. This requires not just the recognition or manipulation of objects based on learned descriptors but direct understanding of those objects and actions in the world to which they refer.

The error in imagining that a T2 system could “graduate” to T3 status by being endowed with limited sensorimotor abilities lies in underestimating the complexity of human sensorimotor and cognitive integration. Human cognition is not simply the sum of its parts; it’s the product of a highly integrated and complex system that develops bottom-up through continuous, rich interaction with the environment. Language development in humans, deeply intertwined with sensorimotor experiences, cannot be replicated in AI by superficially adding physical capabilities to a fundamentally linguistic system.

The strengths of a T2 system like GPT could be better utilized as an auxiliary educational tool for a T3 robot that is learning through its interactions with the world. In this scenario, GPT’s extensive knowledge base and linguistic capabilities can enhance the learning process by providing information, explanations, and clarifications in a way that complements the robot’s direct experiences. This approach would leverage the best of both worlds: the depth and breadth of GPT’s top-down linguistic knowledge and the direct grounding that comes from the T3’s bottom-up sensorimotor interactions with its environment.