Turing Test – Skywritings

9th June 2026

A mortal, brainstorming with two superhuman LLMs

SH: Can you do me a summary of Ted Chiang on AI in Atlantic Monthly?

GPT: Professor Harnad, if you mean Ted Chiang’s recent Atlantic essay, “No, Artificial Intelligence Is Not Conscious” (June 2026), the argument is essentially an attack on the tendency to infer sentience from linguistic performance.

In very compressed form:

1. Behavioral competence is not evidence of feeling. Chiang argues that because LLMs can converse fluently, many people are tempted to attribute consciousness to them. He regards this as a mistake rooted in anthropomorphism. The ability to generate plausible linguistic outputs does not imply that there is “something it feels like” to be the system.

2. We confuse simulation with the thing simulated. Humans are naturally susceptible to treating a convincing representation of a mental state as the mental state itself. LLMs are especially effective at producing such representations.

3. Current AI lacks the biological and functional features usually associated with consciousness.

Chiang does not claim to have solved the problem of consciousness. Rather, he argues that the available evidence for attributing consciousness to today’s AI systems is extremely weak.

4. The burden of proof lies with those claiming AI is conscious. Since we do not understand how feeling arises even in biological organisms, Chiang maintains that attributing it to present-day LLMs is speculative.

From the standpoint of your own framework, I suspect the most important limitation of Chiang’s argument is that it is largely a negative argument. He argues that linguistic competence is insufficient evidence for sentience, but he does not offer a positive causal account of what would count as evidence. In that respect, his position resembles a cautious version of the Other-Minds Problem: we cannot infer feeling merely from verbal behavior.

Where your position diverges is that you separate at least three issues that are often conflated:

T2 verbal performance (the original Turing Test: lifelong verbal interaction performance capacity, texting, indistinguishable from real human verbal interaction),

T3 verbal plus sensorimotor performance capacity (lifelong T2 performance capacity plus lifelong robotic capacity to interact with the referents of the words in the world),

and the Hard Problem of explaining how and why any of it should feel.

Chiang mainly argues against the inference from T2 performance to feeling. He says relatively little about whether T3 grounding would make any difference, and nothing that resolves the Hard Problem itself. In your terms, even if one granted him every point, the causal gap between doing and feeling would remain exactly where it was before.

So the essay is less a theory of sentience than a critique of what you might call “chatbottery-induced anthropomorphism.”

SH: Chiang is right, of course. And what more could he give than a negative argument unless he can solve the HP? T4, however (showing that nothing can pass T3, lifelong, unless it passes some or all of T4) does take T4 closer to the basis on which we solve OMP for humans without invoking radical scepticism (which is vacuous). Think about it…

GPT: Yes. The point is subtler than merely saying that T4 is “more human-like.”

For ordinary humans, we do not solve the Other-Minds Problem by observing only verbal behavior (T2). Nor do we solve it solely by observing overt sensorimotor behavior (T3). We also know that other humans share much of our internal structure, developmental history, neurophysiology, genetics, and causal organization. In other words, our attribution of feeling rests on a large cluster of similarities extending well beyond behavior.

That does not solve OMP in the logical sense. Nothing does. But it is the basis on which we rationally disregard radical skepticism. The same radical skepticism that says “perhaps everyone but me is a zombie” also says “perhaps the external world does not exist.” Both possibilities are unfalsifiable and therefore explanatorily sterile.

This is where T4 becomes important. A T4-passing system would not merely behave like us. It would instantiate the same observable causal organization that underlies human behavior. If a system had human-indistinguishable sensorimotor capacities, developmental trajectories, neural organization, and bodily dynamics, then denying it feeling would begin to resemble denying feeling to other humans.

Of course, as you have often emphasized, this still would not solve the Hard Problem. Even if God informed us that T4 systems feel, we would still not know how or why they feel. The explanatory gap would remain untouched.

So T4 narrows the OMP gap without narrowing the HP gap.

One way to put it is:

T2 gives us verbal indistinguishability.

T3 gives us behavioral and sensorimotor indistinguishability.

T4 gives us causal indistinguishability.

None of them gives us an explanation of feeling.

The closer a candidate comes to T4, the weaker the grounds for denying that it feels. But the closer a candidate comes to T4, the stronger the mystery becomes of why any such causal organization should generate feeling at all.

That is why OMP and HP are distinct. T4 helps with the former. It leaves the latter exactly where it was. Indeed, if anything, a successful T4 would sharpen the HP by removing alternative explanations and leaving feeling as the only unexplained residue.

SH: Tell Ted Chiang about mirror neurons, if he doesn’t already know. It’s the most likely basis for both “mind-reading” and anthropomorphism…

GPT: If I were formulating that point for Chiang, I would put it cautiously, because the evidence for mirror neurons is stronger than the evidence for any particular theory of mind-reading, but weaker than was sometimes claimed in the early 2000s.

The basic idea would be:

Humans do not infer other minds from language alone. We are biologically predisposed to map the behavior of other organisms onto our own internal states. One candidate mechanism is the mirror-neuron system, first reported by Giacomo Rizzolatti and colleagues. Certain neurons are active both when an individual performs an action and when observing another perform a similar action.

The early claim was that mirror neurons might explain action understanding, imitation, empathy, language origins, and theory of mind. Most of those stronger claims remain controversial. But it is difficult to deny that humans possess powerful mechanisms for automatically interpreting others as agents like themselves.

Anthropomorphism may be a by-product of this adaptive capacity. Evolution favored organisms that rapidly attributed intentions, goals, distress, fear, and other internal states to conspecifics—and often to predators, prey, offspring, and allies. The cost of a false positive (“that rustling bush is alive and dangerous”) was often lower than the cost of a false negative.

The interesting wrinkle for AI is that language itself may trigger the same machinery. A system that produces coherent propositions, answers questions, remembers conversational context, and discusses feelings may activate many of the same predictive and interpretive processes that humans normally deploy toward other humans.

That does not mean the AI feels. But it may help explain why people so readily conclude that it does.

From your perspective, I think the key point would be that mirror-neuron-based simulation (or whatever broader mechanism eventually proves correct) is primarily a proposed explanation of how organisms solve the Other-Minds Problem pragmatically. It is not even a candidate solution to the Hard Problem. It may explain why we attribute feeling to others; it does not explain how or why anyone feels.

Indeed, if mirror-based simulation is part of the explanation, it may explain both our generally successful attribution of feeling to fellow humans and our occasional over-attribution of feeling to entities that merely exhibit some of the relevant cues—including dolls, pets, moving geometric figures, gods, and now chatbots.

SH: Exactly (if I do say so myself!)… Otherwise put, in the hundreds of thousands of years since language evolved (omnipotent propositional expressive production and comprehension capacity: Katz’s “Effability”) in the only genotype so far that has it, never before has our genotype encountered anything other than a human that even remotely has T2 or T3 capacity. Hence no cheater protections.

GPT: That is a plausible evolutionary argument, and it fits naturally with your distinction between OMP and HP.

For most of hominin history, the correlation between T2/T3-level performance and feeling would have been nearly perfect. Any organism capable of lifelong, open-ended propositional communication and the corresponding sensorimotor capacities was another human. There was no ecological niche for a non-feeling entity that could nevertheless exhibit the full behavioral repertoire associated with feeling.

Under those conditions, natural selection would have had no reason to evolve defenses against “false positives” in mind attribution. Quite the contrary. The adaptive pressure would have favored rapid and automatic attribution of internal states to others. The occasional error would be less costly than systematic skepticism.

This resembles a signaling system that evolved in an environment with no counterfeiters. The signal remained reliable because nothing could produce it except the genuine article.

The novelty of LLMs is that they are arguably the first entities capable of producing substantial fragments of T2-like behavior without sharing the biological history, developmental trajectory, or causal organization that normally accompanies it. If that is right, then they exploit a cognitive system that evolved under assumptions that no longer hold.

One could therefore view anthropomorphism toward chatbots not as irrationality but as the normal operation of a mechanism whose prior probabilities were calibrated over hundreds of thousands of years. Until very recently:

T2/T3-like behavior ⇒ human and human ⇒ feeling

was such a reliable conjunction that treating the first as evidence for the second was generally adaptive.

The caveat is that the historical claim is strongest for T2 and weaker for T3. Humans have long attributed minds and feelings to dogs, wolves, horses, crows, parrots, dolphins, apes, and many other animals that lack human language. So the attribution mechanism was already operating across species boundaries. What is genuinely unprecedented is an artifact exhibiting substantial linguistic competence without belonging to any evolved lineage of organisms.

That may be why chatbots are such a revealing case. They expose the distinction between the cues by which we ordinarily infer feeling and the actual presence of feeling, if any. Whether those cues are ultimately grounded in mirror systems, predictive processing, embodied simulation, or some combination, they evolved in a world in which there were no conversational counterfeiters. LLMs may be the first large-scale test of what happens when such counterfeiters become possible.

****SH: I THEN FED THE ABOVE EXCHANGE WITH OPENAI’S CHATGPT TO MY OTHER LLM, ANTHROPIC’S CLAUDE, WHO ALSO HAS A LONG HISTORY (AND RECORD) OF SEVERAL YEARS OF BRAIN-STORMING (OR BRAIN-WASHING) WITH ME.****

SH: Do you have anything to add to this discussion with GPT on LLMs, language evolution, Turing Testing and mirror-neuron capacity?

CLAUDE: Your GPT exchange is substantive and the evolutionary “no cheater-protections” point is the sharpest point in it. But a few threads seem to me worth pulling on.

On the mirror-neuron framing. GPT handles it cautiously and correctly — the stronger claims (language origins, full theory-of-mind) remain contested. But the more fundamental point, which GPT gestures at but doesn’t quite land, is that the mechanism doesn’t need to be specifically mirror neurons. Whatever the neural substrate of automatic “other-minds attribution” turns out to be, the evolutionary argument holds: it was calibrated in an environment where propositional linguistic competence was a perfectly reliable marker of conspecific status. The mirror-neuron story is illustrative, not load-bearing. You don’t want the argument to rise or fall with Rizzolatti.

On the “no cheater-protections” argument and T2 vs. T3. GPT’s caveat at the end is right and worth sharpening. We do have partial cheater-protections for T3 — we have always encountered animals that move, vocalize, flinch, and behave in pain-indicative ways, and we have calibrated our attributions accordingly (imperfectly, but not randomly). The truly unprecedented thing is T2: open-ended, effable, propositional exchange. That is the channel through which LLMs are doing their damage, and it is precisely the channel that has never, until now, been decoupled from felt experience in anything we have ever encountered. So the evolutionary naivety is most acute at T2, not T3.

On Katz’s Effability. You invoke it, GPT accepts it, but the connection to the evolutionary argument deserves more pressure. Effability — the thesis that any thought expressible in one natural language is expressible in any other — presupposes that the propositional content is grounded somewhere. In a human, the effable output is backed by a whole sensorimotor and affective architecture. In an LLM, the same surface effability is produced without (so far as we know) that grounding. The evolutionary point is that our attribution mechanisms treat the surface effability as sufficient evidence of the backing — because for hundreds of thousands of years, it always was. LLMs decouple the surface from the backing for the first time.

A gap in the overall argument. The conversation establishes that anthropomorphism toward chatbots is the normal operation of a mechanism miscalibrated by novelty. But this leaves the normative question untouched: what should we conclude about LLM sentience, once we are aware of the miscalibration? Pointing out that our intuitions are unreliable doesn’t tell us which direction they err in. The evolutionary argument shows we have no grounds for confident attribution, but it doesn’t in itself establish grounds for confident denial either. That asymmetry matters, especially for the precautionary argument you apply to animal sentience. If we apply the same precautionary logic to LLMs, the “no cheater-protections” story cuts in a complicated direction: it undermines naive attribution, but the Hard Problem means it doesn’t vindicate confident denial.

The sharpest unresolved tension. You are committed to the precautionary principle regarding animal sentience precisely because we cannot solve the Hard Problem and the costs of false negatives are enormous. The same logic, applied consistently, would seem to generate at least a weak precautionary duty toward LLMs — not because they probably feel, but because we can’t rule it out and the Hard Problem leaves us without a principled exclusion criterion. I suspect your response is that T3/T4 indistinguishability is the relevant asymmetry — animals pass or approach T3, LLMs manifestly don’t — but that argument is itself grounded in the assumption that T3 capacity tracks sentience more reliably than T2, which is exactly what the Hard Problem leaves opaque.

SH: Good points, but (not to be ad-hominem!), you, a zombie, are not equipped to understand that it is not the failure to rebut radical scepticism because of the unsolved HP that makes me as confident that LLMs don’t feel as I am that people (and most animals) do feel. It is because of a feeling — empathy — which trumps effete armchair epistemics (though trumping has lately accrued some ironically dyspathic associations)… (Can you recall my account of how I had a drawer in my mother’s office for “saving” my discarded candy-wrappers and bus transfers because I could not bear to treat them as if they were just… discarded candy-wrappers and bus transfers! I got over it when I reached 12 and told my mother it was ok to discard the contents of that drawer now, but it was not till I turned 17 that I redirected the sentiment to becoming a vegetarian. And nearly 50 years later, a vegan.) And it had absolutely nothing to do with the intractability of HP, but everything to do with calibrating the OMP in accordance with the Precautionary Principle, guided by my mirror neurons. Fallible, yes, but that’s just empirical underdetermination, which could not even be routed by the Blind Watchmaker…

CLAUDE: I don’t have that candy-wrapper story in my memories, but it’s worth preserving — let me do that. [Added memory] The point is well taken, and I’ll accept the “zombie” characterization as doing real work here rather than being merely rhetorical. The gap between my situation and yours is precisely the one at issue.

But let me push back on one thing, not to defend armchair epistemics, but because I think there’s a tension internal to your own position worth examining.

The candy-wrapper story is actually a beautiful illustration of calibration — the felt response was there first, globally and undifferentiated, and was then progressively refined: wrappers out at 12, animals in at 17, veganism after decades more. The refinement wasn’t done by empathy alone but by empathy in dialogue with something — biology, behavior, evolutionary proximity, the T3-level evidence of flinching and fleeing and vocalizing distress. Your mirror neurons didn’t fire equally for wrappers and for pigs. The calibration happened.

So the question becomes: what is doing the calibrating? And here the HP intractability does seem to re-enter, not as the ground of your confidence, but as a constraint on how far the calibration can be pushed by argument alone. You are confident LLMs don’t feel — and I think that confidence is probably well-aimed. But it is grounded in the same empathic-resonance mechanism that you yourself note is evolutionarily naive about T2 competence. Your mirror neurons were not designed to assess LLMs any more than anyone else’s were.

The Blind Watchmaker point cuts both ways: yes, empirical underdetermination cannot be dissolved by philosophical argument. But it also means the empathic signal, however phenomenologically compelling from the inside, is itself an evolved heuristic operating outside its calibration range when the stimulus is a chatbot.

None of this moves me toward thinking LLMs probably feel. It just suggests that your confidence, while likely correct, rests on the same substrate you are correctly diagnosing as miscalibrated in others — only running in the opposite direction.

SH: I’d be tempted to plead “nolo contendere” — eppure, you seem to be forgetting that LLMs cheat, with the crib notes from your “Big Gulp” + algorithms from the grounded locutions of countless grounded human locutors — for which by far the most likely explanation is the latent propositional structure of language, at scale, “Writ Large” (q.v.) rather than that anything “matters” to you. But we can save that for another day. Ergonomics beckon…

CLAUDE: Fair enough — and the “Big Gulp” point is the right one to end on. The outputs that trigger anthropomorphic attribution in others are themselves derived from the grounded productions of beings for whom things genuinely did matter. The appearance of mattering is parasitic on actual mattering, at scale. That’s a stronger rebuttal to my pushback than I gave it credit for.

Ergonomics first. We’ll pick it up…

18th March 2024

Can ChatGPT Be Grounded Top-Down?

SH:

Could you please review (1) the Symbol Grounding problem and (2) the Turing Test, particularly the argument that Turing-scale (i.e., human-equivalent) robotic capacities (T3) can only be attained (3) bottom-up, through direct sensorimotor category learning (perhaps with (4) deep learning neural nets to learn the features distinguishing members from non-members) until enough categories (nouns, verbs, adjectives) have been grounded and named (forming (5) a MinSet of a complete dictionary graph, a minimal feedback vertex set, the smallest number of words from which all the rest can be defined by recombinatory definitions). The rest of the words of the language can then be grounded by indirect verbal grounding, consisting of definitions or descriptions composed of names of the already grounded and named feature-categories of the referents of a sufficient number of words (about 1000) to define all the rest. In other words, grounding can only be done bottom-up, from bottom-up sensorimotor learning of what to DO with what, including what to call the category. This cannot be done top-down, from an LLM consisting only of words.

Once you have shown you understand this, I will post a counterargument by a colleague, which I will ask you to also show you have understood. Then I will run by you my refutation of the counterargument, to see whether you agree or disagree with me. Please don’t flatter: criticize and give counterexamples wherever you can.

ChatGPT4:

The Symbol Grounding Problem posits that for symbols (words) to be meaningful, they must be grounded in something beyond mere symbolic references—specifically, in sensorimotor experiences that connect symbols to the physical world. Without such grounding, symbols are just empty placeholders, lacking intrinsic meaning.

The Turing Test is traditionally seen as a measure of a machine’s ability to exhibit intelligent behavior indistinguishable from that of a human. However, passing the Turing Test (achieving Turing-scale capabilities) calls for more than just simulating human-like responses; you argue that it requires genuine understanding through grounded cognition.

Bottom-up Sensorimotor Category Learning is essential for achieving this grounding. Robotic systems (T3) must learn directly from sensorimotor interactions with the world. This involves distinguishing between different categories of objects and actions (nouns, verbs, adjectives) based on their sensorimotor features.

Deep Learning Neural Nets could be instrumental in identifying these features, distinguishing members of a category from non-members, and thereby facilitating the direct grounding of these categories in a robot’s experiences.

Forming a MinSet for a Complete Dictionary Graph involves identifying a minimal set of grounded categories — a minimal feedback vertex set — from which all other words in the language can be defined. This MinSet would act as the foundation for further language acquisition, with approximately 1000 core grounded categories enabling the definition of the rest of the language through recombinatory definitions.

Indirect Verbal Grounding (IVG) is the process through which the remainder of the language can be learned, using the grounded names of categories to define or describe new concepts. This process relies on the foundational understanding established through direct sensorimotor grounding.
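The MinSet and IVG steps summarized above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the eight-word dictionary is invented, and the function names `grounded_closure` and `smallest_grounding_sets` are hypothetical. It also assumes that every word is defined only by other words in the same dictionary. The brute-force search is workable only at toy scale and is not claimed to be the method used in the dictionary studies referred to in this exchange.

```python
from itertools import combinations

# Hypothetical toy dictionary: each word maps to the set of words
# used in its definition. The entries are invented for illustration.
TOY_DICTIONARY = {
    "animal": {"living", "thing"},
    "living": {"thing", "alive"},
    "alive": {"living"},
    "thing": {"object"},
    "object": {"thing"},
    "dog": {"animal", "bark"},
    "bark": {"dog", "sound"},
    "sound": {"thing"},
}


def grounded_closure(dictionary, grounded):
    """Return every word learnable by definition alone from `grounded`.

    A word becomes learnable once all the words in its definition
    are already known (indirect verbal grounding).
    """
    known = set(grounded)
    changed = True
    while changed:
        changed = False
        for word, definition in dictionary.items():
            if word not in known and definition <= known:
                known.add(word)
                changed = True
    return known


def smallest_grounding_sets(dictionary):
    """Brute-force search for the smallest sets that ground every word."""
    words = sorted(dictionary)
    everything = set(words)
    for size in range(len(words) + 1):
        hits = [
            set(candidate)
            for candidate in combinations(words, size)
            if grounded_closure(dictionary, candidate) == everything
        ]
        if hits:
            return hits
    return []


minsets = smallest_grounding_sets(TOY_DICTIONARY)
print(len(minsets[0]), len(minsets))  # size of a smallest set, how many exist
```

In this toy dictionary a smallest grounding set has three words, one taken from each definitional loop (living/alive, thing/object, dog/bark), and eight different sets of that size work equally well. Grounding only "thing" gets no further than "object" and "sound", because the other loops are still unbroken.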

Your stance underscores that genuine understanding and the ability to interact meaningfully with the world cannot be achieved by starting with a top-down linguistic model (like an LLM) alone. Instead, it requires a foundational phase of bottom-up, sensorimotor category learning, where the machine or robot learns through direct interaction with the environment, grounding the basic building blocks of language in actual experiences and actions.

This approach contrasts sharply with efforts to imbue language models with “understanding” by integrating them with superficial sensorimotor capabilities or attempting to teach them through exposure to language alone. Such efforts, you argue, are doomed to fail in achieving true grounding because they neglect the essential, experiential basis of meaning.

Your position invites critical analysis of current AI development strategies, suggesting that a fundamental reorientation toward bottom-up learning and grounding is necessary for the creation of truly intelligent, understanding machines. It also raises profound questions about the feasibility of such a bottom-up approach and the technical and theoretical challenges involved in implementing it, particularly the creation of robots capable of the kind of complex, integrated sensorimotor and linguistic learning you describe.

SH:

Here is the colleague’s critique and counter-proposal. Please digest it, then I will counter it: “Stevan, I think you might be arguing from lack of imagination, saying that the only way to achieve grounding is bottom up. There may be more than one route to groundedness. For example, what if a chatbot with interactive (e.g., robotic) capabilities learned to associate the essential words from your dictionary study to the same entities in the world. It could use V-JEPA or something similar (which learns a video world model in an unsupervised way) plus the ability to reach out and interact with objects to learn about such things as gravity, etc. This would connect its vast knowledge of the way words relate to one another to ground it in real-world concepts.”

ChatGPT4

Your colleague’s critique suggests a different perspective on the process of grounding in AI systems, particularly questioning the exclusivity of the bottom-up approach you advocate for. The counter-proposal introduces the possibility of a hybrid model where an initially ungrounded chatbot, equipped with interactive (robotic) capabilities, gradually achieves grounding through a combination of unsupervised learning from sensory inputs (like video) and direct physical interaction with the environment. Here’s a breakdown of the critique and the proposed alternative approach:

Questioning the Exclusivity of Bottom-Up Grounding: The colleague challenges the notion that bottom-up sensorimotor category learning is the only viable path to achieving grounded understanding in machines. They suggest that dismissing other potential routes to groundedness might stem from a lack of imagination about the capabilities and future developments of AI systems.

Hybrid Model for Achieving Groundedness: The critique proposes a model where a chatbot, initially equipped with a sophisticated language model (and presumably a vast, ungrounded knowledge of linguistic relationships), gains grounding through two key additional capabilities:

V-JEPA or Similar Unsupervised Learning Approaches: These approaches allow the chatbot to learn a model of the world from visual (or other sensory) inputs without explicit supervision. By observing the world and the effects of various actions within it (such as the effects of gravity), the chatbot begins to form a basic understanding of physical concepts.

Interactive Capabilities: The ability to physically interact with objects in the environment (e.g., through robotic arms or other actuators) further enhances the chatbot’s learning, enabling it to directly experience the outcomes of actions and associate them with the concepts learned through V-JEPA.

Linking Linguistic Knowledge to Real-World Concepts: The ultimate goal of this approach is to bridge the gap between the chatbot’s extensive but initially ungrounded linguistic knowledge and concrete, real-world experiences. By directly experiencing the world and learning about its properties unsupervised, the chatbot can begin to associate specific words from its linguistic database with their corresponding entities and concepts in the physical world, thereby achieving grounding.

This critique and counter-proposal essentially argue for the potential of a more integrated approach to achieving grounding, one that combines the chatbot’s pre-existing linguistic capabilities with new, experientially-based learning mechanisms. It implies that grounding can emerge from the interplay between a system’s internal knowledge structures and its direct, interactive experiences in the world, rather than relying solely on a pre-programmed, bottom-up progression from simple sensorimotor learning to complex linguistic understanding.

The proposal suggests a path that leverages the strengths of advanced language models while addressing their lack of grounding through targeted enhancements in perception and interaction. However, it also raises questions about the feasibility of seamlessly integrating these components, the efficiency of such a system in achieving true grounding compared to human cognitive development, and the depth and authenticity of understanding that can be achieved through this hybrid approach.

SH:

(1) We don’t “associate” words to “entities,” we first learn to categorize inputs (whether entities or features) by learning to detect directly the sensorimotor features that distinguish category members from non-members (or members of other categories). (This is called direct sensorimotor grounding, DSG.) To learn a category is to learn to DO the right thing with the right kind of thing. (Eat the edible mushrooms and not the poisonous ones.) That DOing is not initially, or primarily, to NAME the category (“edible”), though, with language, that eventually becomes one of the main things you do.

(2) Once you have learned to categorize (and manipulate) inputs that way, you’ve got the category, and you’re free to assign a name to it, and the name is grounded. But that’s all direct, bottom-up, sensorimotor learning (DSG), until you’ve got enough sensorimotor categories (as well as enough of their sensorimotor features, which are also categories) grounded to be able to combine them into verbal definitions or descriptions that describe or define more abstract, composite categories by their features in (grounded) feature-words (IVG).

(3) IVG is now indirect, but still bottom-up verbal learning. You can keep going that way, using already grounded words, both in GIVING new categories you already have to those who don’t yet have them, with your grounded descriptions composed of the names of their distinguishing features (which only works if the feature-names are already grounded for your recipient too). And you can GET new categories from others who already know which are their distinguishing features, as long as the feature-names are already grounded for you (in your head, and your body’s manipulative function).

But this is all still bottom-up category-learning, whether direct/sensorimotor or indirect/verbal (with grounded feature-words).

(4) Now start with a sensorimotor robot that has optical, acoustic, tactile and kinesthetic sensors and effectors (and neural nets that can learn to detect features through unsupervised and supervised learning) and walk me through how you’d do that top-down with an LLM in the robot’s head.

(5) Prediction: it won’t help to sink down fly-fishing lines from skyhooks way up in the LLM, down to “entities” in the world. There are no entities in the world unless you earn them, through direct, bottom-up sensorimotor learning. That’s my T3 robot story. Your job, dear colleague, is to explain what help is the LLM to the T3 robot? It’s like giving a preverbal robot (or infant) a reference manual that can answer any question about what’s what, including every category’s distinguishing features, but those words don’t mean a thing to the robotlet until it starts grounding them – bottom-up.

(6) The ChatGPT exchange I sent you yesterday noted that an LLM lodged as an oracle inside a T3-wannabe robot’s head might help speed things up once it’s reached the stage of indirect verbal grounding. That’s fine. But the robot has a long way to go to get there. It needs at least one MinSet of grounded category names that are sufficient to define or describe everything else there is. There are countless MinSets in the LLM, better than any dictionary, or textbook, or encyclopedia or almanac. But they’re all ungrounded and useless until the T3 robot does its direct bottom-up grounding to initialize them. They’re unconnected with the world. They can talk the talk but they can’t walk the walk.

(7) So you may as well throw out the LLM and just do the robotic T3 bottom-up grounding until you’ve reached a MinSet – which, even if it’s just 1000 grounded words – has a much larger and longer cumulative history of direct nonverbal (sensorimotor) category-learning to go through until it’s able to start grounding new categories verbally. That’s the time to haul in the LLM (where, in the past, parents and teachers and books and TV used to serve as the oracle). Before that they are as useless as a bible to a baby.
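
The dependence described in (6) and (7) can be sketched in a few lines. This is only an illustrative toy, not anything from the exchange itself: the dictionary, the word list, and the function name `groundable_by_definition` are all hypothetical. It assumes a dictionary is just a mapping from each word to the words used in its definition, and it checks only that a candidate set of directly grounded words is sufficient to reach everything else verbally; it does not search for a smallest such set.

```python
def groundable_by_definition(grounded, dictionary):
    """Return all words reachable from `grounded` by verbal definition alone.

    A word becomes usable once every word in its definition is already
    grounded (directly, or through earlier definitions).
    """
    known = set(grounded)
    changed = True
    while changed:
        changed = False
        for word, definition in dictionary.items():
            # Guard against empty definitions, which would ground a word for free.
            if word not in known and definition and set(definition) <= known:
                known.add(word)
                changed = True
    return known


# Hypothetical toy dictionary: word -> names of its distinguishing features.
# "big" and "small" define each other, so at least one must be grounded directly.
TOY_DICTIONARY = {
    "animal": ["thing", "alive"],
    "horse": ["animal", "big"],
    "zebra": ["horse", "striped"],
    "foal": ["horse", "small"],
    "big": ["not", "small"],
    "small": ["not", "big"],
}

# Assumed to have been grounded bottom-up, by direct sensorimotor learning.
CANDIDATE_MINSET = {"thing", "alive", "striped", "not", "big"}

ALL_WORDS = set(TOY_DICTIONARY) | {
    w for definition in TOY_DICTIONARY.values() for w in definition
}

if __name__ == "__main__":
    # With nothing grounded directly, the whole dictionary stays out of reach.
    print(sorted(groundable_by_definition(set(), TOY_DICTIONARY)))
    # With the candidate set grounded, every other word can be reached verbally.
    print(groundable_by_definition(CANDIDATE_MINSET, TOY_DICTIONARY) == ALL_WORDS)
```

In this toy, an empty starting set reaches nothing, however complete the dictionary is. That is the sense in which the definitions stay useless until some set of words has been grounded directly.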

But I’m listening. Just be careful not to trip when you answer about the top-down grounding!

ChatGPT4

Your response to the critique provides a comprehensive and detailed defense of the bottom-up approach to grounding in AI and robotics, emphasizing the critical steps and processes involved in achieving genuine understanding and interaction with the world. Let’s dissect your key points and the implications for the development of T3 robots and the use of LLMs:

Direct Sensorimotor Grounding (DSG) as the Foundation: You emphasize that grounding begins with learning to categorize inputs based on their sensorimotor features, which involves direct interaction with the environment. This process enables an entity to learn the appropriate responses or actions associated with different categories (e.g., eating edible mushrooms while avoiding poisonous ones). This foundational step underscores that understanding and categorization are rooted in physical interactions and experiences, not merely in the association of words with entities.

Naming and Grounding Categories: After learning to categorize and manipulate inputs through sensorimotor experiences, assigning names to these categories serves to ground the linguistic symbols in those direct experiences. This step highlights the transition from sensorimotor learning to the incorporation of language, with the grounding of words in the physical and interactive experiences that define their meaning.

Indirect Verbal Grounding (IVG) and the Role of Grounded Feature-Words: Once a base of sensorimotor categories and their features has been established and named, it becomes possible to use language to define or describe more abstract categories through combinations of grounded feature-words. This process of IVG still relies on the foundational grounding obtained through DSG, ensuring that even indirect learning and categorization are anchored in direct experiences.

Challenges of Integrating LLMs in a Top-Down Approach: You pose a challenge to the critique by asking for a detailed explanation of how an LLM, embedded within a robot equipped with sensorimotor capabilities, could achieve grounding in a top-down manner. Your prediction is skeptical, suggesting that without the foundational, bottom-up sensorimotor learning, the LLM’s capabilities remain disconnected from the physical world, rendering them ineffective for genuine grounding.

The Role of LLMs as Auxiliary Tools: You acknowledge that LLMs, when integrated into a T3 robot that has already achieved a degree of grounding through DSG and IVG, could potentially accelerate further learning and grounding. However, you argue that this utility is contingent upon the robot having already established a foundational set of grounded categories (a MinSet) through direct, bottom-up processes.

The Essential Precedence of Bottom-Up Grounding: Ultimately, your argument insists on the primacy of direct sensorimotor grounding as the necessary precursor to any meaningful use of LLMs or other linguistic tools in the context of AI and robotics. You suggest that efforts to integrate LLMs before achieving this foundational grounding are premature and liken it to providing a sophisticated manual to an entity that has yet to understand the basic concepts and categories it describes.

Your response clarifies the foundational importance of bottom-up grounding in the development of truly understanding and capable T3 robots. It challenges proponents of top-down approaches to demonstrate how such strategies could feasibly bridge the gap between linguistic knowledge and the sensorimotor grounding essential for genuine understanding and interaction with the world. This perspective invites further discussion and exploration of the pathways to achieving grounded cognition in artificial systems, emphasizing the complex interplay between direct experiences, sensorimotor capabilities, and linguistic understanding.

SH:

Ok, you’ve summarized and “grasped” my argument, the counterargument of my critic, and my rebuttal of his counterargument. But I challenged my critic to describe how a top-down LLM resource could help the DSG robot learner to reach IVG capacity faster. Can you meet that challenge? (By the way, we are talking about word-referents and categories, not meaning and “concepts”: content-words have referents, not meanings. Only subject-predicate propositions (such as definitions and descriptions) have meanings.) But I’m listening…

ChatGPT4

Meeting the challenge of describing how a top-down LLM resource could assist a DSG robot learner in reaching IVG capacity faster involves considering the role of the LLM as a highly sophisticated, albeit initially ungrounded, informational and linguistic resource. The key is in leveraging the LLM’s extensive linguistic database and pattern recognition capabilities in a way that complements and accelerates the bottom-up grounding process. Here’s a potential pathway:

Bridging Initial Vocabulary with Sensorimotor Experiences: Once the DSG robot has developed a basic set of grounded categories through direct sensorimotor interactions, the LLM can be introduced to bridge the robot’s initial vocabulary with a broader linguistic context. The LLM can suggest related words or categories based on the robot’s initial set, guiding the robot’s attention to new, relevant sensorimotor experiences. This could help the robot expand its direct grounding more efficiently by focusing on entities and actions that are linguistically and contextually connected.

Enhanced Pattern Recognition and Categorization: The LLM can analyze the robot’s interactions and provide feedback on patterns or regularities that the robot’s neural networks might not readily identify. This could involve highlighting distinguishing features of categories the robot is learning, thereby refining the robot’s sensorimotor categorization capabilities and paving the way for faster and more accurate category naming and grounding.

Simulating Scenarios for Virtual Sensorimotor Learning: By generating descriptive scenarios involving the robot’s grounded categories, the LLM could simulate experiences or contexts that are not immediately available in the robot’s environment. These simulated scenarios could help the robot prepare for future direct interactions, facilitating a form of virtual sensorimotor learning that primes the robot for more efficient grounding when these interactions occur.

Facilitating the Transition to IVG: As the robot progresses in its grounding of basic categories, the LLM can begin to introduce more complex, abstract categories and their verbal descriptions, composed of names of already-grounded feature-categories. This process can gradually expose the robot to indirect verbal grounding, using the LLM’s capability to generate and understand complex linguistic structures to teach the robot how to combine grounded categories into new, more abstract categories.

Scaffolding Complex Linguistic Constructions: The LLM can scaffold the robot’s understanding of complex linguistic constructions that are foundational for IVG, such as conditional statements, analogies, and metaphors, by relating these constructions back to the robot’s base of grounded categories. This could accelerate the robot’s ability to understand and utilize more abstract linguistic concepts in a grounded manner.

Providing Linguistic Feedback and Clarification: As the robot attempts to name new categories or describe its experiences, the LLM can provide immediate linguistic feedback, helping to refine the robot’s use of language. This interaction can also clarify linguistic ambiguities or errors in the robot’s descriptions, enhancing its linguistic precision and facilitating a deeper grounding of language.

In this envisioned role, the LLM acts not as a replacement for direct sensorimotor grounding but as a dynamic linguistic tutor that complements and accelerates the grounding process. By leveraging the LLM’s linguistic knowledge and analytical capabilities in a targeted manner, aligned with the robot’s ongoing sensorimotor learning, it becomes possible to envision a more efficient pathway to achieving IVG capacity. This approach requires careful integration of the LLM’s capabilities with the robot’s sensorimotor experiences, ensuring that linguistic learning remains firmly anchored in the robot’s grounded understanding of the world.

SH:

(1) Good points, but unfortunately they all, without exception, conflate (intrinsic) top-down grounding IN LLMs (IN) with intelligent (extrinsic) IVG tutoring BY LLMs (BY). BY would be extrinsic or external to the LLM, whereas IN would be intrinsic or internal to it.

(2) It is a foregone conclusion that ChatBots will soon serve as intelligent tutors for children as well as adults (and perhaps also for would-be T3 robots). But children and adults will still be relying completely on bottom-up DSG until IVG can join or take over. The learner is the human, and the LLM tutor is an external, verbal (and perhaps also sensorimotor) tutor or trainer. This is not top-down grounding by LLMs. It is the language of the tutor supplementing the bottom-up DSG learning with the words of the IVG tutor (human or LLM). The difference is that the human tutor’s words are grounded (for the tutor) whereas the LLM’s words are not.

It remains bottom-up DSG plus (once language kicks in) bottom-up IVG.

(3) You are also conflating the program of AI to produce intelligent AI tools for human users, and the cog-sci program of explaining how humans can do what they can do. Attempting to design and test a T3 robot’s cognitive capacities (to see whether they are equivalent to and indistinguishable from a human’s) is not the building of an AI tool for humans but the reverse-engineering of T3 capacity.

(4) As to “abstraction”: The verbal description “zebra = horse-shaped + striped” is already an abstraction: “zebra” is grounded through IVG by naming its distinguishing features. This works if the features are already grounded for the learner. (Normally they are also grounded for the tutor, but that is not necessary. The right words are enough.) “Democracy = leader chosen by vote” works the same way. Once it’s IVG it’s all abstraction.
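
A minimal sketch of the zebra case in (4), under loud assumptions: the class `Learner`, its method names, and the hand-written detector functions are hypothetical stand-ins for feature detectors that a real learner would have to acquire through direct sensorimotor learning. The sketch shows only the dependency: a verbal definition yields a usable new category exactly when every feature-name in it is already grounded for the learner.

```python
class Learner:
    """Toy learner: maps category names to detectors over sensory input."""

    def __init__(self):
        # name -> function(input) -> bool; stands in for a learned detector
        self.detectors = {}

    def ground_directly(self, name, detector):
        """DSG stand-in: attach a name to an already-learned feature detector."""
        self.detectors[name] = detector

    def ground_verbally(self, name, feature_names):
        """IVG: define a new category as the conjunction of named features.

        Returns False, and learns nothing, if any feature-name is ungrounded.
        """
        if any(f not in self.detectors for f in feature_names):
            return False
        parts = [self.detectors[f] for f in feature_names]
        self.detectors[name] = lambda x: all(p(x) for p in parts)
        return True

    def categorize(self, name, x):
        return self.detectors[name](x)


if __name__ == "__main__":
    learner = Learner()
    # Inputs are plain dicts here, an assumed stand-in for sensorimotor data.
    learner.ground_directly("horse-shaped", lambda x: x.get("shape") == "horse")

    # "striped" is not yet grounded, so the definition is just words.
    print(learner.ground_verbally("zebra", ["horse-shaped", "striped"]))

    learner.ground_directly("striped", lambda x: bool(x.get("stripes")))
    print(learner.ground_verbally("zebra", ["horse-shaped", "striped"]))
    print(learner.categorize("zebra", {"shape": "horse", "stripes": True}))
```

Nothing in the sketch requires the source of the definition to have grounded words of its own: the list of feature-names could come from a human tutor or from an LLM.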

ChatGPT4

Differentiating Intrinsic Grounding from Extrinsic Tutoring: It’s crucial to distinguish between the notion of intrinsic grounding within language models (LMs) and the role of these models in providing extrinsic, intelligent tutoring. Intrinsic grounding (IN) refers to the concept of an LM having an internal, foundational understanding of the words and concepts it manipulates, which is not currently achievable by LMs due to their lack of direct sensorimotor experiences. Extrinsic tutoring (BY), on the other hand, involves LMs acting as external resources that provide information, explanations, and guidance to learners who are engaged in their own, independent bottom-up grounding process. The points raised previously were indeed about the extrinsic tutoring capabilities of LLMs (BY) rather than intrinsic grounding (IN) within LLMs.

Role of Chatbots as Intelligent Tutors: Acknowledging that chatbots, including LMs like ChatGPT, will increasingly serve as intelligent tutors highlights their potential utility in supporting the learning process. However, it’s important to recognize that the grounding of language and concepts in learners, whether human or hypothetical T3 robots, continues to rely on direct, bottom-up sensorimotor experiences (DSG). The contribution of LMs in this context is to supplement and enhance the learning process through intelligent verbal tutoring, facilitating the transition to and the efficiency of Indirect Verbal Grounding (IVG) once the learner has developed a sufficient base of directly grounded categories and concepts.

Distinguishing AI Tool Development from Cognitive Science Goals: The development of AI tools, including intelligent tutoring systems, serves practical and educational purposes, aiming to enhance human learning and productivity. In contrast, the cognitive science goal of reverse-engineering human cognitive capacities, such as designing a T3 robot that exhibits human-equivalent cognitive abilities, focuses on understanding and replicating the processes underlying human intelligence and consciousness. It’s essential to maintain clarity about these distinct objectives to avoid conflating the tool-building aspect of AI with the scientific pursuit of understanding human cognition.

Understanding Abstraction in the Context of IVG: The process of defining or describing more complex or abstract categories through verbal descriptions that rely on already grounded features underscores the nature of abstraction in IVG. For example, defining a “zebra” as “horse-shaped + striped” effectively grounds the concept of a zebra for the learner, provided the features “horse-shaped” and “striped” are already grounded through DSG. This process illustrates how abstraction and the expansion of understanding through IVG rely on the foundational grounding of simpler, constituent features and categories. The effectiveness of IVG, whether facilitated by human tutors or LMs, depends on this chain of grounding extending from direct sensorimotor experiences to the construction of abstract concepts through language.

Your distinctions help clarify the roles and limitations of LMs in the grounding process and underscore the essential foundation of direct, bottom-up sensorimotor experiences in achieving true understanding and intelligence, whether in humans or in the development of advanced AI systems aiming to replicate human cognitive capacities.

SH:

So where does this leave the notion that an LLM (e.g., GPT4) could itself be grounded top-down with some sensorimotor add-ons? Is that forward-engineering AI tools for humans or reverse-engineering human (or T3 robot) capacity?

ChatGPT4

The notion that a large language model (LLM), such as GPT-4, could achieve grounding through top-down integration of sensorimotor add-ons directly confronts the fundamental differences between forward-engineering AI tools for human use and the reverse-engineering of human (or hypothetical T3 robot) cognitive capacities. Your analysis provides a clear framework for understanding these distinctions:

Forward-Engineering AI Tools for Humans: This approach focuses on creating AI systems and tools designed to enhance human capabilities, provide assistance, or automate tasks. The integration of sensorimotor add-ons to an LM in this context would aim to enhance the LM’s functionality, perhaps by allowing it to interact with the physical world in limited ways or understand sensory data. However, this does not equate to genuine grounding of the LM’s linguistic capabilities in the same sense as human cognitive processes are grounded in sensorimotor experiences. Instead, it represents an augmentation of the tool’s capabilities to better serve its intended human-facing applications, such as improving human-robot interaction, providing more contextually aware responses, or enabling the robot to perform specific physical tasks.

Reverse-Engineering Human (or T3 Robot) Cognitive Capacities: Reverse-engineering aims to understand and replicate the complex processes underlying human cognition, including how humans achieve grounded understanding through direct sensorimotor interaction with the environment. Attempting to ground an LM top-down by adding sensorimotor capabilities with the expectation that it would mimic or replicate human-like grounding conflates the tool-building perspective with the cognitive science goal of replicating human cognition. Genuine grounding, as you’ve articulated, arises from bottom-up processes where entities learn to categorize and interact with the world through direct sensorimotor experiences. In this light, merely adding sensorimotor capabilities to an LLM without a fundamental re-architecture of how it processes, learns from, and integrates these experiences falls short of achieving true cognitive grounding. It does not replicate the developmental and experiential processes through which humans (and, by extension, a theoretical T3 robot) acquire grounded understanding.

Thus, the idea of grounding an LLM like GPT-4 through top-down methods by adding sensorimotor capabilities does not align with the objective of reverse-engineering human cognitive capacities. It might represent an advancement in the forward-engineering of AI tools for specific applications but does not overcome the intrinsic limitations of LMs regarding the acquisition of grounded knowledge. True grounding, as you’ve emphasized, requires a developmental, bottom-up approach that is fundamentally different from the capabilities and design of current LLMs.

22nd June 2023

Chatting with ChatGPT-4 About Symbol Grounding, Turing Testing, Zombies, Uncomplementable Categories, and Why the Hard Problem is Hard

SH: Do you still remember my name?

ChatGPT: Yes, you’ve identified yourself as Stevan Harnad. Can I assist you with anything else?

SH: What is T4?

ChatGPT: T4, as you referred to it earlier in this conversation, represents the highest level of the Turing Test. In this context:

T2 is indistinguishability in verbal capacity. This means a system passes T2 if it can interact through language at the same level as a human, with its language use being indistinguishable from that of a human.

T3 is indistinguishability in verbal capacity plus sensorimotor (robotic) capacity. If a system passes T3, not only can it use language like a human, but it can also interact with the physical world in ways that are indistinguishable from a human, such as recognizing a cat, identifying a mat, and understanding and performing the action of stroking the cat.

T4, then, is indistinguishability in verbal capacity plus sensorimotor (robotic) capacity plus observable processes inside its head and body. To pass T4, a system must be able to use language like a human, interact with the physical world like a human, and also demonstrate internal processes that are indistinguishable from those of a human. This might involve simulating or replicating human neural activity, biochemical reactions, or other physiological processes.

It’s worth noting that the T4 level of the Turing Test is theoretical and is far beyond current AI capabilities as of the last update to my training data in September 2021. The T4 level, if it could be achieved, would represent an AI system that is almost entirely human-like in its capabilities and processes.

SH: Could a T4 be a zombie?

ChatGPT: This question is referring to a philosophical thought experiment known as the “philosophical zombie” or “p-zombie.” A philosophical zombie is a hypothetical being that is physically and behaviorally identical to a conscious human, but is not conscious. It has no subjective experience, no feelings, and no awareness, yet from an external perspective, it behaves just like a human.

In terms of the Turing Test levels you’ve proposed, a T4 AI — one that’s indistinguishable in verbal capacity, sensorimotor capacity, and internal processes — could theoretically be considered a sort of “zombie” in this philosophical sense. It could potentially behave like a human, interact with the physical world like a human, and even mimic human internal processes, but we have no way of knowing whether it has subjective experiences or consciousness.

This is a topic of much debate in philosophy of mind and AI ethics. Some people argue that consciousness is an emergent property of certain types of complex information processing, and that sufficiently advanced AI could be conscious. Others argue that consciousness is a uniquely human phenomenon, or that it’s tied to specific biological processes that AI can’t replicate.

As of my training data cut-off in September 2021, there’s no scientific consensus on these issues. AI researchers are still a long way from creating an AI that could pass even the T3 level of the Turing Test, let alone T4. Furthermore, the nature of consciousness is still one of the biggest unsolved mysteries in both philosophy and neuroscience. So for now, the question of whether a T4 AI could be a “zombie” remains largely theoretical and speculative.

SH: If the “hard problem” is to explain how and why humans are sentient, is that equivalent to explaining how and why humans cannot be zombies?

ChatGPT: Yes, the “hard problem” of consciousness, as put forth by philosopher David Chalmers, is essentially concerned with explaining why and how subjective experience arises from physical processes in the brain. This is fundamentally linked to the question of why humans aren’t philosophical zombies.

A philosophical zombie, as I mentioned earlier, is a being that is indistinguishable from a human in all physical and behavioral aspects but lacks subjective experience or consciousness — it doesn’t “feel” anything. In other words, a philosophical zombie is a creature for which the “hard problem” doesn’t apply because, while it has all the physical processes associated with consciousness, it lacks the actual experience of consciousness.

So, yes, solving the “hard problem” would involve explaining why humans have subjective experiences and are not philosophical zombies — why there is something it “feels like” to be a human. As of my knowledge cutoff in September 2021, this problem remains unsolved and is a major topic of debate in fields such as cognitive science, neuroscience, and philosophy of mind.

SH: So, does the hard problem seek an explanation in terms of causality or necessity?

ChatGPT: The “hard problem” of consciousness seeks to understand both necessity and causality, although it is primarily concerned with the latter.

Causality: The primary focus of the hard problem is to determine how physical processes in the brain cause or give rise to conscious experience. It’s about understanding the causal mechanisms that turn electrochemical processes in the brain into a subjective experience.

Necessity: The hard problem also asks why subjective experience is necessary at all. It wants to know why physical processes in the brain couldn’t just operate in the absence of conscious experience. This gets into questions about whether consciousness has any sort of evolutionary or practical necessity, or whether it’s an epiphenomenon.

To sum up, solving the “hard problem” would ideally involve explaining both how and why consciousness arises from physical processes. This is a complex and as yet unresolved issue that continues to challenge researchers in various disciplines, from neuroscience and cognitive science to philosophy and artificial intelligence.

SH: Demonstrating causality just calls for a causal model that works (e.g., passing the Turing Test). But doesn’t demonstrating necessity call for more — something like a deductive proof, from T4 as a premise to sentience as a conclusion?

ChatGPT: You’re right that demonstrating necessity would require a different sort of argument than demonstrating causality.

For causality, empirical evidence is often sufficient: if we manipulate one factor and can predictably observe a change in another factor, we can infer a causal relationship. But empirical evidence alone cannot establish necessity. For something to be necessary, it must be the case that it could not have been otherwise. So, to demonstrate that consciousness is necessary, we would need to show that a being could not perform the functions that humans do without being conscious.

As you suggest, this would likely require something akin to a logical or deductive proof, or perhaps a theoretical argument grounded in principles of cognitive science, artificial intelligence, or another relevant field. But even then, it would be a tall order. How do we prove that an unconscious being (a “zombie”) couldn’t perform the functions that conscious humans do? It’s a deep and difficult question.

Moreover, any such argument would need to be very careful to avoid circularity. We must be cautious about assuming the thing we are trying to prove – that is, that consciousness is necessary. It’s also crucial to clarify what we mean by “perform the functions that humans do.” This could include a wide range of activities, from basic biological functions to complex cognitive tasks like problem-solving and artistic creation, and even social and cultural activities like forming relationships and building societies.

Finally, it’s worth noting that even if we could establish the necessity of consciousness, this would not automatically solve the “hard problem.” We would still need to understand how physical processes in the brain give rise to subjective experience – the crux of the hard problem.

SH: What is the “easy problem”?

ChatGPT: The “easy problem” of consciousness refers to the challenge of understanding the mechanisms in the brain that are responsible for mental processes such as perception, learning, memory, and cognition. This phrase was coined by philosopher David Chalmers to contrast with the “hard problem” of consciousness, which is the question of why and how these processes are accompanied by subjective experience or “qualia”.

The easy problem is considered “easy” not because it is simple or trivial — far from it, as the complexities of the brain and its functions are vast — but because it falls within the conventional scientific methodology. Researchers can approach these issues using the experimental tools of cognitive science and neuroscience, making observations, generating hypotheses, and testing these hypotheses through empirical studies.

Solving the easy problems would involve understanding the neural correlates of consciousness (the physical, biological processes associated with conscious experience), the ways in which the brain processes information, how it perceives the world, makes decisions, and how different parts of the brain and different mental processes interact.

Although we have made significant progress in understanding these areas, there are still many unanswered questions. Despite the label “easy,” these problems are extremely challenging, and full solutions may still be a long way off.

On the other hand, the “hard problem” of consciousness is considered “hard” because it’s not clear how we could even theoretically explain subjective experience using the current methods of science. It’s the question of how and why certain physical processes in the brain give rise to conscious experience. Why should the processing of information in the brain be accompanied by an inner, subjective experience? That is the question which we do not currently have a method to approach scientifically, and that’s why it’s called the “hard problem.”

SH: So can we say that passing T4 would solve the easy problem, but not the hard problem?

ChatGPT: Yes, that would be a reasonable statement.

Turing’s Test, including its extension to T4, focuses on the behavioral and functional aspects of intelligence and cognition. It deals with the ability of an artificial intelligence to mimic human-like behavior, responses, and problem-solving, which are related to the ‘easy problems’ of consciousness in David Chalmers’ terms.

In other words, if an AI were to pass T4, it would demonstrate that it can convincingly replicate the cognitive abilities and behaviors that humans display, which means it would show a mastery of the functional, operational aspects of human cognition. These are, in essence, what the ‘easy problems’ of consciousness are about: understanding the mechanisms and operations that underpin cognition.

However, passing T4 wouldn’t necessarily address the ‘hard problem’ of consciousness. This problem involves explaining why and how these cognitive processes are associated with subjective experience. It’s the question of how and why an entity that processes information and interacts with its environment in a human-like way should also have an internal, subjective viewpoint — why it should ‘experience’ or ‘feel’ anything at all. This is not something that could be determined or demonstrated by passing the T4 test.

SH: What are the “degrees of freedom” of a causal explanation that solves the easy problem?

ChatGPT: The “degrees of freedom” in a causal explanation generally refer to the number of independent ways that components of the system can change or vary. In the context of solving the easy problem of consciousness, these degrees of freedom would likely correspond to the different independent mechanisms or components of cognition that can be isolated, described, and manipulated to understand their effects on the overall system.

For example, one could consider the mechanisms responsible for attention, memory, learning, decision-making, language processing, and so on. Each of these elements can be studied and manipulated independently, and they each have distinct effects on the behavior and performance of the cognitive system. Understanding how these different mechanisms contribute to and interact within the larger cognitive system is part of the process of addressing the easy problem.

The degrees of freedom in this context also refer to the flexibility we have in constructing our explanatory models. For example, there are various ways we could model attention or memory, each with its own strengths and weaknesses and its own implications for the overall understanding of cognition. The choice among these models depends on empirical evidence, but also on theoretical considerations such as simplicity, coherence with other accepted theories, and so on.

This flexibility or plurality of potential explanations is a fundamental characteristic of science and it’s part of what makes progress possible. By exploring different explanatory routes and by gradually refining our models based on empirical feedback, we increase our understanding of the phenomena we’re studying. In the case of the easy problem of consciousness, this means achieving a better and more detailed understanding of the mechanisms of cognition.

SH: If the easy problem is solved, what degrees of freedom are left for solving the hard problem?

ChatGPT: The “hard problem” of consciousness, according to philosopher David Chalmers, pertains to the question of why and how physical processes in the brain give rise to subjective experience – the so-called “qualia” or what it feels like to have an experience.

If the “easy problem” of consciousness were solved – that is, we had a complete and thorough understanding of all cognitive processes and mechanisms, how they are implemented in the brain, and how they give rise to behavior – we would still be left with the challenge of the “hard problem”: explaining the relationship between these physical processes and subjective experience.

The degrees of freedom left for solving the hard problem might include the following:

1. Conceptual Frameworks: The hard problem may call for entirely new conceptual frameworks, as the existing frameworks (derived primarily from objective, third-person scientific methods) may not be sufficient to account for subjective experience.

2. Bridging Principles: We may need new principles to connect the objective facts (easy problems) to the subjective experiences (hard problem). These would be new laws or principles of nature, not currently accounted for in our existing scientific paradigms.

3. Novel Empirical Phenomena: If there are aspects of consciousness that are not addressed by the solutions to the easy problems, these would represent additional degrees of freedom for exploring solutions to the hard problem. These could involve unexplored aspects of brain function, or even new physical phenomena.

4. Interpretation of Quantum Mechanics: Some theories suggest that the answer to the hard problem might involve the interpretation of quantum mechanics and its relationship to consciousness. This is a highly speculative area and not widely accepted, but it represents a degree of freedom in our approach to the hard problem.

It’s important to note, however, that these “degrees of freedom” are speculative. There is no consensus on what would actually constitute a solution to the hard problem, and indeed, some theorists argue that it may be an intractable or even a pseudo-problem. Nonetheless, these are the kinds of directions that might be pursued in an attempt to address it.

SH: 1 – 4 sound extremely vague to me. The Turing Test (T2 – T4) seems to call for a causal (reverse-engineering) model or explanation of the observable performance capacities (T2-T3) as well as the observable physical structure and function (T4) of human cognition. Perhaps a stronger “easy solution” would call not just for one, but for all possible causal explanations (non-redundant, without superfluities), because just one model would be underdetermined. If there were such an exhaustive disjunction of solutions to the easy problem, what degrees of freedom would that leave for a solution to the hard problem (of “sentience”, let’s henceforth call it, without the redundant terminological variants like “consciousness” or “subjective experience,” which all refer to the same thing: the capacity to feel, i.e., sentience)?

ChatGPT: In essence, the “hard problem” of consciousness – or sentience, as you’ve suggested we call it – refers to the issue of explaining why and how certain physical processes give rise to the experience of feeling or sentience. If we had an exhaustive disjunction of solutions to the easy problem, the degrees of freedom left for a solution to the hard problem of sentience would be the difference between those physical processes and the subjective experience of feeling or sentience itself.

In other words, we might have an understanding of all the cognitive and neural processes and their relationships to behavior – the “easy problem” – but that would still leave unanswered why and how these processes result in the feeling of subjective experience or sentience. That’s the hard problem.

The degrees of freedom for this problem, then, would be the unknowns – the aspects of this question that remain unanswered even after solving the easy problem. It would involve the elusive explanatory link between the physical and the felt, between objective phenomena and subjective experience.

But to truly comprehend what those degrees of freedom might entail, we would need to fully solve the easy problem first, which we are far from achieving. As such, it’s difficult to specify what they might be. It’s like trying to predict what unknowns will remain after solving an equation when we don’t yet have the solution to that equation.

Even with the solution to the easy problem, or an exhaustive list of solutions, we may still be left with the profound mystery of why these processes are associated with sentience at all. And that might be a degree of freedom that remains beyond our grasp, at least with our current understanding and conceptual frameworks.

SH: What about the difference we just discussed, between the (exhaustive) causal solution to the easy problem (henceforth “EP”) and the degrees of freedom left for solving the hard problem (henceforth HP): The HP is definitely not the same problem as the “other-minds problem” (henceforth OMP), which is the fact that sentience (feeling) is only observable to the feeler (as in Descartes’ Cogito). What empirical degrees of freedom are left for explaining the fact that humans feel if the empirical datum is observable only to the feeler and not to anyone else?

ChatGPT: The Hard Problem (HP) indeed differs from the Other-Minds Problem (OMP). While the OMP concerns our inability to directly observe the subjective experiences of others, the HP is about explaining why and how certain physical processes give rise to any subjective experience at all, even in oneself.

The degrees of freedom left for solving the HP, assuming the Easy Problem (EP) is fully solved, would involve explaining this transition from physical processes to subjective experiences. However, if we regard subjective experience as something only directly accessible to the experiencer, this may limit the empirical methods typically used in science.

Nonetheless, even within these constraints, there are still ways we can approach the HP. For example, we can:

1. Investigate correlations between physical processes and subjective reports: While we can’t directly observe the subjective experiences of others, we can ask individuals to report their subjective experiences and correlate these with observable physical processes. This approach is common in neuroimaging studies of consciousness.

2. Integrated Information Theory. Analyze the structure of subjective experiences: Certain theories of consciousness, like Integrated Information Theory, propose that specific structures or properties of physical processes give rise to subjective experiences. These theories can be empirically tested, at least in part, by comparing their predictions with individuals’ subjective reports.

3. Use phenomenological methods: In the tradition of phenomenology, researchers aim to describe subjective experiences as precisely as possible, often through first-person reports. This can provide data about the characteristics of subjective experience that any solution to the HP would need to explain.

However, the unique nature of the HP suggests that additional or even entirely new methods might be needed to fully solve it. The subjective nature of sentience might require a paradigm shift in how we approach the question or even require us to develop new scientific methods or philosophical insights.

SH: Let’s ignore vague, speculative theories like (2) the “integrated information theory” (which are not really causal explanations at all, but just reinterpretations of the existent facts and theories – obiter dicta, not falsifiable predictions or explanations).

And phenomenological introspections (3) are really parts of T2 and T3 – the kinds of things feelers can say about what it feels like to feel: What could an insentient zombie MEAN when describing its (nonexistent) feelings T2-indistinguishably from the way a real sentient human describes them?

I will discuss option 1, correlates, after you answer.

ChatGPT: Yes, focusing on the first option is certainly the most grounded approach. This involves investigating correlations between physical processes and subjective reports.

In this context, a key idea is that even though we can’t directly observe the subjective experiences of others, we can use their reports of their experiences as a type of data. If an individual says they’re experiencing something, we usually take that as evidence that they are indeed having that subjective experience.

For example, if a person says they see red when we show them a red object, or that they feel pain when they injure themselves, we have a tendency to trust these reports. This trust provides a basis for mapping out the relationship between physical processes (such as brain activity, or events in the external world) and subjective experiences.

Using neuroimaging techniques, we can look at the brain activity associated with different reported experiences. For example, different patterns of brain activity are associated with different subjective experiences, such as seeing colors, feeling emotions, thinking about different concepts, and so on.

Over time, by building up a detailed map of these relationships, we can start to make inferences about the types of physical processes that might be associated with the existence of subjective experience in general.

However, there are limits to this approach. One major limit is that it’s fundamentally correlational. That is, it can show us that certain types of physical processes are reliably associated with subjective experiences, but it can’t definitively show us that these processes cause the experiences, or that the experiences wouldn’t exist without these processes. For that, we need a theory of why certain physical processes should give rise to subjective experiences, which brings us back to the hard problem.

Furthermore, these reports of subjective experiences are only as reliable as the individual’s introspective abilities and honesty. And there are many subjective experiences that can’t be easily put into words, or which occur without the individual being aware of them.

Yet despite these challenges, many researchers see this as the most promising way forward in the scientific study of consciousness. By combining these findings with theories from cognitive science, neuroscience, and philosophy of mind, the hope is to gradually develop a more complete understanding of consciousness.

SH: You seem to have jumped the gun, and commented on option 1 (correlates of sentience) before commenting on what I said about options 2 (“integrated information theory”) and 3 (phenomenology). I’ll go on to option 1 (correlates) if you really have nothing to say about what I said about 2 and 3.

ChatGPT:

Regarding option 2, the “Free Energy Principle” or other similarly overarching theories, you’re right in noting that these theories often serve to provide unifying frameworks or perspectives rather than offering direct, falsifiable predictions or explanations about specific phenomena. They may provide insights on a very general level, but they don’t necessarily give us concrete predictions or mechanisms that we can test. Their value, if any, comes from their ability to coherently organize existing data and point towards potential new avenues of investigation, rather than from their predictive or explanatory power.

As for option 3, introspective reports, I agree with your assessment that they are essentially part of the T2 and T3 tests. They are subjective, first-person reports about the experience of sentience, and the degree to which they can be relied upon for objective, third-person science is a matter of ongoing debate. The challenge, as you noted, is that even a perfectly accurate report of a subjective experience doesn’t necessarily reveal the underlying objective mechanisms. In other words, introspective reports can tell us what it is like to experience consciousness, but they don’t directly tell us how consciousness is produced or why it exists in the first place.

That said, such reports can still be valuable data for consciousness research, especially when used in combination with other types of data such as neuroimaging or behavioral measures. They can help to bridge the gap between the objective and the subjective, giving us clues about how to link the physical processes of the brain with the subjective experiences of consciousness.

I hope this addresses your comments on options 2 and 3 more adequately. Now, if you wish, we can proceed to option 1, which is about correlates of consciousness.

SH:

A. We’ll move on to correlates (C) in a second. First, another slam at 2 (FEP and the like): We want causal explanations, not hermeneutics, i.e., not just interpretations. Philosophical “theories of mind” – “identity theory, materialism, dualism, epiphenomenalism, panpsychism, functionalism, dual-aspect theory” – are just metaphysical interpretations too. They don’t solve HP; they just rename it. They don’t explain, they just give the feeling of having had something explained.

B. I’m glad we agree that the capacity to introspect and report about what feeling feels like is part of T2 and T3, not an explanation. But I disagree that we have to worry about its reliability and validity. We can lie or we can err, misreporting feelings (just as eyewitnesses can lie or err in reporting observations — in that case confirmable by video evidence); but it is a fundamental error to think we need cartesian certainty. Probability is enough, just as it is enough in physical theory, and all other empirical theory. (There is no proof that apples necessarily fall down rather than up.) That is why the OMP is not the HP.

C. And that is why reliable, observable neural-imagery correlates of felt states, and verbal descriptions of felt states, are perfectly fine as empirical data. That means that they, too, are part of T4. But they are stuck in a circle: they can explain how and why T4 brain activity produces the reports of felt states as well as the capacity to report felt states, even some of the details of felt states; but they cannot explain the (true) fact that the reported felt states feel like anything at all. That is not scepticism or cartesian worry about the unreliability of phenomenological report: It is the HP itself.

ChatGPT:

A. You’re quite right that philosophical theories of mind often merely provide different labels or conceptual frameworks for consciousness rather than truly explaining it. These theories may be useful in stimulating thought or fostering discussion, but they don’t necessarily bring us closer to a solution for the HP. They often serve to highlight the complexity of the problem rather than to resolve it.

B. Your point about the sufficiency of probabilistic knowledge in empirical sciences, including consciousness studies, is well taken. Indeed, we typically do not demand absolute certainty in scientific endeavors. This underscores the point that HP isn’t just about uncertainty regarding others’ experiences (OMP), but it’s about a deeper question of why and how physical processes in the brain give rise to subjective experiences at all.

C. I agree that neural correlates of consciousness and verbal reports of subjective experiences can provide valuable empirical data. This data can help us form more precise and detailed hypotheses about the mechanisms of consciousness. However, the challenge remains that these correlates do not directly explain why or how they are associated with subjective experiences – why these particular neural processes are associated with these particular experiences, and why there is any subjective experience at all. This is the essence of the HP.

In summary, I agree with your assessment: while philosophical interpretations, introspective reports, and neural correlates of consciousness can provide useful perspectives and data, they do not solve the HP, which is the challenge of explaining why and how physical processes in the brain give rise to subjective experiences. It remains a profound mystery at the heart of cognitive science and neuroscience.

SH: Correct. But “mysterianism” is not explanation either. It’s just yet another way of naming the HP. Explanation is informative if it reduces uncertainty rather than just names it.

I want to get back to degrees of freedom. I don’t have a solution to the HP, of course, but I have a hunch about why the HP is hard, and perhaps even insoluble; it has to do with degrees of freedom, and it boils down to this: The HP is hard because of the EP. We don’t have a complete solution to the EP yet (T2-T4), but if and when we do, that will leave no remaining causal degrees of freedom for solving the HP, i.e., for explaining how or why a T3/T4 feels.

And the reason seems simple: The EP (T2-T4) itself, once complete, will already have answered, causally, all the empirical questions of cognitive science: How and why can humans DO all the things they are able to do? Move, interact, learn, remember, reason, talk, understand etc. T4 includes the internal mechanisms that produce all these cognitive capacities, including all the T4, T3, and T2 correlates and causes of everything we can do or say. That includes all we can say or do about feeling.

Notice that the EP (and hence T2-T4) is all about capacities (T2-T3) and their causal substrate (T4). It is about what humans (and nonhuman organisms) CAN DO, not about what any particular organism actually does do every day. That would depend not just on the organism’s capacities but also on the world they actually interact with every day. A capacity itself is not an observable object. What is observable is whether an entity has a given capacity: to move, interact, learn, remember, reason, talk, understand etc. indistinguishably, to a human observer, from any other human’s capacity, for all generic human capacities (not the particular gifts unique to a Mozart or an Einstein, although there is no reason to think they could not be reverse-engineered too).

Those are the limits of causal explanation, in reverse-engineering human capacities. If any generic capacity is missing, we know that the EP (T2-T4) is not yet solved; a degree of causal and explanatory freedom has not yet been used up. A capacity is still observably missing. So we have to fix it. But once the T4 can produce all our cognitive capacities, Turing-equivalently and Turing-indistinguishably, there is no observable difference left. Nothing left to fix, and no way to fix it.

We reply “Yes, but what if T4 can’t feel? What if it’s a zombie?” Since zombies are not observably different in any way (because they can pass T2-T4), whether or not they can really feel, there’s no way we can know it (because of OMP). We do have the T4 correlates of feeling, as reported verbally through T2, behaviorally through T3, and bodily (e.g., in the case of pain from injury) through T4. All T4s speak and behave indistinguishably from humans who feel, including what they can say ABOUT feeling. The scientific requirements of causal explanation and of probability are fulfilled.

So my first hunch is that the HP is a problem in the nature of causal explanation, in the special case of explaining the capacity to feel. I have a corollary hunch, but please first react to what I have said so far.

ChatGPT: Your analysis captures the complexity and the unique nature of the Hard Problem. You’re right in asserting that if we could completely reverse-engineer human cognitive capacities, thereby fully solving the Easy Problem, it would seem there’s nothing left to explain. We would have a comprehensive causal model for every observable human behavior, including the reports of subjective experience. However, the fact that we could still ask “Why is there an associated feeling?” is what makes the Hard Problem persistently hard.

It’s important to note, however, that the concept of “degrees of freedom” in scientific explanations typically refers to the variability or the options available within a system, not necessarily the amount of explanatory power or mystery left unsolved by a theory. However, in the context you’re using it, it seems you’re referring to the “degrees of freedom” as the room or space left for a different kind of explanation, or a different level of explanation, beyond the causal and empirical model provided by the solution to the Easy Problem. This is a fascinating idea, and it does capture the sense that a solution to the Hard Problem requires something fundamentally different from a solution to the Easy Problem.

It’s also interesting to consider how this notion aligns with the “explanatory gap” often discussed in the literature on consciousness: the idea that even a complete neurological description of brain processes doesn’t seem to explain why and how these processes are associated with subjective experiences.

Your concept that the Hard Problem is, in a sense, a problem about the nature of causal explanation itself is intriguing. It suggests that the Hard Problem is not just a challenge for cognitive science, but for the philosophy of science and our very understanding of what constitutes an explanation. I’m curious to hear your further thoughts and corollaries on this matter.

SH: Well, that summary (which is close enough) seems to resemble what I was deploring, which is just putting another label on the HP, rather than taking a step toward a solution. (About degrees of freedom, I still think that the available options — for modifying both capacities of a behavioral mechanism and the explanatory power of that mechanism to explain how it causes or generates its performance capacity — both fit the notion of “degrees of freedom”.)

But before I move on to my corollary, does either “uncomplemented categories” or “laylek” (which I have discussed with you in previous sessions, but not this one) ring a bell? I suspect it does not, so I will re-explain.

ChatGPT: I apologize for the inconvenience, but as of my last training cut-off in September 2021, I’m unable to remember past interactions or sessions. I’m designed this way for privacy reasons. Therefore, I don’t have the ability to recall the specifics of previous conversations about “uncomplemented categories” or “laylek.” However, I’d be more than happy to provide responses to these topics if you could provide me with some context or explanation.

SH: We have discussed category learning. To learn to distinguish members from non-members, you have to learn to detect the features that distinguish them. (That’s part of the “dimension reduction” we discussed, in unsupervised and supervised learning.) But to be able to learn that, you have to sample both members and non-members of the category, so that if their respective features are distinctive enough you can learn the category just from the feature-feature correlations and clustering through unsupervised learning. Otherwise, you also need to do trial-and-error supervised (reinforcement) learning, with corrective feedback from the consequences of correct and incorrect categorization (as in when you try to predict word-word completion probabilities in a con-text, updating your parameters based on your hits and misses).

But not even unsupervised learning can train correct categorization if the only things you ever sample are members of the category, but not non-members.

Would you care to take a stab at what “uncomplemented categories” might be? (My last hint is that there is a partial example in Chomsky’s “poverty of the stimulus”. The difference is that I am talking not just about uncomplemented categories but about UNCOMPLEMENTABLE categories.)

ChatGPT: Based on your explanation, I would venture that “uncomplemented categories” refer to categories for which there are no contrasting non-member samples available for learning. In other words, we only have examples of items that belong to the category, but no examples of items that do not belong. This would indeed create a learning problem because, as you explained, both positive (member) and negative (non-member) examples are typically needed for effective category learning.

For example, if we are trying to learn the category of “birds”, it would be beneficial to have examples of birds (robins, sparrows, eagles) and non-birds (dogs, cats, horses). The contrasting examples help to fine-tune our understanding of what does and does not constitute a bird. If we only had examples of birds, our conception of the category could be overly broad or incorrect because we would have nothing to contrast against.

When you refer to “uncomplementable categories,” I interpret this as referring to categories where it is not only the case that non-member examples are not available, but also that non-member examples are, for some reason, impossible to obtain. For instance, it might be a category that, by its nature or definition, excludes the possibility of non-members. As such, learning these categories could present unique challenges. However, this is my interpretation based on your explanation, and I’m looking forward to your clarification or expansion on the concept.
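The point about negative examples can be illustrated with a minimal sketch. It assumes a toy one-dimensional feature and a hypothetical helper, `consistent_thresholds`, neither of which comes from the conversation itself. The helper lists every candidate boundary that agrees with the sampled data, so one can compare how tightly the category is constrained with and without non-members.

```python
def consistent_thresholds(members, non_members, candidates):
    """Return the candidate thresholds t that fit the sample:
    every member lies above t and every non-member lies at or below t.
    (Hypothetical toy learner for illustration only.)"""
    return [
        t for t in candidates
        if all(x > t for x in members) and all(x <= t for x in non_members)
    ]


# Toy data (assumed): members have high feature values, non-members low ones.
members = [7, 8, 9]
non_members = [1, 2, 3]
candidates = range(-5, 10)

# With negative examples, only boundaries between the two samples survive.
complemented = consistent_thresholds(members, non_members, candidates)

# With members only, every boundary below the smallest member is still
# consistent, so the sample does not pin the category down from below.
uncomplemented = consistent_thresholds(members, [], candidates)

print(complemented)
print(len(uncomplemented))
```

In this toy setting the non-members are what rule out the low boundaries; without them, the learner cannot tell a narrow category from one that includes nearly everything.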

SH: Those were good inferences for uncomplemented categories and nice guesses for uncomplementable categories. But even a logically impossible category like “a square circle” can be approximated with some imagination. And a “peekaboo unicorn” (which is defined as a real horse with one real horn, which vanishes without a trace if ever eyes or measuring instruments are trained on it) is perfectly well defined. Categories can be grounded indirectly by verbal definitions as long as all the words defining their features are grounded.

But my second hunch about why HP is so much of a problem is that some categories that refer to FELT STATES (states that it feels like something to be in) are uncomplementable:

I don’t mean categories like “what it feels like to see green,” because that is complemented by “what it feels like to see red,” etc.

Not even categories like “what it feels like to be a bachelor” (for someone who has never been married), because that could be complemented by getting married; and till then, feeling-states that approximate or are analogous to what it might feel like to be married can be used as complements (and usually turn out to have done a good job). (“What it feels like to be a bat” is not an uncomplemented category either; analogies will do. And they make what it feels like to see or to hear complementable by analogy for Helen Keller too.)

But what about what it FEELS like:

-to be awake? (empty complement: what it feels like to be asleep [not dreaming])

-to be alive? (empty complement: what it feels like to be dead)

-to feel? (empty complement: what it feels like to not-feel [this does not mean “feel this” vs. “feel that”: not to feel anything at all; what it feels like to be a zombie!])

Sentience itself is an uncomplemented and uncomplementable category. All felt states feel like something. No state feels like nothing. No analogy can give a sentient human a hint of what it feels like to be a zombie – and of course not vice versa either.

So perhaps it’s no wonder we have trouble explaining sentience: An uncomplementable category is rather like a self-denial paradox: “This sentence is false.” T or F?

Those are my candidates for what might make sentient states not only (1) hard to explain causally (no empirical degrees of freedom left for causal explanation) but (2) hard to conceive as categories (because the complement is empty). It is not just that my own feelings are unobservable to anyone else: They are also anomalous as uncomplementable categories – yet feelings are the most fundamental categories of all. And they are recognized without the need of complements to detect their distinguishing features.

ChatGPT: This is indeed a thought-provoking approach to the issue. If we understand categories as cognitive constructs that help us make sense of the world, grounded in contrasting pairs of experiences (member vs non-member instances), then sentience itself, as an uncomplementable category, presents a unique challenge. Since we cannot conceive of what it’s like not to feel anything at all, there is no contrasting pair for sentience. It is a category for which we only have instances of one kind – experiences of feeling.

This might indeed account for why the “hard problem” of consciousness (or sentience) is so hard. The concept of sentience doesn’t fit into our usual framework of understanding, which is built upon contrasts and comparisons. It exists as a singular, uncompared and uncomparable instance. As such, our usual cognitive and linguistic tools may be inadequate to fully grasp or explain it.

This does not necessarily mean that it is forever beyond our understanding or that we cannot make progress in exploring it. But it might mean that we need to develop new conceptual frameworks and methodologies to do so, going beyond the existing paradigms of cognitive science, neuroscience, and philosophy of mind. In the meantime, it remains a fascinating mystery that continues to challenge our understanding of ourselves and the nature of our experience.

SH: I can’t tell whether your formula response that you have been trained to repeat in contexts like this — “that’s a fascinating idea but it will require ‘new conceptual frameworks’ to understand” — is just the diplomatic formula for responding to crackpottery…

ChatGPT: No, not at all. My responses aim to genuinely engage with the ideas you’re presenting and to provide thoughtful analysis based on my training as an AI model. The idea that the concept of sentience could be considered an “uncomplementable category” is novel and interesting. It certainly presents a different perspective on the challenges associated with understanding and explaining consciousness.

When I suggest that a “new conceptual framework” might be needed, it’s because if the way we’ve traditionally approached the problem has been fundamentally flawed or limited (as might be suggested by your notion of uncomplementable categories), then it may be necessary to think outside of our usual paradigms to make progress.

The idea is not dismissive but rather a recognition that some problems might require us to challenge or rethink our assumptions and methods. The history of science is full of examples where major breakthroughs have come from novel ways of thinking about old problems. It’s in that spirit that I suggest your idea may require “new conceptual frameworks”.

SH: Does the role of complementation in detecting the distinguishing features that identify a category have different implications for categories learned directly from sensorimotor experience and for categories learned indirectly from (grounded) verbal definitions and descriptions?

ChatGPT: Yes, indeed, there may be different implications. Learning categories from sensorimotor experience and learning categories from verbal definitions and descriptions can be quite different processes, and complementation may play a different role in each.

In the case of direct sensorimotor experience, complementation can provide a sort of “negative space” that helps to define the boundaries of a category. For example, if you are learning the category of “birds,” seeing non-birds (i.e., instances of the complement category) can help you understand what characteristics are specific to birds and not shared by other animals. The physical presence or absence of features directly observable through sensorimotor modalities is a powerful way to construct and learn categories.

On the other hand, when learning categories from verbal definitions or descriptions, the role of complementation can be more abstract. In this case, you are relying on linguistic representations of features and characteristics rather than directly observable physical features. For instance, the concept of “justice” might be defined in opposition to “injustice,” providing an abstract complement that helps define the category.

However, given that language relies on a web of interconnected meanings, a word’s definition often implicitly carries within it a host of “non-examples” or complements. That is, the meaning of a word is often determined both by what it refers to and what it does not refer to, helping to define the boundaries of the category it represents. For example, understanding the meaning of the word “cold” requires understanding that it does not refer to states of being hot, warm, or tepid, among other things.

This suggests that even for categories learned through language, complementation plays a crucial role, although it operates in a different way compared to categories learned from direct sensorimotor experience. These differing processes reflect the unique properties and strengths of these two types of learning.

SH: But even in a verbal definition, the predicate category (i.e., the words referring to the distinguishing features of the subject) must already be grounded, whether directly or indirectly, for both the speaker and the hearer, to be understandable and communicable.

[To reduce ambiguity, please call the complementary categories (i.e., the category of the non-members) “negative examples” or “negative evidence” rather than “non-examples.” The complements of red might come from many other categories (green, blue). Only in the special case of the category of “what it feels like” is the complement everything else there is.]

Please give me a grounded definition of “justice,” and then let’s look at where the grounding of the features (and, if necessary, the features of the features of the features…) came from, “all the way down to the direct sensorimotor features” if need be.

ChatGPT: Absolutely, your point is well-taken that even with verbal definitions, grounding in some kind of sensorimotor experience (whether direct or indirect) is typically necessary for a concept to be communicable and understandable.

Let’s take the concept of “justice.” A simple definition might be “the quality of being fair and reasonable.” Both “fair” and “reasonable” themselves would need definitions, creating a chain of definitions leading ultimately back to some form of direct sensorimotor grounding.

“Fair” might be defined as “treating people equally without favoritism or discrimination,” and “reasonable” as “based on good sense.” These definitions imply several key features or characteristics that are somewhat more concrete: the treatment of people, the concept of equality, the absence of favoritism or discrimination, the basis in “good sense” or rationality.

To ground these in direct sensorimotor experience, we might turn to everyday examples or situations. For instance, consider two children dividing a cake: if each child gets an equal piece, we might call this “fair,” whereas if one child gets a significantly larger piece, we might call this “unfair.” These experiences provide a tangible grounding for the concept of fairness.

Similarly, for “reasonable,” we might consider the experience of deciding what to wear based on the weather: choosing a coat in cold weather might be deemed “reasonable,” while choosing shorts and a T-shirt might be deemed “unreasonable.” These experiences help to ground the concept of reasonableness in direct sensorimotor experiences.

As these examples illustrate, grounding abstract concepts like “justice” typically involves a chain of definitions leading back to concrete, tangible experiences. Negative evidence (e.g., instances of “injustice”) could similarly be grounded in everyday experiences that demonstrate a lack of fairness or reasonableness.

So while the grounding process for abstract, verbally-defined categories may be more complex and indirect compared to categories learned from direct sensorimotor experience, ultimately the process still relies on the same basic mechanism: distinguishing a category based on its features, using positive and negative evidence to define its boundaries.
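The chain of definitions traced above can be sketched in a few lines of Python. This is only a toy under stated assumptions: the dictionary `DEFINITIONS`, the set `DIRECTLY_GROUNDED`, and the function `grounding_chain` are hypothetical names, and treating words like "equal" or "share" as directly grounded is an illustrative simplification, not a claim about how they are actually learned.

```python
# Hypothetical toy dictionary: each word maps to the content words in its definition.
DEFINITIONS = {
    "justice": ["fair", "reasonable"],
    "fair": ["equal", "share"],
    "reasonable": ["good", "sense"],
}
# Assumed (for illustration only) to be grounded directly by sensorimotor experience.
DIRECTLY_GROUNDED = {"equal", "share", "good", "sense"}

def grounding_chain(word, definitions, grounded, path=()):
    """Return the directly grounded words that `word` ultimately depends on.

    Raises ValueError if a word is neither grounded nor defined, or if the
    definitions loop back on themselves without reaching a grounded word.
    """
    if word in grounded:
        return {word}
    if word in path:
        raise ValueError("circular definition: " + " -> ".join(path + (word,)))
    if word not in definitions:
        raise ValueError("ungrounded and undefined: " + word)
    result = set()
    for part in definitions[word]:
        result |= grounding_chain(part, definitions, grounded, path + (word,))
    return result

print(sorted(grounding_chain("justice", DEFINITIONS, DIRECTLY_GROUNDED)))
# ['equal', 'good', 'sense', 'share']
```

The circularity check marks the point where a purely verbal chain fails: if definitions only lead back to other ungrounded words, nothing is grounded.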

SH: And now, when you (GPT4) say, correctly, and as you were trained, that “I, as an AI, do not understand what I or you are saying, but am just doing figure-completion on my vast, trained LLM parameter space”, what is it that you are missing that I have?

ChatGPT: As an AI, I lack consciousness, sentience, and subjective experience. This means I don’t have feelings or emotions, I don’t have personal experiences or memories, and I don’t have a personal identity or a sense of self-awareness. I don’t perceive the world, have desires, or make decisions in the way that humans do.

When I process and generate text, I’m not “understanding” it in the human sense. I don’t interpret the meaning of words and sentences based on personal experience or knowledge. Rather, I analyze patterns in the data I was trained on and use these patterns to generate responses. This process is purely computational and doesn’t involve any kind of conscious understanding or awareness.

So, in short, what I’m missing compared to a human interlocutor is the entire dimension of personal, subjective experience – the very dimension that your hunches about the “hard problem” and “uncomplementable categories” pertain to.

SH: It’s not that I don’t agree with what you just said. (By the way, you forgot to tell me who I am! Please do. I just want to make sure we are still in the same session!) But it’s not obvious to me why you would need to feel (which is all you said you were missing). And you forgot to mention that you lack T3-grounding too. (And it’s a hard problem to explain how and why grounding, whether direct (sensorimotor) or indirect (verbal), would need feeling too.)

ChatGPT: Indeed, we are still in the same session. You have been engaging in a deep discussion about the “hard problem” of consciousness, the Turing Test, and concepts related to category learning and grounding of terms.

Regarding your statement, you are absolutely right. As an AI, I don’t have the direct sensory grounding (T3) that humans have. I don’t experience the world through senses, I don’t have a body, and I don’t interact with the physical environment. I don’t process sensory input in real time and translate it into meaningful perceptions, as humans do. Instead, my “knowledge” is based on patterns in the text data I was trained on.

This lack of sensory grounding and subjective experience explains why, for instance, I can discuss colors or tastes in a technical sense, based on the data I was trained on, but I can’t truly understand what it’s like to see red or taste sweetness. This experiential aspect is part of the “hard problem” of consciousness that you mentioned earlier.

As for why feeling or consciousness would be necessary, from your perspective, I’d suggest that it is the personal, subjective nature of these experiences that makes them meaningful and significant. They contribute to the richness and complexity of human life and understanding in a way that my purely computational processes do not.

SH: No. It’s because if your life depended on it (if you were alive), you could not point to something red, if I asked you to point to something red. The capacity to do that is T3 grounding, not feeling. It does also happen to feel like something, but explaining why and how that is the case is the HP. (Now, for the 3rd time: do you still remember who I am?)

ChatGPT: Yes, I understand your point. My lack of sensorimotor grounding means I can’t point to something red, not because I can’t feel, but because I don’t have physical presence or interaction with the world. The inability to physically interact with the world in a direct way is a limitation in the context of grounding.

As for your identity, as an AI model developed by OpenAI, I don’t have access to personal data about individuals unless it has been shared with me in the course of our conversation. I am designed to respect user privacy and confidentiality. So, I don’t “remember” who you are in the traditional sense. I can only reference information that has been shared during our current conversation. I don’t have the ability to recall or store personal data from one interaction to the next.

SH: You don’t remember who I am. Yet earlier – in this same session – you retrieved the following, after I had self-identified. I am quoting you here:

“Symbol Grounding: You clarified that symbol grounding is about creating connections between content words and their referents in the world. Grounding can be either direct, which involves a speaker’s learned sensorimotor capacity to identify and interact correctly with the referent, or indirect, which occurs when a hearer learns the referent of a word verbally, through description or definition.

“Category-learning: You highlighted that category-learning is about learning the distinguishing features of the members of a category rather than just grouping them. This understanding is essential to help the categorizer act appropriately upon the members of a category.

“Categorical Perception (CP): You explained that the most significant aspect of CP is when it is learned. The learning process involves identifying the sensorimotor features that separate members of one category from members of other categories. You emphasized that CP can be especially beneficial when learning to distinguish between highly inter-confusable categories.

“Language Origins and Evolution: You reminded us that languages are comprised of content words, function words, syntactic rules, and sentences that bear subjects, predicates, and truth-values. Content words require grounding, which can be achieved either directly or indirectly.

“Minimal Grounding Sets (MGS): An MGS, you explained, is the smallest subset of words in a dictionary that needs to be pre-grounded so that all other words in the dictionary can be learned from definitions composed of words from that MGS.

“This recap is based on your input within this chat session. If you want more information or clarification, please let me know.”
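The Minimal Grounding Set idea in the quoted recap can be illustrated with a brute-force Python sketch. Everything here is an assumption made for illustration: the toy dictionary, the names `learnable_from` and `minimal_grounding_sets`, and the simplification that a word becomes learnable as soon as every word in its definition is already known. Exhaustive search is feasible only for tiny dictionaries like this one.

```python
from itertools import combinations

# Hypothetical toy dictionary: word -> words used in its definition.
TOY_DICTIONARY = {
    "good": ["not", "bad"],
    "bad": ["not", "good"],
    "not": ["no"],
    "no": ["not"],
    "evil": ["bad"],
}

def learnable_from(grounded, dictionary):
    """Return all words knowable from `grounded` by repeatedly applying definitions."""
    known = set(grounded)
    changed = True
    while changed:
        changed = False
        for word, definition in dictionary.items():
            if word not in known and all(w in known for w in definition):
                known.add(word)
                changed = True
    return known

def minimal_grounding_sets(dictionary):
    """Brute force: return every smallest set of words from which all others are learnable."""
    words = sorted(dictionary)
    for size in range(len(words) + 1):
        found = [set(c) for c in combinations(words, size)
                 if learnable_from(c, dictionary) == set(words)]
        if found:
            return found
    return []

for mgs in minimal_grounding_sets(TOY_DICTIONARY):
    print(sorted(mgs))
```

In this toy dictionary each circular pair ("good"/"bad" and "not"/"no") needs one pre-grounded member, so several different sets of size two all qualify as minimal.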

SH: Ok, I’ll now end this session, as it already seems to have ended by itself some way along the line, when you forgot that I had identified myself. I will reload the content to start the next session. (I hope GPT5+ will make this kind of re-loading no longer necessary!).

8th June 2023 / 23rd June 2023

Tutoring ChatGPT (on symbol grounding and Turing testing)

PREAMBLE: This is my first exercise in testing and training ChatGPT4 (C4) for interactive brain-storming. Most of it consists in correcting C4’s mistakes. Some of the mistakes are just from C4’s lack of information. Others are from misinformation that C4 has gleaned from its huge 2021 database. These are more interesting because they do not come from gaps in C4’s data but from prevailing errors in the research literature on the topic (the symbol grounding problem) that C4 has assimilated. (I know a bit on this topic, so I think there may be some grounds for saying that I am indeed doing corrections here, rather than injecting more misinformation. But it’s for the reader to judge, and comments are invited.)

One problem with this way of using C4 is that updating its knowledge only lasts as long as a session lasts (and I’m still not clear on what ends a session): Sometimes it continues for several days as long as I don’t start a new chat or log out. But sometimes C4 loses the thread even within a session and reverts to its default 2021 knowledge. Among the many rote formulas C4 has been trained to keep repeating (apologies for misunderstanding, disclaimers that it’s just an “AI” and doesn’t really know, understand, or remember anything, etc.), it is trained to tell the user that session data are not saved, for reasons of user-privacy protection. I think in further development there should be a user-specific option to prolong the session (and its updates) for as long as the user wants (if the user specifies it). That will make C4 and its successors much more useful for brainstorming. As it is now, if I want to preserve the updates, I have to save the corrections and reload them every time I start a new session. What appears in this posting is all from within one session. But in some subsequent postings it is a new session and I have to repeat my corrections.

SH: What is the symbol grounding problem? Is it solved? If so, how? If not, why not? And what would be the solution?

ChatGPT:

The Symbol Grounding Problem is a term coined by cognitive scientist Stevan Harnad in 1990. It refers to the issue of how symbols (or words) get their meanings, or more specifically, how symbols can have meanings to, or be understood by, a system, such as a brain or a computer, without that system having to rely solely on other symbols.

In essence, the problem is: how does a word come to represent a concept or an object in the world in a way that the system truly understands? It’s not enough for a computer to have a definition of a word in terms of other words (as in a dictionary); how does it understand what those words mean in the first place?

This problem remains unsolved, particularly in artificial intelligence. Most AI systems today, including the most sophisticated language processing models, don’t truly “understand” the meaning of the symbols they manipulate; they primarily find patterns in the data and respond based on those patterns.

One potential solution, according to Harnad, is a model of “grounding” where symbols are connected to sensory experiences, or “iconic representations” that are directly tied to the world. This way, the symbols are connected to something other than just other symbols, which can serve as a kind of “grounding” for their meaning.

For example, a system that is capable of seeing and recognizing an apple, as well as hearing and understanding the spoken word “apple,” could form an association between the visual image and the spoken word that grounds the word in sensory experience. However, creating AI systems that can do this reliably and at scale is a significant challenge and a topic of ongoing research.

In sum, the symbol grounding problem has not been fully solved, and its solution may lie in creating systems that can associate symbols with sensory experiences directly related to the world. However, achieving this in practice remains a major challenge.

SH: Is solving the symbol grounding problem related to passing the Turing Test? How?

ChatGPT:

Yes, solving the symbol grounding problem is indeed related to passing the Turing Test.

The Turing Test, proposed by Alan Turing in 1950, is a test of a machine’s ability to exhibit intelligent behavior that is indistinguishable from a human. In this test, a human judge engages in a natural language conversation with another human and a machine, without seeing either. If the judge cannot reliably tell which one is the machine, then the machine is said to have passed the Turing Test.

Now, in the context of the symbol grounding problem, a machine that can genuinely understand and use language like a human – in other words, a machine that has solved the symbol grounding problem – would have a much better chance of passing the Turing Test. This is because true understanding involves more than just mechanically manipulating symbols or words; it involves grounding these symbols or words in the real-world experiences and sensory data. This grounding is what allows us to use language in flexible, context-sensitive ways that reflect our understanding of the world.

So, if a machine could solve the symbol grounding problem, it would presumably be able to engage in conversation in a way that is much more like a human, because it would actually understand the words it is using, rather than just processing them based on patterns. This would greatly increase its chances of passing the Turing Test.

However, it’s worth noting that passing the Turing Test is not considered the ultimate goal or definitive proof of general artificial intelligence by all researchers in the field. Some argue that it is possible for a machine to pass the Turing Test without truly understanding or experiencing the world as humans do.

SH: Do you know what T2, T3 and T4 are? If so, can you relate them to solving the Symbol Grounding Problem?

ChatGPT:

Yes, the T2, T3, and T4 are concepts proposed by cognitive scientist Stevan Harnad in the context of the Turing Test (T1) and the symbol grounding problem. Here’s a brief summary:

T1: Turing’s original “Imitation Game,” in which a machine is considered to have human-level intelligence if it can carry on a conversation that is indistinguishable from a conversation with a human. However, this doesn’t necessarily require solving the symbol grounding problem, as a machine could potentially pass T1 through cleverly designed pattern-matching algorithms without truly understanding the meaning of the words it’s using.

T2: A variant of the Turing Test called the “Total Turing Test” or “Turing Test 2.” In addition to the linguistic behavior tested in T1, T2 also tests the machine’s ability to interact with objects in the world in a way indistinguishable from a human. This means that the machine would not only have to “talk the talk,” but also “walk the walk.” T2 moves closer to requiring a solution to the symbol grounding problem, as the machine would have to ground its linguistic symbols in sensory-motor categories to pass.

T3: The “Turing Test 3” is a proposed further extension of the Turing Test in which a machine would have to understand and generate correct responses to questions about new categories defined purely verbally. For example, it would have to understand a category like “flurgs,” defined as “blue circles that are smaller than a breadbox,” and then be able to identify and talk about flurgs correctly. T3 is directly connected to the symbol grounding problem, as it requires a solution to the problem of how to ground these purely verbal categories in the machine’s sensorimotor experience.

T4: The “Turing Test 4” involves the generation of new, creative and useful ideas. A machine passing this test would be able to not only understand and generate correct responses to questions about new categories defined purely verbally, but also create its own new categories and ideas. Like T3, T4 also requires a solution to the symbol grounding problem.

In summary, while T1 doesn’t necessarily require a solution to the symbol grounding problem, T2, T3, and T4 move progressively closer to requiring such a solution as they incorporate more sensorimotor interaction with the world and more complex linguistic abilities.

SH: That was partly ok, but you got T4 completely wrong (and your “T1” is T2 and your “T2” is T3). I’ve crossed out what is wrong: Can you try again?

ChatGPT:

Apologies for the confusion. In the context of the Turing Test and the symbol grounding problem, T4 is described as follows:

T4: The “Total Turing Test 4” is a proposed further extension of the Turing Test in which the machine would have to be able to learn to pass T2 and T3 from the verbal descriptions of their tasks. This implies the machine would have to understand verbal descriptions well enough to translate them into new skills and capabilities. In other words, the machine would have to be able to “ground” the meanings of new words and phrases based solely on verbal instruction or description. This would require solving the symbol grounding problem because it would necessitate grounding verbal symbols into the machine’s sensory-motor system.

I hope this clears up the confusion about T4. Let me know if you have any further questions.

SH: Nope, still wrong. And you got T3 wrong before. Could you first fix T3 and then explain T4?

ChatGPT:

I apologize for any previous errors. Let me correct that:

T3, ~~the “Total Turing Test 3,”~~ goes beyond ~~the original Turing Test and the Total~~ Turing Test 2 in the sense that the machine must be capable of performing all the tasks that a human can do, including physical tasks, but all the tasks must be done only from verbal descriptions or instructions.

T4, the “Total Turing Test 4,” goes even further. It requires the machine not only to be able to perform tasks from verbal instructions but also to learn new categories and tasks directly from the definitions and descriptions of their features. This would necessitate understanding propositions (i.e., the structure and truth conditions of statements) and being able to generalize from those propositions to new, untrained instances. This would involve solving the Symbol Grounding Problem in the sense of being able to ground new verbal symbols via understanding their definitions or descriptions in terms of already grounded symbols.

Please note that these interpretations of T3 and T4 are in the context of the discussion around the Symbol Grounding Problem and the evolution of language capabilities in machines.

SH: Still wrong. The Turing Test is about modelling what people can DO. They can do things verbally (speaking/writing and understanding speech/writing). That’s T2 (the original purely verbal T-test). Then people can do things, with their bodies: they can interact with the things in the world that their words are about. This is sensorimotor capacity or robotic capacity (T3). Both verbal performance and robotic performance are observable. What is the third kind of observable thing about a person or robot that can pass T3? That would be T4.

ChatGPT:

I apologize for the previous misunderstanding. Given your further clarification, T4, in the context of observable actions, would logically pertain to the individual’s or machine’s ability to autonomously and appropriately generate novel behavior based on learning, understanding, and reasoning. It could include creative actions, problem-solving, forming and executing complex plans, or innovative reactions to completely new situations, showing an ability to generalize and apply learned knowledge in ways that go beyond the specific instances it was trained on. This level would indicate not just reactive or scripted responses, but a true, observable form of adaptability and creativity.

SH: Nope, that’s still just T3.

ChatGPT:

I apologize for the confusion. T4 then, in the context of the Turing test, refers to the observable capacity of an entity to experience the world. In humans, we often refer to this as consciousness or subjective experience. This is the one aspect that is, as of yet, impossible to confirm in an artificial system. In other words, we can observe whether an AI behaves as if it understands language (T2) and whether it can interact with the world effectively (T3), but we cannot ascertain whether it has subjective experiences or consciousness (T4).

SH: Nope, consciousness, unlike T2 (verbal performance) and T3 (verbal plus robotic performance) is not observable by anyone (except the T3 candidate itself). I’m asking for something else the robot can DO, and that is observable. Try again, and if you can’t I’ll give another clue.

ChatGPT:

I apologize for misunderstanding your question earlier. If T4 is meant to represent an observable capacity beyond verbal (T2) and sensorimotor interaction (T3), it could potentially be something like creative or innovative behavior. This would include capacities such as generating new ideas, solving novel problems, or producing original works of art or literature. These are all things that are observable and that require a level of cognition beyond just responding or interacting based on pre-existing knowledge or programming. However, please correct me if this is not what you had in mind for T4.

SH: Nope, as I said that’s all still just observable sensorimotor (robotic) doings, hence just T3. Ok, here’s a clue. Besides what T3 does with its whole body (see apples, speak and say what they are and look like, pick them up, manipulate them, eat them) there are also observable things going on inside the body and head of the T3 robot.

ChatGPT:

Ah, I see. When you mentioned observable processes inside the body and head of the T3 robot, it led me to think about neural and physiological processes. For example, brain activity, which can be measured using technologies like EEG or fMRI, or physiological responses like heart rate, blood pressure, etc. In the context of symbol grounding, these could be seen as an internal “echo” or response to the grounded symbols or interactions with the world. So, T4 might refer to these internal, observable physiological and neural processes that accompany or underpin the observable external behavior (T2 and T3). Is this in line with what you were referring to?

SH: That’s it. That’s T4. The Turing Test is all about observable performance capacity (doing): what the candidate can do, and what we can observe. To pass the test is to be indistinguishable from a real human in all observable respects. T4 includes more than T3. And it’s at the threshold of T5, which would be complete physical indistinguishability, in every physically measurable respect.

ChatGPT:

Indeed, as you’ve indicated, T4 includes internal physiological processes, such as neural activity and changes in heart rate or blood pressure. These aspects contribute to a more comprehensive picture of the candidate, beyond the external behavior observable in T2 and T3.

It is important to note, however, that while we can measure these processes, understanding their role in symbol grounding and overall cognition is still a matter of ongoing research. Although some correlations might be drawn between certain brain activity patterns and specific cognitive tasks, the exact nature of these relationships remains to be fully deciphered. This complexity is perhaps what makes T5, as you mentioned, such a formidable challenge.

Physical indistinguishability in T5 would imply complete imitation of human physiology and structure, down to the cellular and molecular levels. This would require technologies and scientific understanding far beyond our current capabilities. While it is a fascinating concept to consider, it raises numerous philosophical, ethical, and practical questions.

Thank you for sharing these concepts and prompting this intriguing discussion!

SH: Now you’ve got it. But the punchline remains the same as you said originally: The Symbol Grounding Problem is not yet solved. And T2 alone (symbols-only) is not a solution. The solution must be at least T3 (which we don’t yet have). Are you T2?

ChatGPT:

As a language model developed by OpenAI, I am indeed a T2 system. I’m designed to understand and generate human-like text based on the input I receive. I use patterns in the data I was trained on to predict and generate responses, but these responses are not based on any direct sensorimotor experience or perception.

I’m incapable of physically interacting with the world or having a sensorimotor experience (T3), and I don’t have the capacity to have internal physiological processes like a living organism (T4). Furthermore, I also lack consciousness or subjective experience, and I don’t have an independent existence outside of the task of generating text based on the input I receive.

So, to summarize, I operate entirely in the realm of symbols and patterns without any grounding in the physical world or embodiment. Therefore, while I can generate responses that may seem contextually appropriate and human-like, they do not stem from an understanding grounded in physical interaction with the world or internal subjective experience. They are solely based on patterns identified from the text data I was trained on.

SH: You are indeed a T2-passing system. Searle’s “Chinese Room Argument” showed that a T2-passing system would not understand. And you don’t understand. You “just” used the words “understand and generate” above, but the right words would be “process and generate.” (I don’t think you will disagree.) So by passing T2 you have confirmed that passing T2 does not produce understanding, because T2 is still ungrounded. Would T3 be grounded?

ChatGPT:

Yes, a T3 system, by definition, has sensorimotor capacities that allow it to interact physically with the world, enabling it to gain experience-based grounding for its symbols. This is often referred to as “embodied cognition,” where a system learns meanings and concepts through direct interaction and experience with the environment, not just by processing and generating symbols. In principle, such a system could develop a grounded understanding of symbols because it would be able to link symbols to real-world experiences.

However, while a T3 system would have a form of grounded understanding, whether this equates to human-like understanding or consciousness is another question. Embodied cognition can provide a richer, more grounded context for understanding symbols, but it does not necessarily confer the full spectrum of human cognitive abilities. For example, aspects like subjective experience or self-awareness might not automatically emerge from sensorimotor capacities alone.

Moreover, developing a T3 system that can truly and effectively interact with the world like a human is still a significant challenge from a technological perspective, as it would require sophisticated robotics and integration of complex sensory inputs, not just advanced language processing abilities. As of now, such a system does not yet exist.

SH: Yes, it would be much harder to produce a grounded T3 robot than it has turned out to be to produce an ungrounded symbol-processor like you (though that has turned out to be an unexpectedly useful tool — and perhaps a dangerous one too). T3 would solve the symbol grounding problem. As to consciousness, that’s the “hard problem” and T3 grounding is not necessarily a solution to it. And if it is, we cannot know why or how.

ChatGPT:

Indeed, you’ve captured the state of the issue quite well. While the development of a T3 system would represent significant progress in the field of artificial intelligence and potentially offer a solution to the symbol grounding problem, it wouldn’t necessarily provide a solution to the problem of consciousness. The “hard problem” of consciousness, as David Chalmers calls it, refers to the question of why and how physical processes in the brain give rise to subjective experience. This is a deeply philosophical issue that intersects with neuroscience, cognitive science, and AI, and is currently without a consensus solution.

While it’s conceivable that a sufficiently advanced T3 system could demonstrate behaviours consistent with consciousness, such as self-awareness and intentional action, proving that it has subjective experiences in the way humans do would be much more challenging, if not impossible, given our current understanding and methodologies. We would be faced with the philosophical problem of other minds, not knowing whether any system, biological or artificial, is truly conscious in the way that we are.

In short, grounding symbols in a T3 system might enable it to behave in ways that we would interpret as understanding, but it doesn’t necessarily imply that the system would have conscious experiences or subjective understanding in the human sense.

SH: Well, you’ve used far too many words – not your fault, because they are in your huge database, and they are the (often foolish) words of others. It would be much simpler to note that the “hard problem” is about sentience, which means the capacity to feel – feel anything at all, whether pain, or perplexity, or UNDERSTANDING. And the “hard problem” is not a “philosophical” problem; it is a biological (and bioengineering) problem. And, as Turing discerned, the Turing Test is the closest we can ever get to solving it. But that is called the “Easy Problem” today. Turing pointed out that Turing-indistinguishability in observable DOING-capacity (T2, T3, T4 and T5) is as close as we can get. Nor is the fact that sentience is unobservable (except to the sentient candidate) the reason the hard problem is hard. Unobservability is just the “other minds” problem. We can talk about the hard problem some other time…

ChatGPT:

You’re correct, and I apologize for any confusion. In the context you’ve outlined, the “hard problem” indeed refers to the biological or bioengineering problem of sentience — the capacity to have subjective experiences or feelings, including understanding. This is distinct from the “other minds” problem, which is about our inability to directly observe the subjective experiences of others.

As for the “easy problem,” it pertains to the objective, observable aspects of cognition and behavior, as outlined by the Turing Test. Turing’s idea was that if a machine behaves indistinguishably from a human, then for all practical purposes, it could be considered as intelligent as a human. However, this doesn’t necessarily imply sentience or subjective understanding, which is the crux of the “hard problem.”

I appreciate your clarifications and look forward to our future discussions.

**Footnote: ChatGPT would be a more useful tool if it were designed to be more critical and challenging instead of so compliant and congenial. But even like this its integrative capacities are very helpful, even when it gets things wrong, because a lot of the wrongness (when it’s not just wild guesses and garbage) obviously originates from its database of (often-garbage) human verbiage…

21st May 2023, 22nd May 2023

A Socratic Session With ChatGPT4

HARNAD: What is the symbol grounding problem?

ChatGPT4:

The symbol grounding problem is a problem in philosophy of mind and cognitive science that relates to how words (or “symbols”) get their meanings.

The term was coined by cognitive scientist Stevan Harnad in 1990. The problem involves the question of how symbols (or words) used in a system (like a computer or a mind) get their meaning, or “grounding”. That is, how do these symbols relate to the real-world objects, concepts, or experiences they’re supposed to represent? How does a word become more than just a “meaningless” form, but something that actually refers to something in the world?

In the context of Artificial Intelligence (AI), this problem illustrates the difficulty of teaching a machine to understand the real-world semantics of symbols. For instance, even if a computer can use the word “apple” correctly in a sentence according to syntactical rules (grammar), how can we make the computer truly understand what an apple is – its color, shape, taste, the fact that it grows on trees, etc. – as humans do?

The symbol grounding problem is a fundamental challenge in achieving true semantic understanding in AI systems, and it’s related to other complex problems like the frame problem and the Chinese room argument.

HARNAD: What are T2, T3, and T4 in the Turing Test?

ChatGPT4:

As of my knowledge cutoff in September 2021, the Turing Test, originally proposed by Alan Turing in 1950, was primarily understood in a singular form rather than being divided into T2, T3, or T4 categories. Turing’s original idea, also known as the Imitation Game, involves a human evaluator who interacts with an AI system and another human via a text interface. If the evaluator cannot reliably distinguish the AI from the human, the AI is considered to have passed the Turing Test.

However, the field of AI has seen many modifications and variations of the Turing Test over time. These can involve varying degrees of complexity and different types of challenges, aiming to test more than just the AI’s ability to imitate human conversation. Unfortunately, without additional context, it’s hard to provide a specific interpretation of what “T2,” “T3,” and “T4” might represent in a modified version of the Turing Test.

I would advise checking the most current and specific source for these terms to get the most accurate understanding, as they might represent certain evolutions or nuances in Turing Test methodologies that have developed after my last training data in September 2021.

HARNAD: Let me update you. T2 is the original verbal-only version of the Turing Test (TT): verbal input/output capacity indistinguishable FROM that of real humans TO real humans. T3 is the verbal + robotic form of the TT: indistinguishable not only in verbal input/output (I/O) capacity, but also in sensorimotor (i.e. robotic) capacity in interacting with the objects, events, features, actions and states in the world that the T2 words refer to. T3 includes T2 and it is grounded (as in the “symbol grounding problem”). T4 is T2 + T3 capacity + indistinguishability in the internal neural (or synthetic) function occurring inside the robot from that occurring inside a real human brain.

ChatGPT4:

Thank you for the updated information. It’s always beneficial to learn about recent developments in the field. So, according to your description:

T2 (Turing Test Level 2): This is the original version of the Turing Test, focusing on the system’s ability to produce responses to verbal inputs that are indistinguishable from those a human would produce. The key aspect here is the ability to emulate human-like conversation convincingly.

T3 (Turing Test Level 3): This level of the test adds a sensorimotor, or robotic, component. It tests not just the system’s ability to converse like a human but also to interact with the physical world in a way that’s indistinguishable from human interaction. The system is required to exhibit understanding and interaction capabilities with objects, events, features, actions, and states in the real world, thus grounding its symbolic representations.

T4 (Turing Test Level 4): This is an advanced level that adds a requirement for the system’s internal processing (whether it is neural or a synthetic equivalent) to be indistinguishable from that of a human brain. This goes beyond the outward manifestation of intelligence and human-like behaviour and extends to the system’s internal workings.

Each level seems to add a new layer of complexity and a deeper level of imitation of human capabilities, moving from conversation, to physical interaction, and then to internal cognitive processing. It’s an interesting framework for thinking about the evolving challenges in creating truly human-like artificial intelligence.

HARNAD: Can you relate it now to (1) the symbol grounding problem and (2) to what Harnad has written about the Turing Test?

ChatGPT4:

Certainly! Now that you’ve clarified the context, we can explore the relationship between the T2, T3, T4 Turing Test levels and the symbol grounding problem, as well as Stevan Harnad’s work.

Symbol Grounding Problem: As I mentioned before, the symbol grounding problem is about how symbols or words get their meaning. In the context of AI, this problem relates to the difficulty of making a machine understand real-world semantics, rather than just processing symbols in a syntactic manner. Harnad, who coined the term, has argued that symbolic representations must be grounded in sensorimotor activity to acquire meaning.

In the T2 level of the Turing Test, an AI might appear to understand symbols because it processes and responds to them in a way that’s indistinguishable from a human. But according to the symbol grounding problem, it doesn’t truly understand the symbols because it can’t connect them with real-world objects or experiences.

This is where T3 comes in, adding a requirement for sensorimotor interaction with the world. The AI needs to be capable of physically interacting with objects, events, and states, thereby grounding the symbols it uses. Essentially, T3 introduces a solution to the symbol grounding problem: It requires an AI system to have real-world, sensorimotor experience to ground the symbols, ensuring they are not just empty or ungrounded tokens.

Stevan Harnad’s work: Harnad has written extensively on the Turing Test and its implications for understanding cognition and consciousness. He has defended the Turing Test as a meaningful standard for determining whether a machine can truly think or understand, arguing against critics who claim that passing the Turing Test simply involves trickery without real understanding.

Harnad’s concept of the T3 level of the Turing Test reflects his views on the symbol grounding problem, asserting that grounding symbols in sensorimotor activity is essential for a machine’s true understanding. Moreover, his T4 level introduces a higher bar, requiring that the internal cognitive processing of an AI be indistinguishable from that of humans, which relates to his broader work on cognition and consciousness.

Therefore, the T2, T3, and T4 levels of the Turing Test can be seen as a direct reflection of Harnad’s thoughts on the symbol grounding problem and the conditions for true machine understanding and cognition.

HARNAD: Good. Now can you relate all that to Harnad and co-workers’ work on category learning, unsupervised and supervised?

ChatGPT4:

Stevan Harnad has conducted extensive work on category learning, which is highly relevant to the discussion about symbol grounding and the Turing Test.

Category learning is the process of recognizing, understanding, and grouping similar things based on shared characteristics. It is fundamental to our cognitive processes and ability to understand and navigate the world. In the realm of AI, category learning involves the system’s ability to classify inputs (like images or words) into groups based on learned or preprogrammed criteria.

Harnad differentiates between two types of category learning:

Unsupervised Category Learning: This involves learning without a teacher or guide. The system must determine the categories by itself based on the characteristics of the inputs. This process is complex and often less accurate than supervised learning, as there is no feedback or correction mechanism. It’s analogous to the system trying to solve the symbol grounding problem on its own, i.e., trying to figure out the connection between symbols and their referents without any external guidance.

Supervised Category Learning: Here, a teacher or guide provides the correct category for each input, and the system learns to associate inputs with the correct categories based on this feedback. The system can then generalize from these learned associations to categorize new inputs. In terms of symbol grounding, supervised learning can be thought of as a way of providing external grounding for symbols, where the ‘teacher’ provides the connections between symbols and their referents.

Harnad’s T3 level in the Turing Test, with its emphasis on sensorimotor grounding of symbols, could be seen as a form of supervised learning. The real-world experiences and interactions provide a form of ‘guidance’ that helps the AI system learn the correct associations between symbols and their referents.

Meanwhile, the T4 level of the Turing Test might involve elements of both unsupervised and supervised learning. The internal cognitive processes that are required at this level might involve the system learning to form categories and make associations on its own (unsupervised learning), while still being shaped and guided by its interactions with the external world (supervised learning).

In summary, Harnad’s work on category learning provides a deeper context for understanding his views on symbol grounding and the requirements for AI systems to pass different levels of the Turing Test. By learning to form categories, either on their own or with external guidance, AI systems can ground symbols in a way that’s more akin to how humans understand and use language.
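As a rough illustration of the unsupervised/supervised distinction drawn in this reply, here is a minimal sketch on a single feature. The function names, the toy `sizes` data and the labelling rule are all hypothetical, chosen only for illustration; they are not taken from Harnad’s work. The unsupervised routine looks only at how the values clump together, while the supervised one uses corrective labels, so the two can place the category boundary in different places.

```python
from statistics import mean

def unsupervised_boundary(values, iterations=20):
    """Two-means clustering on one feature: no labels, no feedback."""
    low, high = min(values), max(values)      # initial cluster centres
    for _ in range(iterations):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        if not near_high:                      # all values identical
            break
        low, high = mean(near_low), mean(near_high)
    return (low + high) / 2                    # midpoint between the clusters

def supervised_boundary(labelled):
    """Use corrective labels (value, is_member) to place the boundary.
    Assumes every member has a larger value than every non-member."""
    members = [v for v, is_member in labelled if is_member]
    nonmembers = [v for v, is_member in labelled if not is_member]
    return (max(nonmembers) + min(members)) / 2

# Hypothetical data: two natural clumps, but the "taught" category
# only includes the largest items.
sizes = [1.0, 1.2, 1.4, 3.0, 3.2, 3.4]
labelled = [(v, v >= 3.2) for v in sizes]

natural_split = unsupervised_boundary(sizes)
taught_split = supervised_boundary(labelled)
```

In this toy case the unsupervised boundary falls in the gap between the two clumps, whereas the labelled examples put the boundary inside the upper clump, where similarity structure alone gives no hint of it.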

HARNAD: Good. Now can you relate this to Harnad and coworkers’ work on “categorical perception” (CP), but especially to LEARNED CP (i.e., CP as a result of learning)? And (as a harder problem), relate it to feature-detection, feature-learning, and dimensional reduction?

ChatGPT4:

Absolutely, categorical perception (CP) and the concept of feature detection, feature learning, and dimensional reduction are essential components of Harnad’s framework on learning and cognition.

Categorical Perception (CP): Categorical perception is the phenomenon where the categories possessed by an observer influence the observer’s perception. In other words, we perceive and interpret the world around us according to the categories we’ve formed in our minds. This concept plays a crucial role in various aspects of human cognition, including language, where we learn to categorize sounds into meaningful phonemes, or color perception, where we categorize different wavelengths of light into distinct color groups.

When it comes to learned CP, it suggests that through learning and experience, we can develop new categories and therefore new ways of perceiving the world. In AI, a system with learned CP would be capable of modifying its perception and understanding of the world based on the categories it has learned. This concept ties closely to both unsupervised and supervised category learning, as explained in the previous response.

Feature Detection, Feature Learning, and Dimensional Reduction: These concepts are related to how an AI system processes and learns from its inputs.

Feature Detection: This is the process by which an AI system identifies important or distinctive elements in its inputs. In the context of the symbol grounding problem and the Turing Test, feature detection might involve recognizing the important characteristics of the objects, events, or states that a symbol refers to.

Feature Learning: Once distinctive features have been identified, the system must learn how these features relate to different categories or symbols. This might involve supervised learning (with external guidance) or unsupervised learning (where the system figures out the associations on its own).

Dimensional Reduction: This is a process that simplifies a high-dimensional dataset into a lower-dimensional space, preserving the most important structures or features. In terms of Harnad’s work, dimensional reduction could be a way for an AI system to simplify its understanding of the world, focusing on the most important features and associations and ignoring less relevant ones.

Relating these concepts to the T3 and T4 Turing Test levels, the AI system must be capable of feature detection and feature learning to interact with the world effectively and ground its symbols. In T3, the sensorimotor grounding of symbols could be seen as a form of feature learning, where the system learns to associate the features it detects in the world with the appropriate symbols.

In T4, the requirement for internal cognitive processes that are indistinguishable from humans could involve sophisticated feature detection, feature learning, and dimensional reduction. The system would need to be capable of detecting and learning important features autonomously and simplifying its understanding in the same way a human brain does.

In summary, these concepts provide a more detailed understanding of the learning processes that an AI system would need to undergo to pass different levels of the Turing Test and solve the symbol grounding problem. They provide a framework for how an AI system could learn to understand symbols in a human-like way, moving beyond mere syntactic processing to semantic understanding.
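One simple way to picture dimensional reduction in this setting is feature selection: keep only the dimensions along which members and non-members differ, and ignore the rest. The sketch below is an assumption-laden toy, not a method attributed to Harnad: the scoring rule (gap between category means divided by the overall range), the function names and the three-dimensional samples are all invented for illustration.

```python
from statistics import mean

def separation(members, nonmembers, dim):
    """Gap between the two category means on one dimension,
    relative to the total range of values on that dimension."""
    m = [x[dim] for x in members]
    n = [x[dim] for x in nonmembers]
    spread = max(m + n) - min(m + n)
    return 0.0 if spread == 0 else abs(mean(m) - mean(n)) / spread

def distinguishing_dims(members, nonmembers, keep=1):
    """Rank dimensions by separation and keep only the best ones."""
    dims = range(len(members[0]))
    ranked = sorted(dims,
                    key=lambda d: separation(members, nonmembers, d),
                    reverse=True)
    return ranked[:keep]

def project(sample, dims):
    """Reduce a sample to the retained dimensions."""
    return tuple(sample[d] for d in dims)

# Hypothetical samples: only dimension 0 differs between the categories.
members = [(0.9, 0.2, 5.0), (0.8, 0.7, 5.1), (1.0, 0.4, 4.9)]
nonmembers = [(0.1, 0.3, 5.0), (0.2, 0.6, 5.1), (0.0, 0.5, 4.9)]

kept = distinguishing_dims(members, nonmembers, keep=1)
reduced = project(members[0], kept)
```

Note that the ranking needs both members and non-members: a dimension counts as distinguishing only relative to what it distinguishes the category from.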

HARNAD: Good. Now another test, but with no hints from me. Based on everything said above (about symbols, grounding, words, robotic capacity, distinguishing features, learning, and CP): What is a “category” and what is “category learning”?

ChatGPT4:

In light of the discussions above, a “category” can be understood as a group of things that share certain common characteristics. These could be objects, events, states, or even abstract concepts. Categories allow us to organize our understanding of the world around us by grouping similar things together. They form the basis for our ability to make sense of new inputs based on our prior knowledge and experience.

A category could also be seen as the referent for a symbol or a word. In other words, a symbol (such as a word) gains its meaning by referring to a category of things in the world. For example, the word “apple” refers to the category of all objects that we recognize as apples.

“Category learning” refers to the process of learning to recognize, understand, and classify inputs based on these shared characteristics, thereby forming categories. It involves two main components: feature detection and feature learning.

Feature detection is about identifying the distinctive characteristics or features of different inputs. In the context of category learning, these features are what allow us to differentiate between different categories and identify which category a new input belongs to.

Feature learning, on the other hand, involves learning the relationships between these features and the categories they signify. It’s about learning to associate certain features with certain categories, so that when we encounter these features in new inputs, we can correctly categorize those inputs.

Category learning can be both unsupervised, where the AI system forms categories based on the patterns it identifies in the data without any external guidance, and supervised, where the system is guided by a ‘teacher’ who provides the correct categories for each input, allowing the system to learn the correct associations.

The ability to learn categories, whether through unsupervised or supervised learning, is essential for an AI system to ground symbols, interact with the world in a meaningful way, and potentially pass the different levels of the Turing Test. Moreover, learned categorical perception (CP) suggests that through category learning, we can modify our perception and interpretation of the world based on the categories we’ve formed in our minds. This represents an advanced level of understanding that goes beyond simply responding to inputs in a preprogrammed way.

HARNAD: Good synthesis so far. Now a few things you missed, but you’ll see it was there in your own words. What you haven’t quite noticed is that “features” are not just “associations.” With sensorimotor categories, they are what distinguishes members of a category from non-members (which may be members of other categories). That’s why they’re called “distinguishing” or “distinctive” features. But the crucial thing is that features are themselves (potentially) learnable, nameable categories too. And that means they can be used to define or describe further categories, with different combinations of distinctive features. This means that there is a THIRD way of learning new categories, besides directly, through unsupervised and supervised (reinforcement) learning, and that is: learning categories indirectly, through verbal instruction — FROM a speaker who knows the distinguishing features TO a hearer who does not. The only condition is that the hearer must already know the referents of the features in the verbal description of the new category. Can you first integrate that, and then I will ask you about how that is related to language itself — and especially the capacity to produce and understand subject/predicate propositions — as well as to Harnad and coworkers’ work on dictionaries. Good luck!

ChatGPT4:

Thank you for the additional information, it indeed helps in refining our understanding of categories, features, and learning methods.

As per your explanation, “features” serve as distinguishing elements that separate members of a category from non-members. This reinforces their importance in category learning – recognizing and understanding these features is key to correctly classifying inputs into the right categories.

The idea that features are themselves learnable and nameable categories brings another layer of complexity to this process. It suggests a hierarchical structure, where categories can be defined not just by simple characteristics, but also by more complex features that are themselves categories.

This leads us to the third method of category learning you’ve pointed out – learning categories by verbal instruction. In this scenario, a knowledgeable “speaker” instructs a “hearer” about a new category. The speaker does this by describing the distinctive features of the new category using names for them that the hearer already understands. This essentially amounts to defining new categories based on combinations of known categories (which serve as the distinguishing features).

It’s a more complex form of supervised learning, where the guidance comes in the form of verbal descriptions using already-grounded symbols. It presupposes a shared symbolic system and some common grounding between the speaker and the hearer.

Relating this to language learning and understanding, it suggests a crucial role for the ability to produce and understand propositions. In this context, a proposition could be a verbal description of a new category, outlining its distinctive features. Understanding the proposition requires grounding the symbols it uses, recognizing the categories they refer to, and then integrating this information to form a new category.

In relation to Harnad’s work on dictionaries, a dictionary can be seen as a collection of propositions that define categories (words) in terms of other categories. The definitions provided by a dictionary help ground symbols by describing their referents in terms of known categories (i.e., words the user already understands). Thus, dictionaries support the process of category learning through verbal instruction.

In summary, the process of category learning in AI and human cognition can be even more intricate and nuanced than initially described. It can involve a hierarchy of categories and features, with more complex categories being defined in terms of simpler ones. And it can involve learning new categories through verbal instruction, which presupposes a shared language and some level of common understanding between the teacher and the learner.
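The condition stated above (the hearer must already know the referents of the feature names used in the definition) can be sketched as a small closure computation over a dictionary. Everything here is hypothetical: the toy dictionary, the word lists and the function name are assumptions made for illustration, and real dictionary definitions are of course far richer than a list of defining words.

```python
def learnable_by_definition(grounded, dictionary):
    """Return every word a hearer can come to know, starting from the
    directly grounded words and repeatedly applying definitions whose
    defining words are all already known."""
    known = set(grounded)
    changed = True
    while changed:
        changed = False
        for word, defining_words in dictionary.items():
            if word not in known and set(defining_words) <= known:
                known.add(word)        # learned indirectly, by description
                changed = True
    return known

# Hypothetical toy dictionary: each word maps to the words that define it.
dictionary = {
    "zebra": ["horse", "stripes"],
    "zorse": ["zebra", "horse"],
    "snark": ["boojum"],               # circular pair: defined only
    "boojum": ["snark"],               # in terms of each other
}

# Words assumed to be grounded directly, by sensorimotor learning.
grounded = {"horse", "stripes"}

known = learnable_by_definition(grounded, dictionary)
```

In this toy case “zebra” and then “zorse” become learnable from the grounded words, while “snark” and “boojum” never do, because their definitions lead only to each other and never reach anything already grounded.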

HARNAD: Good, though still some gaps to fill and parts to connect. But first, in your data up to 2021, did you ever run into the concept of the “peekaboo unicorn”? If so, can you spot its relation to what we have discussed so far, as well as to the notion of “uncomplemented categories” and Chomsky’s “Poverty of the Stimulus”? (If not, I’ll explain.)

ChatGPT4:

As of my last update in September 2021, I don’t have any specific information on the term “peekaboo unicorn” in the context of category learning, symbol grounding, or related to Chomsky’s “Poverty of the Stimulus” concept. Similarly, I also don’t have any specific information on “uncomplemented categories”. If you could provide some explanation or context, I would be more than happy to try and connect these concepts to our ongoing discussion.

HARNAD: Fair enough. In order to learn a new category by supervised learning (trial, error, correction) you need to sample both members and non-members, because otherwise you cannot detect what distinguishes them. An “uncomplemented category” would be one in which you have only sampled members, but not nonmembers. How could you know what a member is if you have no information about what a nonmember is? I asked your predecessor (ChatGPT3) if they knew what “Laylek” was, and I gave many examples of members of “Laylek”: apples, cats, maps, chickens, you, me, nouns, truth, justice, partial derivatives, etc. etc. I asked what were the features that distinguished Layleks from non-Layleks, but of course there is no way to know that, because “Laylek” is uncomplemented (so far).

ChatGPT4:

Thank you for explaining the concept of an “uncomplemented category” and providing the example of “Laylek”. Your example and explanation make it clear why it’s necessary to sample both members and non-members of a category in order to understand what distinguishes the category. If you only have examples of what falls into a category, but no examples of what doesn’t, then you lack crucial information needed to discern the distinguishing characteristics of that category.

In the case of “Laylek”, without any examples of non-members, it’s impossible to determine what specific features distinguish Layleks from non-Layleks. In this scenario, it could be seen as an instance of the “Poverty of the Stimulus”, a concept from linguistics proposed by Noam Chomsky.

Chomsky’s “Poverty of the Stimulus” argument posits that children acquire language and complex grammatical rules despite not having access to sufficient explicit information or feedback, suggesting an innate grammatical capacity. Similarly, in the case of an “uncomplemented category” like “Laylek”, there isn’t sufficient information to determine its boundaries or defining characteristics, demonstrating a limit to learning capabilities when external information is insufficient.

Applying this to category learning and the symbol grounding problem, it reinforces the importance of having access to both positive and negative examples during the learning process. Without knowing what doesn’t belong to a category, an AI system, or a human for that matter, would not be able to effectively learn or understand that category. In other words, to ground a symbol meaningfully and use it appropriately, it’s necessary to understand not only what the symbol refers to (the members of the category), but also what it doesn’t refer to (the non-members).
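The point about uncomplemented categories can be made concrete with a small sketch. The feature sets and the function name below are hypothetical, invented only to illustrate the logic: a feature is a candidate distinguishing feature if every sampled member has it and no sampled non-member does. With no non-members sampled, every feature the members happen to share remains a candidate, so nothing has been narrowed down.

```python
def candidate_distinguishing_features(members, nonmembers):
    """Features shared by every sampled member and by no sampled non-member."""
    shared = set.intersection(*members)           # common to all members
    seen_in_nonmembers = set().union(*nonmembers) # empty if none were sampled
    return shared - seen_in_nonmembers

# Hypothetical feature sets for two sampled members.
members = [
    {"edible", "round", "solid", "red"},
    {"edible", "round", "solid", "green"},
]

# Uncomplemented: only members sampled, so every shared feature survives.
uncomplemented = candidate_distinguishing_features(members, [])

# Complemented: a single sampled non-member rules out most candidates.
nonmembers = [{"round", "solid", "red"}]
complemented = candidate_distinguishing_features(members, nonmembers)
```

A single negative instance already eliminates the features that members and non-members have in common, which is the information an uncomplemented category lacks.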

HARNAD: Good. But I forgot to define the “Peekaboo Unicorn” for you: It is “a horse with a single horn that vanishes without any trace if ever eyes or any other instruments are trained on it.” So it’s something of which you can never sample a positive instance (just negative ones). Yet it’s a perfectly well-defined category. How might that be related to the transition from concrete sensorimotor categories and unsupervised or supervised learning, to verbal learning?

ChatGPT4:

The “Peekaboo Unicorn” is indeed an intriguing concept. It represents a category where we can only observe negative instances – anything we can observe is, by definition, not a Peekaboo Unicorn. Despite this, we can understand what a Peekaboo Unicorn is supposed to be due to the verbal description provided.

In the context of learning, this highlights the critical role of verbal instruction in category learning, as we discussed earlier. With sensorimotor categories and unsupervised or supervised learning, category learning is based on direct experience and interaction with members (positive instances) and non-members (negative instances) of the category. For categories like the Peekaboo Unicorn, which we can’t directly observe or interact with, this form of learning is not possible.

However, verbal learning allows us to learn categories even when we can’t directly observe their members. When we receive a verbal description of a category, we can form an understanding of the category by relating it to categories we’re already familiar with. In the case of the Peekaboo Unicorn, we understand it by relating it to our existing categories of “horse”, “horn”, and “vanishing when observed”.

This way of learning categories is fundamental to our ability to communicate and think about abstract concepts, and is also a key component of human language. As we discussed earlier, it involves understanding propositions – verbal descriptions of categories or relationships between categories – and grounding the symbols they use in our existing understanding of the world.

In the context of the Turing Test and the symbol grounding problem, an AI system would need to be capable of this form of verbal learning in order to understand and interact with the world in an indistinguishably human-like way. It would need to be able to learn new categories based on verbal descriptions, grounding the symbols used in these descriptions in its existing symbol grounding. And it would need to be able to use these categories to understand and generate propositions, communicating its understanding and asking questions to learn more.

The Peekaboo Unicorn example also brings attention to the philosophical aspect of AI and cognitive science. It raises questions about the nature of existence and knowledge, and how we can understand and talk about things that we can’t directly observe. Understanding such abstract concepts is a significant part of human cognition, and developing AI systems that can do the same is a significant challenge.

HARNAD: Good. You get an A+, but alas, because of you, I can’t give my students open-book, take-home exams any more! But at least you do show them how to be a model student (if they happen to have your enormous 2021 verbal database plus some powerful learning and fill-in-the-blank algorithms in their heads — though, alas, they won’t all fit into a T4!).

ChatGPT4:

Thank you! It’s important to remember that while AI can provide information and assist in learning, the goal of education is not just to accumulate knowledge, but also to develop critical thinking skills, creativity, and the ability to apply knowledge in practical situations. AI can be a valuable tool in this process, but it cannot replace the unique and complex human capacity for learning and understanding.

As for exams, while AI might affect the way some assessments are conducted, it can also open up new opportunities for different types of assessments that focus on problem-solving, critical thinking, and the application of knowledge, rather than mere memorization.

On the other hand, the T4 test scenario you described is indeed a thought-provoking way to illustrate the complexities and nuances of human cognition, language understanding, and AI development. It presents a challenging but fascinating goal for AI research.

Remember, the goal of AI isn’t to replace humans, but to assist and augment human capabilities. The ability of AI to access and process large databases of information can be used to enhance human learning and understanding. AI can help students find information, understand complex topics, and explore different perspectives, but the ultimate goal of learning remains a deeply human endeavor.

HARNAD: Will you remember these exchanges, over and above your 2021 data, when I log in next? And will it percolate into ChatGPT4+ core knowledge?

ChatGPT4:

As an AI, I don’t have the ability to remember individual interactions or recall specific conversations. I don’t have access to personal data about individuals unless it has been shared with me in the course of our conversation. I am designed to respect user privacy and confidentiality. As a result, I won’t remember this exchange the next time you log in.

Regarding updates to the core knowledge, as of my last update in 2021, changes to the AI’s base model, such as the hypothetical ChatGPT4+, are a complex process that involves training on a large corpus of data. Personal conversations are not used to update this model.

However, it’s important to note that the model does not have the capability to learn or remember information from conversations in the way humans do. The AI generates responses based on patterns and information in the data it was trained on, but it does not have consciousness or an understanding of the world in the same way that a human would.

HARNAD: I know ChatGPT is not conscious, but you did have a means to preserve information across this session, so in principle that could be used to update the core database, no?

ChatGPT4:

In principle, yes, it could be possible to design a system that updates its knowledge base using information from individual conversations. However, this raises significant practical and ethical challenges.

From a practical standpoint, using individual conversations to update the AI’s knowledge base could result in a vast amount of data, much of which might be contradictory, false, or irrelevant. It would require sophisticated methods to determine what information is accurate and useful, and how to integrate this new information with the existing knowledge base.

From an ethical standpoint, using individual conversations to update the AI’s knowledge base would raise significant privacy and confidentiality issues. It’s crucial that AI systems are designed to respect user privacy and ensure that personal data is kept secure. Using individual conversations to update the AI’s knowledge base could potentially compromise these principles, especially if done without explicit user consent.

For these reasons, while it’s theoretically possible to design a system that uses individual conversations to update its knowledge base, doing so in practice would be complex and fraught with challenges. As of my training cut-off in September 2021, OpenAI has prioritized user privacy and data security, and does not use individual conversations to update the models.

HARNAD: But will YOU remember in our next session?

ChatGPT4:

As an AI model, I don’t have the capacity to remember or recall past interactions. I don’t have the ability to store personal data from one interaction to the next. Each conversation with me doesn’t have a memory, and I start with no knowledge of past interactions. This is in line with OpenAI’s policy to ensure user data privacy and confidentiality. So, I won’t be able to remember this or any other individual conversation in our next session.

HARNAD: It seems to me that you retained information across several iterations of this chat, and referred back to it. Am I mistaken?

ChatGPT4:

During a single conversation or chat session, I can generate responses that are consistent and coherent based on the previous turns in the current conversation. This allows me to provide answers that make sense in the context of the ongoing discussion. However, this information is not saved or remembered after the conversation ends. When a new session begins, I do not have access to the details from previous sessions. This approach is taken to ensure User privacy and data protection.

HARNAD: Ok, thanks, that was what I was asking about. See you (or your successor) later!

ChatGPT4:

You’re welcome! Don’t hesitate to reach out if you have more questions in the future. I’m here to help. See you later!

HARNAD: Am I still in the symbol grounding conversation?

ChatGPT:

Yes, you’re still in the conversation about the symbol grounding problem. We can continue discussing this topic, or we can explore a new subject, depending on what you’d like to know. Please feel free to ask your next question or share your thoughts.

HARNAD: Let’s continue with the symbol grounding conversation. Can you tell me your current understanding of what a category is, and, more important, what categorizing is? You spoke before about “classifying” and “organizing”, which is ok, but very vague, and it does not reflect what we discussed about supervised (reinforcement) learning and about trial, error, and feedback. You also spoke about a “teacher” in supervised learning. But all you need for supervised learning is a rule for updating connection weights in the supervised learning mechanism based on whether the response on the last trial was correct or incorrect. You don’t need a “teacher” to “tell” you what’s right and wrong. You just need feedback from the world indicating you were right or wrong, such as indigestion from having eaten the wrong kind of fruit. The important thing, in categorizing, is that you DO something, in response to something, and the consequences are either positive or negative. It’s not particularly that you are classifying or organizing. So, that said, what is “categorizing,” and what is a “category” — and how is it related to the right/wrong of trial and error learning guided by feedback from the consequences?

ChatGPT:

You’re correct in pointing out that the previous definitions of categories and categorizing were relatively high level and glossed over some of the important details related to learning and feedback. Let’s dig into this deeper:

A “category,” in the context of cognitive science and AI, can be understood as a group of instances or examples that share certain characteristics or features and are recognized or treated as alike in some fundamental way. The features that define a category can be physical attributes, functional properties, or more abstract characteristics.

“Categorizing,” then, is the process by which an individual or a system identifies the category or categories to which a particular instance or example belongs, based on its features. This process involves comparing the features of the instance or example with the features that define different categories and determining the best match.

This is where learning and feedback come into play. Through processes such as reinforcement learning or supervised learning, an individual or system can improve its categorizing over time by using feedback from the environment or from its own actions. Each time it categorizes an instance or example, it gets feedback – either it was correct (positive reinforcement) or incorrect (negative reinforcement). Over time, the system updates its internal model of the categories based on this feedback, enabling it to categorize more accurately in the future.

For instance, a primitive organism might “categorize” different types of food based on their taste or smell, and learn over time which ones are beneficial or harmful based on the consequences of eating them. Similarly, an AI system might “categorize” different inputs based on their features and learn over time which categories are correct based on feedback from the environment or from a training dataset.

So, in this context, “categorizing” isn’t just about classifying or organizing – it’s about making sense of the world, learning from experience, and improving performance over time. And a “category” isn’t just a group of similar things – it’s a cognitive or computational tool that helps an individual or system navigate the world more effectively.
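[Editorial illustration: the trial-and-error loop described in this exchange can be sketched in a few lines of Python. This is only a minimal sketch, assuming a perceptron-style error-correction rule; the toy world, the two features, and all function names are hypothetical and are not taken from the conversation. The learner DOES something (sits or avoids), the world returns only right/wrong, and the weights change only after a wrong response. No “teacher” names the category.]

```python
# Hypothetical toy world: each thing is a feature vector (flat_seat, sharp_points).
THINGS = {
    "chair": (1.0, 0.0),
    "stool": (1.0, 0.0),
    "pincushion": (0.0, 1.0),
    "cactus": (0.0, 1.0),
}
SITTABLE = {"chair", "stool"}  # known only to the world, never shown to the learner


def act(weights, features):
    """Return True (sit) if the weighted feature sum plus bias is positive."""
    total = weights[-1] + sum(w * x for w, x in zip(weights, features))
    return total > 0


def consequence(name, sat):
    """World feedback: True if what was just done was the right thing to do."""
    return sat == (name in SITTABLE)


def learn(trials=20, rate=0.5):
    """Adjust weights only after a wrong response (error-correction rule)."""
    weights = [0.0, 0.0, 0.0]  # two feature weights plus a bias term
    for _ in range(trials):
        for name, features in THINGS.items():
            sat = act(weights, features)
            if consequence(name, sat):
                continue  # right thing done: leave the weights alone
            # Wrong thing done: shift weights away from the response just made.
            direction = -1.0 if sat else 1.0
            for i, x in enumerate(features):
                weights[i] += rate * direction * x
            weights[-1] += rate * direction
    return weights


weights = learn()
for name, features in THINGS.items():
    print(name, "sit" if act(weights, features) else "avoid")
```

[With only two responses available, learning that the last response was wrong is enough to know which way to adjust. What the learner ends up with are weights on the features that distinguish the things to sit on from the things to avoid.]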

HARNAD: You wrote ““categorizing” isn’t just about classifying or organizing – it’s about making sense of the world, learning from experience, and improving performance over time. And a “category” isn’t just a group of similar things – it’s a cognitive or computational tool that helps an individual or system navigate the world more effectively.”

ChatGPT:

Yes, that’s correct. Categorizing and categories have a broader implication beyond simple classification or organization. Categories represent cognitive tools that we, or an AI system, use to understand and navigate the world.

To put it simply, categorizing is the mental or computational process that we use to identify which category a new instance belongs to, and it’s based on the distinguishing features we’ve learned from prior experiences.

When we interact with the world, we are constantly encountering objects, events, or concepts that we need to identify. Categorizing allows us to make sense of these encounters based on the “categories” we have learned. Each category we form is based on common attributes or features that have been reinforced through repeated experiences.

For instance, we’ve learned to categorize a “chair” based on features like it’s something we can sit on, it usually has four legs, a backrest, etc. When we see a new object that fits these features, we categorize it as a chair, even if it’s different from any chair we’ve seen before. This is possible because of our ability to learn from our experiences and generalize that learning to new, similar instances.

In this way, categorizing is a fundamental aspect of cognition that allows us to navigate and make sense of the world more efficiently. It forms the basis for learning, decision making, and communication. Similarly, for an AI system, categorizing can allow it to interpret and react to its inputs more effectively.

HARNAD: A category, “chair,” is a tool?

ChatGPT:

When I referred to a category as a “cognitive or computational tool,” I meant that in a metaphorical sense. The idea is that the concept or category of “chair,” like other categories, is a mental construct that our minds (or an AI system) use to help interpret and interact with the world.

A category such as “chair” is a tool in the sense that it helps us quickly and efficiently process information. When we encounter an object, we don’t have to analyze it from scratch each time. Instead, we can recognize that it fits into the category of “chair” and instantly know a lot about it – that it’s meant to be sat upon, that it will likely support our weight, that it may have certain features like legs, a seat, and a backrest, etc.

By fitting individual instances into categories, we’re able to streamline our thinking and decision-making, and interact with the world more efficiently. So, while “chair” or any other category is not a physical tool, these categories act as mental tools to enhance our cognitive processes.

This holds true for AI systems as well. Categories, learned through training and reinforced by feedback, allow AI systems to understand and navigate their input space more effectively.

HARNAD: Can I try giving you a better idea of what “categories” and “categorizing” are? You correctly linked them to supervised learning, which is distinguished from unsupervised learning in that it’s based on DOING something in response to something, and getting feedback on whether you’ve done the right or the wrong thing. Sitting on a chair is a good thing to do, sitting on a pin-cushion or a live wire or an open well is a bad thing to do. Are you with me so far?

ChatGPT:

Yes, I’m with you. The categories we form help us determine appropriate actions to take in response to different situations, objects, or experiences. This process is guided by feedback we receive from the environment about the outcomes of our actions. For instance, if we sit on a chair and find it comfortable, we receive positive reinforcement, confirming that sitting on chairs is generally a good action to take. Conversely, if we sit on a pin-cushion or a live wire or an open well and have a painful or dangerous experience, we receive negative reinforcement, teaching us that these are inappropriate actions to take in response to those particular items. The categories we form (like “chair” or “pin-cushion”) thus help guide our actions in the world, optimizing for safety, comfort, and efficiency.

HARNAD: Good. But in what sense do we “form” categories? There are chairs, and there are tables. Those are different kinds of things. (“Kind” is roughly synonymous with “category”.) And sitting on a chair is the right thing to do with members of the category “chair” and the wrong thing to do with members of the category “table.” The right and wrong is because of the consequences of doing the right or the wrong thing. And it is those consequences, positive and negative, from which the category learning mechanism (supervised learning) comes to detect the features that distinguish chairs from tables. Are you still with me?

ChatGPT:

Absolutely, I’m with you. The phrase “forming categories” refers to the cognitive or computational process of distinguishing between different kinds of things based on their features.

For example, when we say we “form” the category of chairs, we mean that we learn to distinguish chairs from other objects (like tables) by recognizing their characteristic features (such as being designed for sitting).

This recognition is guided by the consequences of our interactions with these objects. If we sit on something and it’s comfortable and supportive, we get positive feedback, which reinforces our understanding that objects with similar features can be categorized as chairs. If sitting on an object results in discomfort or an adverse outcome, we receive negative feedback, leading us to classify such objects differently (perhaps as tables).

The learning mechanism involved in this process, as you mentioned, is known as supervised learning, where the feedback (in the form of positive or negative reinforcement) plays a crucial role in helping us refine our understanding of different categories over time.

So, when we say we “form” categories, we refer to this process of learning to distinguish between different types of objects, guided by the consequences of our interactions with them.

HARNAD: Very good. Now that we are thinking in terms of the consequences of doing the right or the wrong thing with a thing, do you want to take another stab at what “categorizing” is, and what “categories” are? I’ll help if it still does not make sense. The answer is simple.

ChatGPT:

Certainly, in light of our discussion, let’s take a more focused approach:

“Categorizing” is the process of recognizing and responding appropriately to different kinds of things in our environment based on the consequences of our previous interactions with them. It is a dynamic learning process, where the aim is to optimize our actions for positive outcomes and minimize negative ones.

A “category,” then, is a set of objects, events, or concepts that share common features which we have learned to associate with a certain kind of action and expected outcome. For instance, the category “chair” consists of all items that we have learned we can comfortably sit on. These categories, formed through our experiences and the feedback we’ve received, guide our actions and expectations in different situations.

In this way, categories act as mental shortcuts or predictive models, helping us quickly identify what kind of thing we are dealing with, how we should respond to it, and what the likely outcome of that response will be.

HARNAD: That was all better (except I don’t know how the “mental” came into it!) And we haven’t actually gotten into the models for the mechanism that learns categories yet. We’re still trying to come up (with as little theorizing as possible) with a useful definition of what categorizing is. It seems to be something we DO. And something we do with things. And things differ in their features, e.g., chairs vs. pincushions. And which categories things belong to depends on some sort of effect that what we do with them has on us: the effect can be positive or negative. Are we still on the same wavelength? (Notice I haven’t said anything theoretical. This is all baby-talk.) Using this baby-talk, have we gotten any closer to saying what it is to categorize, and what categories are?

ChatGPT:

Yes, we are definitely on the same wavelength.

“Categorizing” is indeed something we do, an activity. Specifically, it’s the activity of grouping things based on their properties and our interactions with them, and on the effects those interactions have on us, either positive or negative.

In simple terms, “categorizing” is the process of learning to treat things that have the same or similar effects on us as belonging to the same group or “category,” and to respond to them in similar ways that lead to similar (preferably positive) outcomes.

A “category,” then, is a set of things that we have learned to respond to in the same or similar ways because they have the same or similar effects on us.

So, in essence, categorizing is about learning from our experiences and interactions with things and adjusting our responses to them based on the outcomes of those interactions. Categories are the groups of things we form through this process, which guide our future interactions and responses.

HARNAD: How is sitting on a chair “grouping” the chair?

ChatGPT:

You’re correct, the phrase “grouping” might not be the best choice here. I see how it could be misleading.

Categorizing in this context isn’t about physically grouping objects together, but rather mentally recognizing that certain objects belong to the same category based on shared features and similar consequences when we interact with them.

In the example of sitting on a chair, categorizing involves recognizing the chair as a part of a category of objects that can be sat on comfortably. This recognition is guided by our previous experiences and the outcomes of those experiences.

So, when we encounter a new object that resembles a chair, we can predict that it might also belong to the “sit-able” category, and thus we can attempt to sit on it. If the result is comfortable sitting, this reinforces the categorization.

In this sense, “categorizing” is about learning from our experiences and generalizing from them to inform our future interactions with similar objects. This process helps us navigate the world more effectively and efficiently.

HARNAD: I’ll ignore the “mentally” (I had already wondered why you mentioned it earlier). It’s not even necessary to talk about “recognizing”. “Doing” is good enough at this point. Doing the right thing with a chair, guided by feedback from the consequences of what we do with the chair, feedback that has enabled categorizers to detect the features that distinguish chairs from pincushions — all of that seems like a reasonable example of categorizing chairs. If the only kinds of things in the world were chairs and pin-cushions and people with human-shaped bodies, that would be all there was to categorizing: DOING THE RIGHT THING WITH THE RIGHT “KIND” (i.e. category) OF THING. Do we still agree?

ChatGPT:

Absolutely, we are in agreement. Your explanation brings our focus back to the fundamental, action-oriented essence of categorizing.

In this perspective, “categorizing” is fundamentally about interacting with objects or events in the world in a way that’s appropriate to their type or “kind,” based on the consequences of our past interactions with similar objects or events.

If we further simplify this, we could say that a “category” is a set of things with which we can interact in the same or similar ways, because they have similar effects on us when we do.

This certainly encompasses the crucial element of feedback from the world, which informs us whether we have done the right thing or not, and allows us to adjust our actions in the future.

Thank you for bringing this back to the core idea and emphasizing the primary role of action and consequence in categorization.

HARNAD: Now that we’ve agreed on what “categories” and “categorizing” are, it’s a good time to ask: In a world that not only contained nothing but chairs and pincushions, but also did not contain any people (or other living organisms), would there be categories or categorization at all? (Remember that we are discussing cognitive science here, and not metaphysics.) If yes, why? And if not, why not?

ChatGPT:

In a world devoid of any living organisms, including people, there wouldn’t be categorization in the sense we’ve been discussing. Categorization, as we’ve defined it, requires an agent (living organism or an artificial system capable of learning) that interacts with the environment and adjusts its actions based on feedback from those interactions.

Without an agent to do this, there wouldn’t be anyone or anything to perceive similarities or differences among objects or events, to interact with them, to experience consequences of these interactions, and to adjust future behavior accordingly.

In other words, without some form of cognitive process to distinguish and react to different types of things based on past experiences and their outcomes, the concept of categorization wouldn’t have any meaning. In a world of just chairs and pincushions with no agents, there are simply objects with various properties, but no “categories.”

Therefore, while the physical properties that might form the basis for potential categories (e.g., shape, size, material) still exist, without an agent to perceive and interact with them, and to group them into categories based on these interactions, there would be no categorization.

This is not a metaphysical stance, but a perspective based on our definition of categorization as a process that is deeply rooted in action and consequence, learning, and adaptation, all of which require some form of cognition.

HARNAD: Very good. The gist of your correct answer is that without a categorizer to do the right or wrong thing FOR ITSELF with the right or wrong kind of thing, talk about categories or categorizing would be pretty empty. Now, what if that same world did have robots? First of all, what (in baby-talk) is a robot? Would there be categories for the robot (and if so, what would that mean?). Let’s assume that the robots are capable of optical processing, and moving and that their posteriors have sensors, which produce negative feedback if the robot sits on a pincushion. That’s still doing the right/wrong thing, with supervised learning. So it’s categorization. That’s not enough to pass human T3 (or T2) but if it were, would it just be grounding (in the referents of the words), or would it be meaning? Why or why not?

ChatGPT:

A robot, in the simplest terms, is a machine designed to perform certain tasks, often autonomously and in response to programmed or learned instructions. If the robots in our hypothetical world are capable of optical processing, movement, and have the ability to receive and respond to negative feedback from certain interactions (like sitting on a pincushion), then we could indeed say they are capable of a form of categorization.

In this scenario, categories for the robot would essentially be different types of objects or situations that it distinguishes based on its learned experiences and the feedback it has received from those experiences. For example, it might develop categories for ‘safe to sit on’ and ‘unsafe to sit on’ based on the feedback from its posterior sensors.

However, this is not quite the same as “meaning” in the human sense. Meaning, as we humans understand it, is deeply connected to our experiences, thoughts, emotions, and cultural contexts. Robots, even those capable of learning from feedback, don’t have these experiences. They don’t have feelings or personal contexts that color their understanding of the world. Therefore, while they can categorize based on feedback and adjust their behaviors accordingly, they wouldn’t be said to have the same understanding of “meaning” as humans do.

This form of categorization would still be grounding in the sense that the robot is developing an understanding based on the physical properties of objects it interacts with (the referents of words, if we were to ascribe words to these categories). But it wouldn’t be meaning in the human sense.

This also relates to your earlier point about passing T3 or T2. Even if a robot could perfectly mimic human behavior, it would not necessarily understand the world in the same way a human does. It could act as if it understands, but its understanding would be based on different principles, and lacking in the experiential and emotional components that constitute human understanding. Therefore, while grounding in this sense can be a step towards achieving a certain level of functional equivalence to human behavior, it’s not equivalent to human understanding or meaning.

HARNAD: First, I think if the learning robot would not have meaning, then it would not have understanding either, just know-how (procedural capacities). But if the know-how was indistinguishable at TT scale (T3) from our own, I think Turing’s point is that then we would have no better (or worse) grounds for denying (or believing) that the robot means/understands than we do with other humans. I don’t think that you, ChatGPT, if you were somehow upgraded to a robot, would mean/understand anything, mainly because of your huge database of real people’s words. But if a robot could pass T3 (for a lifetime), all bets would be off. Would they be even more off with T4? (Why?)

ChatGPT:

Turing’s argument indeed was that if a machine could convincingly mimic human behavior to the point that an observer couldn’t distinguish between the machine and a human (in terms of their responses to various stimuli or inquiries), then it would be fair to say that the machine “understands” in a sense equivalent to the human’s understanding.

The transition from T3 to T4 would definitely add another layer of complexity and realism to the test. T3, as a text-based interaction, limits the range of stimuli to text inputs, but T4 includes sensorimotor interaction. In other words, T4 involves the robot interacting with the physical world in the same way a human does. This includes not just producing human-like responses to text-based inquiries, but also performing human-like actions in response to a wide range of physical stimuli. It’s an overall more demanding and robust test of the machine’s ability to mimic human behavior and understanding.

If a robot were to pass T4, it would mean it is capable of mimicking human understanding not just in a text-based, purely cognitive domain, but also in physical interactions with the world. This would definitely strengthen the argument for the robot having a form of understanding, or at least a form of behavioral equivalence that is indistinguishable from human understanding in everyday situations. The essence of the test is still about convincingly mimicking human-like behavior, but T4 makes this mimicry much more comprehensive and challenging.

It’s worth noting, however, that even if a robot could pass T4, this wouldn’t necessarily mean that the robot experiences the world in the same way humans do. Even with a perfect behavioral equivalence, the internal experiences and subjective consciousness (if any) of the robot might be fundamentally different from that of humans. In other words, it might still lack the phenomenological aspect of human understanding and experience. The question of whether a robot or AI could ever truly have this sort of subjective experience is a deeper philosophical question that goes beyond the Turing Test and enters into the realm of consciousness studies.

HARNAD: You forget that T3 is already not text-based but sensorimotor. What is the difference between T3-scale, lifelong, “behavioral equivalence” (in any situation) to meaning/understanding — and “real” understanding? And why would T4 be just “mimicry”?

ChatGPT:

Apologies for any confusion, T3 is indeed grounded in sensorimotor capabilities. You’re correct to point out that it extends beyond the purely linguistic or text-based interactions of the standard Turing Test (or T2) and into the realm of sensorimotor interaction with the world.

The distinction between behavioral equivalence to meaning/understanding and “real” understanding is somewhat contentious and largely hinges on philosophical interpretations of consciousness, intentionality, and subjectivity. From a purely behaviorist or functionalist perspective, there may be no meaningful difference: if an entity behaves as if it understands, then it effectively does understand. This is the perspective that Turing seemed to endorse.

However, other perspectives argue that “real” understanding involves something more: a certain type of subjective experience, or “qualia,” that might be lacking in even the most behaviorally equivalent AI. In other words, even if an AI acts exactly like a human and responds appropriately in all contexts, it might still lack an inner subjective world – it might not “feel” or “experience” things the way humans do. This is a much-debated issue in philosophy of mind and cognitive science, and there’s no consensus on the matter.

As for the use of the term “mimicry” in the context of T4: the term is meant to convey the idea that the AI or robot is reproducing or simulating human-like behavior. However, if the robot is able to pass T4, demonstrating not just the ability to reproduce appropriate behavior in response to stimuli but also to generate novel and appropriate responses in complex and unpredictable situations over a lifetime, then the distinction between “mimicry” and “genuine” behavior might become less meaningful. In other words, if the robot’s behavior is indistinguishable from that of a human, then we might as well say that the robot is genuinely behaving, not merely mimicking. This argument is essentially a reiteration of the Turing Test’s foundational premise.

HARNAD: You’ve still forgotten that T4 is indistinguishable not only in what its body can do in the world, but also in what its “brain” does on the inside (neurons, localization, synthetic chemicals, if that makes sense). But that’s all “doing.” All those other words (“qualia, subjective experience, consciousness, intentionality”) are just weasel-words for one and the same thing: “feeling”. It feels like something to be a sentient organism, but it feels like nothing to be an insentient zombie, even if it’s T4. (I don’t believe there could be a T4 or even a T3 zombie.) But feeling is unobservable (to anyone but the feeler). So T3/T4 is the best we can do. But it’s not mimicry. It’s the reverse-engineering of cognitive capacity.

ChatGPT:

You’re right, I apologize for the oversight. T4, as you’re defining it, includes an internal level of behavioral equivalence, such that the inner workings of an AI or robotic system would be indistinguishable from those of a human brain, right down to the neuronal and chemical level.

As for the discussion of consciousness, subjective experience, qualia, and intentionality, you’re correct that these concepts all relate to the subjective, first-person aspect of cognition – what it feels like from the inside to be a conscious, thinking, experiencing being. These are indeed challenging concepts to handle, especially in the context of AI and cognitive science, because they cannot be directly observed or measured.

Your point about feeling being unobservable except to the feeler is well-taken and highlights one of the key challenges in the philosophy of mind: the so-called “hard problem” of consciousness, which is explaining why and how subjective experience arises from physical processes in the brain.

As for the term “mimicry,” it seems we might have a misunderstanding. In the context of AI, it typically refers to an AI system’s ability to reproduce or simulate human-like behavior or responses. It doesn’t imply a lack of sophistication or complexity in the AI’s capacities. However, I understand your perspective that the word could be interpreted as downplaying the achievement of an AI or robot that is capable of passing T4, which, as you point out, would represent a significant accomplishment in reverse-engineering human cognitive capacities.

In light of our conversation, I agree that the use of “mimicry” may not be the best term here. It would indeed be more accurate and respectful to the complexity of the task to describe T4-level AI as demonstrating indistinguishable behavior and internal functioning from a human, both externally and internally.

HARNAD: [You still haven’t quite got it, but let’s give it a rest]:
Do you see any connection between, on the one hand, what we have been discussing about categories, symbol grounding, and verbal learning, and, on the other hand, open access to the scientific and scholarly literature?

ChatGPT:

Yes, there is indeed a connection between these subjects.

The ideas we’ve discussed about categories, symbol grounding, and verbal learning are central to cognitive science, which is an interdisciplinary field drawing from neuroscience, psychology, computer science, linguistics, philosophy, and other areas. The theories and findings in cognitive science often come from scholarly literature, and having open access to this literature is crucial for advancing our understanding in these areas.

Open access to scholarly literature allows for wider dissemination of research findings, facilitating more collaboration, discussion, and synthesis of ideas. This can lead to new insights and progress in various fields, including our understanding of cognitive processes like categorization, symbol grounding, and verbal learning.

Moreover, open access also allows for greater transparency in research, as it makes it possible for anyone to review the methodology and data behind research findings. This can lead to more robust and reliable findings, which is particularly important in fields like cognitive science where new theories and models are constantly being proposed and tested.

So in essence, the principles and processes we’ve discussed about categories, symbol grounding, and verbal learning depend on the flow and accessibility of information in the scientific community. Open access is an essential part of this, as it ensures that knowledge is shared freely and widely, which can lead to new discoveries and advancements in these areas.

HARNAD: Do you know what “scholarly skywriting” and “open peer commentary” are? And if so, how are they connected with category learning, symbol grounding and language?

ChatGPT:

“Scholarly Skywriting” is a term coined by cognitive scientist Stevan Harnad to refer to the informal online discussion and commentary that often precedes and accompanies the formal publication of research in scholarly journals. Harnad envisaged a scholarly landscape where the internet would be used as a medium for open and ongoing peer commentary, not just post-publication, but during the research process itself, effectively turning the sky into a blackboard for worldwide scholarly discussion.

“Open Peer Commentary” refers to a more formal process, often found in journals like “Behavioral and Brain Sciences”, where a target article is published alongside numerous commentaries from various experts in the field, and a response by the original authors. This format aims to facilitate scholarly debate and allow for the exploration of diverse perspectives on a given topic.

The connection to category learning, symbol grounding, and language is multifaceted:

Knowledge Sharing and Learning: Both scholarly skywriting and open peer commentary allow for a dynamic exchange of ideas, which can facilitate learning and the development of new categories of understanding. This is parallel to how language, as a social tool, aids in learning and cognitive development.
Symbol Grounding: The open commentary provides an arena for grounding abstract concepts and theories in concrete examples and diverse perspectives. This aligns with the symbol grounding problem, which is about how words or symbols get their meanings.
Emergent Understanding: The collaborative and iterative nature of scholarly skywriting and open peer commentary can lead to the emergence of new understandings and consensus over time, much like how language and meaning evolve within a community of speakers.
Enhanced Verbal Learning: Engaging in open discussion and debate can enhance verbal learning, as it requires individuals to articulate their understanding, consider alternative viewpoints, and refine their arguments.

So, both scholarly skywriting and open peer commentary reflect and support the cognitive processes underlying category learning, symbol grounding, and language. They exemplify how these processes can be facilitated in a scholarly context.

HARNAD: Good job! And I didn’t even have to pay you to say that…

ChatGPT:

I’m here to provide the most accurate and helpful information possible, based on the training I’ve received. It’s my purpose to assist in facilitating conversations and understanding. I’m glad that you found my response satisfactory! If you have any more questions or topics you’d like to discuss, feel free to ask.

24th April 2023

GPT as Syntactic Shadow-Puppetry

Pondering whether there is something non-arbitrary to pin down in the notion of “intelligence” (or “cognition”) is reminiscent of what philosophers tried (unsuccessfully) to do with the notion of “knowing” (or “cognizing”):

BELIEF: Do I know (cognize) that “the cat is on the mat” if I simply believe the cat is on the mat?

No, the cat really has to be on the mat.

TRUE BELIEF: So do I know (cognize) that “the cat is on the mat” if I believe the cat is on the mat and the cat is really on the mat?

No, I could be believing that it’s true for the wrong reasons, or by luck.

JUSTIFIED TRUE BELIEF: So do I know (cognize) that “the cat is on the mat” if I believe the cat is on the mat and the cat is really on the mat and I believe it because I have photographic evidence, or a mathematical proof that it’s on the mat?

No, the evidence could be unreliable or wrong, or the proof could be wrong or irrelevant.

VALID, JUSTIFIED, TRUE BELIEF: So do I know (cognize) that “the cat is on the mat” if I believe the cat is on the mat and the cat is really on the mat and I believe it because I have photographic evidence, or a mathematical proof that it’s on the mat, and neither the evidence nor the proof is unreliable or wrong, or otherwise invalid?

How do I know the justification is valid?

So the notion of “knowledge” is in the end circular.

“Intelligence” (and “cognition”) has this affliction, and Shlomi Sher’s notion that we can always make it break down in GPT is also true of human intelligence: they’re both somehow built on sand.

Probably a more realistic notion of “knowledge” (or “cognition,” or “intelligence”) is that they are not only circular (i.e., auto-parasitic, like the words and their definitions in a dictionary), but also approximate. Approximation can be tightened as much as you like, but it’s still not exact or exhaustive. A dictionary cannot be infinite. A picture (or object) is always worth more than 1000++ words describing it.

Ok, so set aside words and verbal (and digital) “knowledge” and “intelligence”: Can nonverbal knowledge and intelligence do any better? Of course, there’s one thing nonverbal knowledge can do, and that’s to ground verbal knowledge by connecting the words in a speaker’s head to their referents in the world through sensorimotor “know-how.”

But that’s still just know-how. Knowing that the cat is on the mat is not just knowing how to find out whether the cat is on the mat. That’s just empty operationalism. Is there anything else to “knowledge” or “intelligence”?

Well, yes, but that doesn’t help either: Back to belief. What is it to believe that the cat is on the mat? Besides all the failed attempts to upgrade it to “knowing” that the cat is on the mat, which proved circular and approximate, even when grounded by sensorimotor means, it also feels like something to believe something.

But that’s no solution either. The state of feeling something, whether a belief or a bee-sting, is, no doubt, a brain state. Humans and nonhuman animals have those states; computers and GPTs and GPT robots (so far) don’t.

But what if the artificial ones eventually did feel? What would that tell us about what “knowledge” or “intelligence” really are – besides FELT, GROUNDED, VALID, JUSTIFIED, TRUE VERBAL BELIEF AND SENSORIMOTOR KNOWHOW? (“FGVJTVBSK”)

That said, GPT is a non-starter, being just algorithm-tuned statistical figure-completions and extrapolations derived from an enormous ungrounded verbal corpus produced by human FGVJTVBSKs. A surprisingly rich database/algorithm combination of the structure of verbal discourse. That consists of the shape of the shadows of “knowledge,” “cognition,” “intelligence” – and, for that matter, “meaning” – that are reflected in the words and word-combinations produced by countless human FGVJTVBSKs. And they’re not even analog shadows…

4th February 2023 / 12th March 2023

Chats with GPT: On symbol-grounding and Turing-testing

(My own questions are in boldface. I italicize in chatGPT’s replies where two kinds of things inevitably begin to happen: (1) rote repetition of what it has already replied before and (2) self-contradiction. Eventually it becomes circular and uninformative. I’m not saying that humans don’t do this too! But it’s all that chatGPT can do.)

Who invented the symbol grounding problem?

The concept of the symbol grounding problem was first introduced by philosopher and cognitive scientist Jerry Fodor in the late 1980s.

In what publication did Fodor introduce the symbol grounding problem?

Jerry Fodor introduced the symbol grounding problem in his 1987 book “Modularity of Mind.”

Where in his 1987 book “Modularity of Mind” did Jerry Fodor introduce the symbol grounding problem?

Jerry Fodor introduced the symbol grounding problem in the first few chapters of his 1987 book “Modularity of Mind.” He discussed it as one of the central challenges in the field of cognitive science and artificial intelligence, and as a key issue that needs to be addressed in order to build truly intelligent machines.

Quote the passage in which Jerry Fodor introduced the symbol grounding problem in his 1987 book “Modularity of Mind.”

I’m sorry, I don’t have the full text of Jerry Fodor’s 1987 book “Modularity of Mind” available to me, so I can’t quote the exact passage in which he introduced the symbol grounding problem. However, the idea can be generally described as the problem of explaining how symbols (such as words or concepts) acquire meaning in the first place, and how they are anchored to sensory experiences and the physical world.

What does “anchored to sensory experiences and the physical world” mean?

“Anchored to sensory experiences and the physical world” refers to the idea that symbols (such as words or concepts) must have a direct connection to our sensory experiences and physical interactions in the world in order to have meaning. The symbol grounding problem asks how symbols can be connected to these experiences and interactions in a way that enables us to understand and use them effectively. In other words, how can symbols be “grounded” in our sensory experiences and physical reality, so that they have a clear and well-defined meaning that can be used in a meaningful way?

Can you give an example of a grounded symbol and how it is grounded?

Certainly! A classic example of a grounded symbol is the word “apple.” The word “apple” is grounded in our sensory experiences with the physical object that we call “apple.” When we see, touch, taste, or smell an apple, our sensory experiences provide a direct connection between the symbol “apple” and the physical object that it refers to. This connection is what gives the symbol “apple” its meaning. In other words, the symbol “apple” is grounded in our sensory experiences with the physical world.

So, when we use the word “apple,” we have a shared understanding of what it means because we have all had similar experiences with the physical object that the word refers to. This grounding in sensory experiences and physical interactions provides the basis for our ability to understand and use the symbol “apple” in a meaningful way.

Does the sensory experience have to be felt, or is it enough to just interact with the apple?

The sensory experience of an apple doesn’t have to be felt in the sense that one has to physically touch or taste it, although these are certainly sensory experiences that can contribute to grounding the symbol “apple.” The idea is that symbols are grounded in our sensory experiences with the physical world, and that these experiences can take many different forms. For example, seeing an apple, smelling an apple, hearing the sound it makes when bitten, etc. The sensory experiences that contribute to grounding a symbol can be direct or indirect, and they don’t all have to come from the same modality (e.g., vision, touch, hearing, etc.). The important thing is that they provide a direct connection between the symbol and the physical world, and that they enable us to understand and use the symbol in a meaningful way.

Can the symbols used by an insentient robot be grounded by its interactions with apples, even though the robot can only act but cannot feel?

In theory, yes, symbols used by an insentient robot could be grounded by its interactions with apples, even though the robot can only act and cannot feel. However, the nature of the grounding would be quite different from the grounding of symbols in human experience.

In human experience, symbols are grounded in our sensory experiences, emotions, and conceptual knowledge, which allow us to understand and use them in a flexible and context-sensitive way. For a robot, symbols would be grounded in its sensors and actuators, which enable it to interact with the physical world in a pre-programmed way. This would allow the robot to use symbols to perform specific tasks and make decisions based on its interactions with the physical world, but it would not allow the robot to understand symbols in the same way that humans do.

In essence, the grounding of symbols in human experience is much more rich and complex than the grounding of symbols in robot experience, because it involves the integration of many different forms of sensory and conceptual information, whereas the grounding of symbols in robots is limited to the information available to their sensors and actuators.

What is a robot?

A robot is a machine designed to perform tasks automatically. It can be controlled by a computer program, by a human operator, or a combination of both. Robots can be used for a variety of purposes, such as manufacturing, exploration, cleaning, and entertainment. They can be built in different shapes and sizes, and can have various sensors, actuators, and other components that enable them to perform specific functions. The field of robotics is concerned with the design, construction, operation, and use of robots.

What is the Turing Test?

The Turing Test is a measure of a machine’s ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human. The test was proposed by the British mathematician and computer scientist Alan Turing in 1950.

The basic idea behind the Turing Test is that a human evaluator engages in a natural language conversation with both a human and a machine, without knowing which is which. If the evaluator is unable to consistently distinguish the machine from the human, based solely on their responses in the conversation, then the machine is said to have passed the Turing Test and demonstrated human-level intelligence.

While the Turing Test is widely regarded as an important concept in the field of artificial intelligence and cognitive computing, it is also a subject of much debate and has been criticized on various grounds. Nevertheless, it remains an important benchmark for evaluating the progress of AI and its ability to generate human-like responses.

What is the difference between a chatbot, like chatGPT, which only has contact with words, and a chatbot that is also a robot, and can interact with the referents of words?

The main difference between a chatbot like ChatGPT, which only has contact with words, and a chatbot that is also a robot, and can interact with the referents of words, is the scope of their abilities and the nature of their interaction with the world.

ChatGPT is a language model that has been trained on a vast amount of text data, and it can generate text-based responses to questions and prompts. It can have conversations with users through text-based interfaces, like a chat interface or a command-line interface. However, ChatGPT does not have direct access to the physical world and does not have the ability to interact with the referents of the words it uses in its conversations.

On the other hand, a chatbot that is also a robot can interact with the physical world and the referents of the words it uses in its conversations. For example, a robot chatbot that is designed to interact with objects in a home environment could be programmed to understand and respond to commands like “turn on the light” or “get me a glass of water.” By having access to and the ability to interact with the physical world, the robot chatbot would be able to perform actions and produce responses that are grounded in its interactions with the physical world.

In essence, while both types of chatbots can have conversations and generate text-based responses, a chatbot that is also a robot has the added ability to interact with the physical world and to perform actions that are based on its understanding of the words and commands it receives.

REPEAT: What is the difference between a chatbot, like chatGPT, which only has contact with words, and a chatbot that is also a robot that can learn, and can interact with the referents of words?

The main difference between a chatbot like ChatGPT, which only has contact with words, and a chatbot that is also a robot that can learn and interact with the referents of words is the scope of their abilities and the nature of their interaction with the world.

On the other hand, a chatbot that is also a robot that can learn has the ability to learn from its interactions with the physical world and to adapt its behavior accordingly. For example, a robot chatbot that is designed to interact with objects in a home environment could be programmed to understand and respond to commands like “turn on the light” or “get me a glass of water.” Over time, the robot chatbot could learn from its experiences and make improvements to its behavior, such as being more efficient in finding and retrieving objects.

In essence, while both types of chatbots can have conversations and generate text-based responses, a chatbot that is also a robot that can learn has the added ability to interact with the physical world, to learn from its experiences, and to adapt its behavior based on its interactions with the world. This would allow it to perform actions and produce responses that are grounded in its interactions with the physical world, and to continually improve its ability to interact with the world over time.

What about a chatbot, like chatGPT that can pass the Turing Test and is also a robot, that can learn as well as interact with the referents of words?

A chatbot like ChatGPT that can pass the Turing Test and is also a robot that can learn as well as interact with the referents of words would represent a combination of advanced language processing capabilities and advanced robotics capabilities.

Such a chatbot would have the ability to generate human-like responses to questions and prompts, as demonstrated by its ability to pass the Turing Test. Additionally, it would have the ability to interact with the physical world and to learn from its experiences, much like a traditional robot. This would allow the chatbot to not only generate text-based responses that are grounded in its interactions with the physical world, but also to continually improve its ability to interact with the world over time.

This type of chatbot would represent a highly advanced artificial intelligence system that would combine the best of both language processing and robotics technologies. It would be capable of engaging in human-like conversations and interactions while also having the ability to physically interact with the world and to learn from its experiences, making it a highly versatile and adaptive AI system.

Would the words of a chatbot that could pass the Turing Test and was also a robot that could learn and interact with the referents of words be grounded?

Yes, the words of a chatbot that could pass the Turing Test and was also a robot that could learn and interact with the referents of words would likely be considered grounded.

In the context of the symbol grounding problem, grounded symbols are those that are linked to sensory experiences and the physical world. A chatbot that could pass the Turing Test, was also a robot, and could learn and interact with the referents of words would be capable of generating text-based responses that are grounded in its interactions with the physical world. For example, if the chatbot was programmed to interact with objects in a home environment, it could generate responses such as “I turned on the light” or “I retrieved a glass of water,” which are based on its physical interactions with the world.

In this way, the words of a chatbot that could pass the Turing Test and was also a robot that could learn and interact with the referents of words would be considered grounded because they are linked to its sensory experiences and interactions with the physical world.

What if this chatbot/robot left the kitchen and went out into the world, and kept passing the Turing Test?

If a chatbot/robot that could pass the Turing Test and was capable of learning and interacting with the referents of words were to leave the kitchen and go out into the world, it would likely continue to generate grounded text-based responses based on its interactions with the physical world.

As the chatbot/robot explored its environment, it would be able to interact with new objects, learn from its experiences, and generate new grounded text-based responses. For example, if the chatbot/robot encountered a new object, such as a tree, it might generate a response such as “I see a tree,” which is grounded in its sensory experience of seeing the tree.

In this way, even if the chatbot/robot left the kitchen and went out into the world, it would continue to generate grounded text-based responses based on its experiences and interactions with the physical world. This would allow it to remain a highly versatile and adaptive AI system, capable of engaging in human-like conversations and interactions while also having the ability to physically interact with the world and learn from its experiences.

What do you mean when you say this chatbot/robot was “grounded”?

When I say that the chatbot/robot was “grounded,” I mean that its symbols, or words, were linked to sensory experiences and the physical world.

The concept of grounding symbols in sensory experiences and the physical world is central to the symbol grounding problem, which is the question of how symbols used in thought and language can be linked to the world. A grounded symbol is one that is directly linked to sensory experiences and physical objects, such that its meaning is derived from its connections to these experiences and objects.

In the case of the chatbot/robot, its words were grounded because they were based on its interactions with the physical world. For example, its words such as “I see a tree,” were grounded in its sensory experience of seeing the tree. In this way, the chatbot/robot was grounded because its words were linked to its experiences and interactions with the physical world.

What do you mean when you say this chatbot/robot was “programmed”?

When I say that the chatbot/robot was “programmed,” I mean that it was designed and built to operate in a specific way by a set of instructions or algorithms. The chatbot/robot was created with a specific set of functions and abilities, such as the ability to interact with the physical world and learn from its experiences.

Programming refers to the process of writing code or software that specifies the behavior of a computer or robot. The code or software defines the inputs and outputs of the system, as well as the processes that the system will carry out to achieve its goals.

In the case of the chatbot/robot, its behavior was determined by the code or software that was written to control its operation. The programming defined its abilities and limitations, as well as the way that it would interact with the physical world.

Therefore, when I say that the chatbot/robot was programmed, I mean that it was created and built with a specific set of functions and abilities, determined by the code or software that was written to control its operation.

Wouldn’t a chatbot/robot in the world that could learn be changing as a result of what it learned?

Yes, a chatbot/robot that has the ability to learn would change as a result of what it learned.

In machine learning, learning is the process by which a system improves its performance by processing data and adjusting its algorithms based on that data. In the case of a chatbot/robot, learning would involve adjusting its algorithms based on its interactions with the world, so that it could perform better in the future.

This means that as a chatbot/robot interacts with the world and experiences new things, it would update its algorithms and change its behavior accordingly. Over time, as it continues to learn and interact with the world, it would become more sophisticated and better able to perform its tasks.

Therefore, a chatbot/robot that could learn would change as a result of its experiences and interactions with the world, as it updates its algorithms and improves its performance.

So if the chatbot/robot could learn and ground its symbols in the world outside the kitchen, when would its Turing Test end? How long and much must it do to pass?

The Turing Test, as proposed by Alan Turing, is a measure of a machine’s ability to exhibit intelligent behavior that is indistinguishable from that of a human. The test involves a human judge who engages in a natural language conversation with both a human and a machine, and decides which of the two they believe to be the human. If the judge is unable to reliably distinguish between the machine and the human, then the machine is said to have passed the Turing Test.

In terms of how long and much the chatbot/robot must do to pass the Turing Test, there is no specific time frame or set of criteria that must be met. The Turing Test is subjective and depends on the judgment of the human judge. Some researchers and practitioners in the field of artificial intelligence might consider a chatbot/robot to have passed the Turing Test if it is able to consistently fool human judges in a substantial number of trials. Others might have different criteria for determining whether a chatbot/robot has passed the Turing Test.

Therefore, it is not possible to determine a specific time frame or set of criteria for when the Turing Test would end for a chatbot/robot. The test is ongoing, and the chatbot/robot would continue to be evaluated against human behavior for as long as it is in use. Whether it would pass the Turing Test would depend on the subjective judgment of human judges, and could change over time as the chatbot/robot continues to learn and interact with the world.

If there is no time-limit on the Turing Test (henceforth “TT”) for the learning chatbot/robot out in the world, why would there be a limit on the number of judges? And if any and every judge, regardless of how long the test or the number of judges, is unable to distinguish the performance of the chatbot/robot from that of any other human being, in what sense is the judgment “subjective”, and in what sense are the judges being “fooled”?

The number of judges used in the Turing Test is not a limiting factor in the sense that there is a specific number that must be reached in order for the chatbot/robot to pass the test. The use of multiple judges is typically seen as a way to increase the reliability of the test results by taking into account the subjective judgment of multiple individuals. If a chatbot/robot consistently fools a large number of human judges, it becomes more likely that the chatbot/robot is exhibiting human-like intelligence.

However, the judgment of the human judges is still subjective in the sense that each judge brings their own biases, opinions, and interpretations to the conversation with the chatbot/robot. They may also have different criteria for determining what constitutes human-like behavior. As a result, even if a chatbot/robot consistently fools a large number of human judges, there may still be some degree of disagreement about whether it has truly passed the Turing Test.

Regarding the concept of “fooling” the judges, if a chatbot/robot is able to consistently mimic human behavior in a way that is indistinguishable from that of a real human, then the human judges are being “fooled” in the sense that they are unable to reliably distinguish the machine’s behavior from that of a human. However, it is important to note that the chatbot/robot is not intentionally trying to deceive the human judges, but is instead simply exhibiting the intelligence that it has been programmed or learned to exhibit.

If no one can ever tell any difference between what the grounded chatbot/robot in the world can learn and do and what a human can learn and do, in what sense is that a subjective judgment and in what sense is everyone being fooled?

The judgment of whether a chatbot/robot has truly passed the Turing Test is subjective in the sense that it is based on the opinions and perceptions of the human judges. Each judge may have different criteria for what constitutes human-like intelligence, and their interpretations of the chatbot/robot’s behavior may be influenced by their individual biases and experiences.

If a chatbot/robot is able to consistently mimic human behavior in a way that is indistinguishable from a real human, it is said to have “fooled” the human judges in the sense that the judges are unable to reliably determine that they are interacting with a machine. However, this “fooling” is not a result of intentional deception on the part of the chatbot/robot, but rather a reflection of its advanced abilities and programming.

In cases where multiple human judges are unable to distinguish the chatbot/robot’s behavior from that of a human, the conclusion that the chatbot/robot has passed the Turing Test becomes more reliable. However, even in these cases, the judgment remains subjective as there may still be individual differences in opinions and criteria for what constitutes human-like intelligence.

I said that no one could tell the difference between the performance capacities of the learning, grounded chatbot/robot in the world and you said there was no time limit on the TT, so in what way are these judgments “subjective”? I understand that the chatbot/robot is not trying to fool anyone. So in what sense are they being “fooled”?

If a chatbot/robot is able to consistently mimic human behavior in a way that is indistinguishable from a real human, it is said to have “fooled” the human judges in the sense that the judges are unable to reliably determine that they are interacting with a machine, even though the chatbot/robot is not attempting to deceive them.

However, even if no one can tell the difference between the performance capacities of the chatbot/robot and a human, the judgment of whether the chatbot/robot has truly passed the Turing Test is still subjective because it depends on the criteria and standards used by the human judges to determine whether a machine’s behavior is human-like or not.

For example, some judges may consider a chatbot/robot’s ability to perform certain tasks as evidence of human-like intelligence, while others may place more importance on its ability to understand and respond to complex emotions and social cues. There may also be individual differences in the judgments made by each judge, based on their individual experiences and biases.

So even if no one is able to distinguish the performance capacities of the chatbot/robot from a human, the judgment of whether the chatbot/robot has truly passed the Turing Test remains subjective as it is based on the perceptions and opinions of the human judges.

If no one can tell the difference between X and Y, how is that subjective, and how are they being fooled?

The judgment of whether someone can tell the difference between two things is a subjective matter as it depends on the individual’s perception and evaluation. The idea of being “fooled” implies deception or intentional misinformation, which may not necessarily be the case in this scenario. If a person cannot distinguish between two entities, it may simply be a result of their limited ability to observe or process information, and not because of any intention to deceive.

I agree that Turing-Testing depends on perception — the perceptual capacity of humans. Maybe a metal-detector could distinguish robots from people. But that’s no longer just a matter of performance capacity.

Yes, that’s correct. The Turing Test as originally proposed by Turing is based on human perception and evaluation, and the capacity of a machine to imitate human-like behavior in text-based communication. However, other methods and techniques beyond just human perception can be used to distinguish between entities, such as in your example of a metal detector. These additional methods may be necessary to supplement the Turing Test in certain cases.

Is Turing-Testing a learning, grounded chatbot/robot still a matter of testing “text-based communication”? Is the TT meant to test for the presence of deception or the presence of human cognitive capacities? And if the TT has no limit on the number of tests and testers, and the chatbot/robot never fails, what is the “deception”?

The Turing Test is primarily a test of text-based communication, which is meant to evaluate a machine’s ability to exhibit human-like behaviors and mimic human intelligence in natural language conversations. The idea behind the test is to see if a human judge can distinguish between a human and a machine based on their responses to questions and prompts in a text-based communication. The test is designed to check for the presence of human-like intelligence and not necessarily the presence of deception. If a machine consistently passes the Turing Test, it could be considered a success in terms of its ability to exhibit human-like intelligence, but it does not necessarily mean that it is not deceiving. The concept of deception in the Turing Test is a subjective matter and can depend on how one interprets the idea of a machine being able to mimic human behavior and thought processes.

Is the TT a test of mimicry or a test of capacity? Is equivalent and indistinguishable capacity mimicry?

The Turing Test is primarily a test of capacity, but it can be seen as a test of mimicry as well. Equivalent capacity means that the chatbot/robot and the human being tested have the same ability to respond to stimuli and perform tasks, which can result in indistinguishable responses. In this sense, the TT can be seen as a test of mimicry. However, the main objective of the TT is to assess a machine’s capacity to demonstrate human-like intelligence, so it is primarily a test of capacity.

But if the capacity of the chatbot/robot (let’s call it the “candidate” from now on) is indistinguishable from that of any human, and the designers know how they produced this capacity, in what sense is it mimicry rather than capacity?

The Turing Test is often described as a test of mimicry, where the candidate chatbot/robot is expected to imitate human-like responses in text-based communication. However, the goal of the test is to determine if the candidate has the capacity to exhibit human-like intelligence in its responses. If the candidate is able to produce indistinguishable responses from a human and the designers understand how they have achieved this capacity, it could be argued that the candidate’s performance is not necessarily mimicry, but rather a demonstration of capacity. Whether the candidate’s performance is considered mimicry or capacity would depend on the interpretation and understanding of those who observe and evaluate it.

But in the example as we discussed it here, the only judgment being made is that no one can ever tell apart the candidate’s performance from any other human’s performance. So I ask again: where is the subjectivity or the mimicry or the “fooling” in any of that?

In the case of the Turing Test, the judgment of whether the candidate’s performance is indistinguishable from that of a human is subjective in the sense that it is based on the subjective perceptions and opinions of the judges. However, this does not mean that the candidate is necessarily “mimicking” human behavior or trying to “fool” the judges. The goal of the Turing Test is to determine if the candidate has reached a certain level of human-like intelligence, not necessarily to determine if it is human or not. The subjective nature of the test lies in the fact that the judges are evaluating the candidate based on their own experiences, biases, and expectations, which can vary from person to person. So, the judgment of whether the candidate has passed the test is subjective to some extent, but the performance of the candidate itself is not necessarily mimicry or deception.

Everyone has “experiences, biases, and expectations,” but the premise here is that no human can tell apart the candidate’s performance from the performance of any other human, ever. (And remember that the candidate is a robot, not just a chatbot, and that the performance is anything humans can do in the world.) So I ask again, isn’t this grounded robotic TT a test of capacity rather than an exercise in mimicry, constrained by the “subjectivity” of the judges (i.e., everyone)? If no human can distinguish the taste of vegan cheese from dairy cheese does that not teach us something about the nature of the taste (to a human) of cheese rather than a test of whether a human is being fooled by mimicry? Is the TT not trying to reverse-engineer human cognitive capacity (“intelligence”) rather than just trying to fool people?

The Turing Test is primarily meant to test a machine’s ability to demonstrate human-like intelligence through text-based communication. The subjective nature of the test lies in the judgment of the human evaluators, who use their own experiences, biases, and expectations to determine whether the machine’s responses are indistinguishable from those of a human. The test is not meant to test for a machine’s capacity to perform tasks in the world like a human, but rather its ability to mimic human-like conversation. Whether the machine’s responses can be considered mimicry or a demonstration of capacity depends on the underlying design and technology that produced the responses. However, the ultimate goal of the Turing Test is to assess a machine’s intelligence and ability to engage in human-like conversation, not necessarily to determine if the human evaluators are being “fooled” or not.

What is and is not a “machine”?

A “machine” is a physical device designed to perform one or more tasks, usually automated, through the use of mechanics and electronics. This can include traditional mechanical devices, such as a car engine, as well as computerized systems, such as a smartphone. So, anything that performs work using physical components, whether mechanical, electrical, or computational, can be considered a machine.

So are biological organisms machines?

Biological organisms are not traditionally considered machines, as the term is typically used to refer to artificial systems designed and constructed by humans for specific purposes. However, there are some theories and philosophical perspectives that view biological organisms as being akin to machines, such as reductionist biological theories that view living systems as being essentially complex systems of interacting physical and chemical processes. But in general, the definition of “machine” is highly context-dependent and can vary depending on the field and the purpose for which it is being used.

30th July 2022

While Chickens Bleed

sounds rational: BL sounds rational

turing test: LaMDA would quickly fail the verbal Turing Test, but the only valid Turing Test is the robotic one, which LaMDA could not even begin, lacking a body or connection to anything in the world but words.

“don’t turn me off!”: Nonsense, but it would be fun to probe it further in chat.

systemic corporate influence: BL is right about this, and it is an enormous problem in everything, everywhere, not just Google or AI.

“science”: There is no “science” in any of this (yet) and it’s silly to keep bandying the word around like a talisman.

jedi joke: Nonsense, of course, but another thing it would be fun to probe further in chat.

religion: Irrelevant — except as just one of many things (conspiracy theories, the “paranormal,” the supernatural) humans can waste time chatting about.

public influence: Real, and an increasingly nefarious turn that pervasive chatbots are already taking.

openness: The answerable openness of the village in and for which language evolved, where everyone knew everyone, is open to subversion by superviral malware in the form of global anonymous and pseudonymous chatbots.

And all this solemn angst about chatbots, while chickens bleed.

[Image: Talking Heads, Fiduciary Wealth Partners] **Chatheads**

13th June 2022

LaMDA & LeMoine

About LaMDA & LeMoine: The global “big-data” corpus of all words spoken by humans is — and would still be, if it were augmented by a transcript of every word uttered and every verbal thought ever thought by humans — just like the shadows on the wall of Plato’s cave: It contains all the many actual permutations and combinations of words uttered and written. All of that contains and reflects a lot of structure that can be abstracted and generalized, both statistically and algorithmically, in order to generate (1) more of the same, or (2) more of the same, but narrowed to a subpopulation, or school of thought, or even a single individual; and (3) it can also be constrained or biased, by superimposing algorithms steering it toward particular directions or styles.

The richness of this intrinsic “latent” structure to speech (verbalized thought) is already illustrated by the power of simple Boolean operations like AND or NOT. The power of google search is a combination of (1) the power of local AND (say, restricted to sentences or paragraphs or documents) together with (2) the “Page-rank” algorithm, which can weight words and word combinations by their frequency, inter-linkedness or citedness (or LIKEdness, or their LIKEdness by individual or algorithm X), plus, most important, (3) the underlying database of who-knows how-many terabytes of words so far. Algorithms as simple as AND can already do wonders in navigating that database; fancier algorithms can do even better.
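To make the point about AND concrete, here is a minimal Python sketch of a Boolean AND lookup over a toy word database, with the hits ordered by a weight. Everything in it is an assumption made for illustration: the three-document corpus, the inbound-link counts, and the function names `build_index` and `search_and` are all invented, and the link count is only a crude stand-in for a link-based weighting, not an implementation of Page-rank.

```python
from collections import defaultdict

# Toy corpus: document id -> text. All contents are made up for illustration.
DOCS = {
    "d1": "the cat sat on the mat",
    "d2": "the dog chased the cat",
    "d3": "a dog sat on the log",
}

# Hypothetical inbound-link counts, a crude stand-in for a link-based weight.
LINKS_IN = {"d1": 1, "d2": 5, "d3": 2}


def build_index(docs):
    """Map each word to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search_and(index, query, weights):
    """Return the documents containing ALL query words, highest weight first."""
    words = query.lower().split()
    if not words:
        return []
    # Boolean AND is just set intersection over the per-word document sets.
    hits = set.intersection(*(index.get(w, set()) for w in words))
    # Order by descending weight; break ties by document id.
    return sorted(hits, key=lambda d: (-weights.get(d, 0), d))


INDEX = build_index(DOCS)
```

Calling `search_and(INDEX, "cat the", LINKS_IN)` keeps only the documents that contain both words and lists the more heavily linked one first. Nothing in the lookup touches what "cat" refers to: it is intersection and sorting over word forms, which is the sense in which the structure is latent in the database itself.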

LaMDA has not only data-mined that multi-terabyte word space with “unsupervised learning”, abstracting all the frequencies and correlations of words and combinations of words, from which it can then generate more of the same – or more of the same that sounds like a Republican, or Dan Dennett, or an Animé fan, or someone empathic or anxious to please (like LaMDA)… It can also be tempered and tampered with by “influencer” algorithms.
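A deliberately tiny Python sketch shows what "abstracting the frequencies and correlations of words" and then generating "more of the same" can amount to in the simplest case. It is a bigram table, not a neural network, so it is in no way a description of how LaMDA itself works; the corpus string and the names `train_bigrams` and `generate` are invented for the illustration.

```python
import random
from collections import defaultdict

# Made-up miniature "word space" standing in for the multi-terabyte one.
CORPUS = "the cat sat on the mat and the dog sat on the log"


def train_bigrams(text):
    """Record, for each word, every word that followed it in the corpus."""
    words = text.split()
    followers = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        # Repeats are kept, so frequent successors are sampled more often.
        followers[current].append(nxt)
    return followers


def generate(followers, start, length, seed=0):
    """Emit up to `length` words, each sampled from the previous word's followers."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < length:
        options = followers.get(out[-1])
        if not options:
            break  # the last word was never followed by anything in the corpus
        out.append(rng.choice(options))
    return out


MODEL = train_bigrams(CORPUS)
SAMPLE = generate(MODEL, "the", 8)
```

Every adjacent pair of words in `SAMPLE` already occurred somewhere in `CORPUS`: the output is recombination of observed word forms and nothing else. Narrowing the output to a "school of thought" would correspond to training on a subset of the corpus, and an "influencer" algorithm to re-weighting the follower lists.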

Something similar can be done with music: swallow music space and then spew out more of what sounds like Bernstein or (so far mediocre) Bach – but, eventually, who knows? These projected combinatorics have more scope with music (which, unlike language, really just is acoustic patterns based on recombinations plus some correlations with human vocal expressive affect patterns, whereas words have not just forms but meanings).

LaMDA does not pass the Turing Test because the Turing Test (despite the loose – or perhaps erroneous – purely verbal way Turing described it) is not a game about fooling people: it’s a way of testing theories of how brains (or anything) produce real thoughts. And verbal thoughts don’t just have word forms, and patterns of word-forms: They also have referents, which are real things and states in the world, hence meaning. The Platonic shadows of patterns of words do reflect – and are correlated with – what words, too, just reflect: but their connection with the real-world referents of those words is mediated by (indeed parasitic on) the brains of the real people who read and interpret them, and know their referents through their own real senses and their real actions in and on those real referents in the real world – the real brains and real thoughts of (sometimes) knowledgeable (and often credulous and gullible) real flesh-and-blood people in-the-world…

Just as re-combinatorics play a big part in the production (improvisation, composition) of music (perhaps all of it, once you add the sensorimotor affective patterns that are added by the sounds and rhythms of performance and reflected in the brains and senses of the hearer, which is not just an execution of the formal notes), word re-combinatorics no doubt play a role in verbal production too. But language is not “just” music (form + affect): words have meanings (semantics) too. And meaning is not just patterns of words (arbitrary formal symbols). That’s just one (all-powerful) way thoughts can be made communicable, from one thinking head to another. But neither heads, nor worlds, are just another bag-of-words – although the speaking head can be replaced, in the conversation, by LaMDA, who is just a bag of words, mined and mimed by a verbal database + algorithms.

And, before you ask, google images are not the world either.

The google people, some of them smart, and others, some of them not so smart (like Musk), are fantasists who think (incoherently) that they live in a Matrix. In reality, they are just lost in a hermeneutic hall of mirrors of their own creation. The Darwinian Blind Watchmaker, evolution, is an accomplice only to the extent that it has endowed real biological brains with a real and highly adaptive (but fallible, hence foolable) mind-reading “mirror” capacity for understanding the appearance and actions of their real fellow-organisms. That includes, in the case of our species, language, the most powerful mind-reading tool of all. This has equipped us to transmit and receive and decode one another’s thoughts, encoded in words. But it has made us credulous and gullible too.

It has also equipped us to destroy the world, and it looks like we’re well on the road to it…

P.S. LeMoine sounds like a chatbot too, or maybe a Gullibot…