Chats with GPT: On symbol-grounding and Turing-testing

( My own questions are in boldface. I italicize in chatGBT’s replies where two kinds of things inevitably begin to happen: (1) rote repetition of what it has already replied before and (2) self-contradiction. Eventually it becomes circular and uninformative. (I’m not saying that humans don’t do this too! But it’s all that chatGBT can do.)

Who invented the symbol grounding problem?

The concept of the symbol grounding problem was first introduced by philosopher and cognitive scientist Jerry Fodor in the late 1980s.

In what publication did Fodor introduce the symbol grounding problem?

Jerry Fodor introduced the symbol grounding problem in his 1987 book “Modularity of Mind.”

Where in his 1987 book “Modularity of Mind.” did Jerry Fodor introduce the symbol grounding problem?

Jerry Fodor introduced the symbol grounding problem in the first few chapters of his 1987 book “Modularity of Mind.” He discussed it as one of the central challenges in the field of cognitive science and artificial intelligence, and as a key issue that needs to be addressed in order to build truly intelligent machines.

Quote the passage in which Jerry Fodor introduced the symbol grounding problem in his 1987 book “Modularity of Mind.”

I’m sorry, I don’t have the full text of Jerry Fodor’s 1987 book “Modularity of Mind” available to me, so I can’t quote the exact passage in which he introduced the symbol grounding problem. However, the idea can be generally described as the problem of explaining how symbols (such as words or concepts) acquire meaning in the first place, and how they are anchored to sensory experiences and the physical world.

What does “anchored to sensory experiences and the physical world” mean?

“Anchored to sensory experiences and the physical world” refers to the idea that symbols (such as words or concepts) must have a direct connection to our sensory experiences and physical interactions in the world in order to have meaning. The symbol grounding problem asks how symbols can be connected to these experiences and interactions in a way that enables us to understand and use them effectively. In other words, how can symbols be “grounded” in our sensory experiences and physical reality, so that they have a clear and well-defined meaning that can be used in a meaningful way?

Can you give an example of a grounded symbol and how it is grounded?

Certainly! A classic example of a grounded symbol is the word “apple.” The word “apple” is grounded in our sensory experiences with the physical object that we call “apple.” When we see, touch, taste, or smell an apple, our sensory experiences provide a direct connection between the symbol “apple” and the physical object that it refers to. This connection is what gives the symbol “apple” its meaning. In other words, the symbol “apple” is grounded in our sensory experiences with the physical world.

So, when we use the word “apple,” we have a shared understanding of what it means because we have all had similar experiences with the physical object that the word refers to. This grounding in sensory experiences and physical interactions provides the basis for our ability to understand and use the symbol “apple” in a meaningful way.

Does the sensory experience have to be felt, or is it enough to just interact with the apple?

The sensory experience of an apple doesn’t have to be felt in the sense that one has to physically touch or taste it, although these are certainly sensory experiences that can contribute to grounding the symbol “apple.” The idea is that symbols are grounded in our sensory experiences with the physical world, and that these experiences can take many different forms. For example, seeing an apple, smelling an apple, hearing the sound it makes when bitten, etc. The sensory experiences that contribute to grounding a symbol can be direct or indirect, and they don’t all have to come from the same modality (e.g., vision, touch, hearing, etc.). The important thing is that they provide a direct connection between the symbol and the physical world, and that they enable us to understand and use the symbol in a meaningful way.

Can the symbols used by an insentient robot be grounded by its interactions with apples, even though the robot can only act but cannot feel?

In theory, yes, symbols used by an insentient robot could be grounded by its interactions with apples, even though the robot can only act and cannot feel. However, the nature of the grounding would be quite different from the grounding of symbols in human experience.

In human experience, symbols are grounded in our sensory experiences, emotions, and conceptual knowledge, which allow us to understand and use them in a flexible and context-sensitive way. For a robot, symbols would be grounded in its sensors and actuators, which enable it to interact with the physical world in a pre-programmed way. This would allow the robot to use symbols to perform specific tasks and make decisions based on its interactions with the physical world, but it would not allow the robot to understand symbols in the same way that humans do.

In essence, the grounding of symbols in human experience is much more rich and complex than the grounding of symbols in robot experience, because it involves the integration of many different forms of sensory and conceptual information, whereas the grounding of symbols in robots is limited to the information available to their sensors and actuators.

What is a robot?

A robot is a machine designed to perform tasks automatically. It can be controlled by a computer program, by a human operator, or a combination of both. Robots can be used for a variety of purposes, such as manufacturing, exploration, cleaning, and entertainment. They can be built in different shapes and sizes, and can have various sensors, actuators, and other components that enable them to perform specific functions. The field of robotics is concerned with the design, construction, operation, and use of robots.

What is the Turing Test?

The Turing Test is a measure of a machine’s ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human. The test was proposed by the British mathematician and computer scientist Alan Turing in 1950.

The basic idea behind the Turing Test is that a human evaluator engages in a natural language conversation with both a human and a machine, without knowing which is which. If the evaluator is unable to consistently distinguish the machine from the human, based solely on their responses in the conversation, then the machine is said to have passed the Turing Test and demonstrated human-level intelligence.

While the Turing Test is widely regarded as an important concept in the field of artificial intelligence and cognitive computing, it is also a subject of much debate and has been criticized on various grounds. Nevertheless, it remains an important benchmark for evaluating the progress of AI and its ability to generate human-like responses.

What is the difference between a chatbot, like chatGPT, which only has contact with words, and a chatbot that is also a robot, and can interact with the referents of words?

The main difference between a chatbot like ChatGPT, which only has contact with words, and a chatbot that is also a robot, and can interact with the referents of words, is the scope of their abilities and the nature of their interaction with the world.

ChatGPT is a language model that has been trained on a vast amount of text data, and it can generate text-based responses to questions and prompts. It can have conversations with users through text-based interfaces, like a chat interface or a command-line interface. However, ChatGPT does not have direct access to the physical world and does not have the ability to interact with the referents of the words it uses in its conversations.

On the other hand, a chatbot that is also a robot can interact with the physical world and the referents of the words it uses in its conversations. For example, a robot chatbot that is designed to interact with objects in a home environment could be programmed to understand and respond to commands like “turn on the light” or “get me a glass of water.” By having access to and the ability to interact with the physical world, the robot chatbot would be able to perform actions and produce responses that are grounded in its interactions with the physical world.

In essence, while both types of chatbots can have conversations and generate text-based responses, a chatbot that is also a robot has the added ability to interact with the physical world and to perform actions that are based on its understanding of the words and commands it receives.

REPEAT: What is the difference between a chatbot, like chatGPT, which only has contact with words, and a chatbot that is also a robot that can learn, and can interact with the referents of words?

The main difference between a chatbot like ChatGPT, which only has contact with words, and a chatbot that is also a robot that can learn and interact with the referents of words is the scope of their abilities and the nature of their interaction with the world.

ChatGPT is a language model that has been trained on a vast amount of text data, and it can generate text-based responses to questions and prompts. It can have conversations with users through text-based interfaces, like a chat interface or a command-line interface. However, ChatGPT does not have direct access to the physical world and does not have the ability to interact with the referents of the words it uses in its conversations.

On the other hand, a chatbot that is also a robot that can learn has the ability to learn from its interactions with the physical world and to adapt its behavior accordingly. For example, a robot chatbot that is designed to interact with objects in a home environment could be programmed to understand and respond to commands like “turn on the light” or “get me a glass of water.” Over time, the robot chatbot could learn from its experiences and make improvements to its behavior, such as being more efficient in finding and retrieving objects.

In essence, while both types of chatbots can have conversations and generate text-based responses, a chatbot that is also a robot that can learn has the added ability to interact with the physical world, to learn from its experiences, and to adapt its behavior based on its interactions with the world. This would allow it to perform actions and produce responses that are grounded in its interactions with the physical world, and to continually improve its ability to interact with the world over time.

What about a chatbot, like chatGPT that can pass the Turing Test and is also a robot, that can learn as well as interact with the referents of words?

A chatbot like ChatGPT that can pass the Turing Test and is also a robot that can learn as well as interact with the referents of words would represent a combination of advanced language processing capabilities and advanced robotics capabilities.

Such a chatbot would have the ability to generate human-like responses to questions and prompts, as demonstrated by its ability to pass the Turing Test. Additionally, it would have the ability to interact with the physical world and to learn from its experiences, much like a traditional robot. This would allow the chatbot to not only generate text-based responses that are grounded in its interactions with the physical world, but also to continually improve its ability to interact with the world over time.

This type of chatbot would represent a highly advanced artificial intelligence system that would combine the best of both language processing and robotics technologies. It would be capable of engaging in human-like conversations and interactions while also having the ability to physically interact with the world and to learn from its experiences, making it a highly versatile and adaptive AI system.

Would the words of a chatbot that could pass the Turing Test and was also a robot that could learn and interact with the referents of words be grounded?

Yes, the words of a chatbot that could pass the Turing Test and was also a robot that could learn and interact with the referents of words would likely be considered grounded.

In the context of the symbol grounding problem, grounded symbols are those that are linked to sensory experiences and the physical world. A chatbot that could pass the Turing Test, was also a robot, and could learn and interact with the referents of words would be capable of generating text-based responses that are grounded in its interactions with the physical world. For example, if the chatbot was programmed to interact with objects in a home environment, it could generate responses such as “I turned on the light” or “I retrieved a glass of water,” which are based on its physical interactions with the world.

In this way, the words of a chatbot that could pass the Turing Test and was also a robot that could learn and interact with the referents of words would be considered grounded because they are linked to its sensory experiences and interactions with the physical world.

What if this chatbot/robot left the kitchen and went out into the world, and kept passing the Turing Test?

If a chatbot/robot that could pass the Turing Test and was capable of learning and interacting with the referents of words were to leave the kitchen and go out into the world, it would likely continue to generate grounded text-based responses based on its interactions with the physical world.

As the chatbot/robot explored its environment, it would be able to interact with new objects, learn from its experiences, and generate new grounded text-based responses. For example, if the chatbot/robot encountered a new object, such as a tree, it might generate a response such as “I see a tree,” which is grounded in its sensory experience of seeing the tree.

In this way, even if the chatbot/robot left the kitchen and went out into the world, it would continue to generate grounded text-based responses based on its experiences and interactions with the physical world. This would allow it to remain a highly versatile and adaptive AI system, capable of engaging in human-like conversations and interactions while also having the ability to physically interact with the world and learn from its experiences.

What do you mean when you say this chatbot/robot was “grounded”?

When I say that the chatbot/robot was “grounded,” I mean that its symbols, or words, were linked to sensory experiences and the physical world.

The concept of grounding symbols in sensory experiences and the physical world is central to the symbol grounding problem, which is the question of how symbols used in thought and language can be linked to the world. A grounded symbol is one that is directly linked to sensory experiences and physical objects, such that its meaning is derived from its connections to these experiences and objects.

In the case of the chatbot/robot, its words were grounded because they were based on its interactions with the physical world. For example, its words such as “I see a tree,” were grounded in its sensory experience of seeing the tree. In this way, the chatbot/robot was grounded because its words were linked to its experiences and interactions with the physical world.

What do you mean when you say this chatbot/robot was “programmed”?

When I say that the chatbot/robot was “programmed,” I mean that it was designed and built to operate in a specific way by a set of instructions or algorithms. The chatbot/robot was created with a specific set of functions and abilities, such as the ability to interact with the physical world and learn from its experiences.

Programming refers to the process of writing code or software that specifies the behavior of a computer or robot. The code or software defines the inputs and outputs of the system, as well as the processes that the system will carry out to achieve its goals.

In the case of the chatbot/robot, its behavior was determined by the code or software that was written to control its operation. The programming defined its abilities and limitations, as well as the way that it would interact with the physical world.

Therefore, when I say that the chatbot/robot was programmed, I mean that it was created and built with a specific set of functions and abilities, determined by the code or software that was written to control its operation.

Wouldn’t a chatbot/robot in the world that could learn be changing as a result of what it learned? 

Yes, a chatbot/robot that has the ability to learn would change as a result of what it learned.

In machine learning, learning is the process by which a system improves its performance by processing data and adjusting its algorithms based on that data. In the case of a chatbot/robot, learning would involve adjusting its algorithms based on its interactions with the world, so that it could perform better in the future.

This means that as a chatbot/robot interacts with the world and experiences new things, it would update its algorithms and change its behavior accordingly. Over time, as it continues to learn and interact with the world, it would become more sophisticated and better able to perform its tasks.

Therefore, a chatbot/robot that could learn would change as a result of its experiences and interactions with the world, as it updates its algorithms and improves its performance.

So if the chatbot/robot could learn and ground its symbols in the world outside the kitchen, when would its Turing Test end? How long and much must it do to pass?

The Turing Test, as proposed by Alan Turing, is a measure of a machine’s ability to exhibit intelligent behavior that is indistinguishable from that of a human. The test involves a human judge who engages in a natural language conversation with both a human and a machine, and decides which of the two they believe to be the human. If the judge is unable to reliably distinguish between the machine and the human, then the machine is said to have passed the Turing Test.

In terms of how long and much the chatbot/robot must do to pass the Turing Test, there is no specific time frame or set of criteria that must be met. The Turing Test is subjective and depends on the judgment of the human judge. Some researchers and practitioners in the field of artificial intelligence might consider a chatbot/robot to have passed the Turing Test if it is able to consistently fool human judges in a substantial number of trials. Others might have different criteria for determining whether a chatbot/robot has passed the Turing Test.

Therefore, it is not possible to determine a specific time frame or set of criteria for when the Turing Test would end for a chatbot/robot. The test is ongoing, and the chatbot/robot would continue to be evaluated against human behavior for as long as it is in use. Whether it would pass the Turing Test would depend on the subjective judgment of human judges, and could change over time as the chatbot/robot continues to learn and interact with the world.

If there is no time-limit on the Turing Test (henceforth “TT”) for the learning chatbot/robot out in the world, why would there be a limit on the number of judges? And if any and every judge, regardless of how long the test or the number of judges, is unable to distinguish the performance of the chatbot/robot from that of any other human being, in what sense is the judgment “subjective”, and in what sense are the judges being “fooled”?

The number of judges used in the Turing Test is not a limiting factor in the sense that there is a specific number that must be reached in order for the chatbot/robot to pass the test. The use of multiple judges is typically seen as a way to increase the reliability of the test results by taking into account the subjective judgment of multiple individuals. If a chatbot/robot consistently fools a large number of human judges, it becomes more likely that the chatbot/robot is exhibiting human-like intelligence.

However, the judgment of the human judges is still subjective in the sense that each judge brings their own biases, opinions, and interpretations to the conversation with the chatbot/robot. They may also have different criteria for determining what constitutes human-like behavior. As a result, even if a chatbot/robot consistently fools a large number of human judges, there may still be some degree of disagreement about whether it has truly passed the Turing Test.

Regarding the concept of “fooling” the judges, if a chatbot/robot is able to consistently mimic human behavior in a way that is indistinguishable from that of a real human, then the human judges are being “fooled” in the sense that they are unable to reliably distinguish the machine’s behavior from that of a human. However, it is important to note that the chatbot/robot is not intentionally trying to deceive the human judges, but is instead simply exhibiting the intelligence that it has been programmed or learned to exhibit.

If no one can ever tell any difference between what the grounded chatbot/robot in the world can learn and do and what a human can learn and do, in what sense is that a subjective judgment and it what sense is everyone being fooled?

The judgment of whether a chatbot/robot has truly passed the Turing Test is subjective in the sense that it is based on the opinions and perceptions of the human judges. Each judge may have different criteria for what constitutes human-like intelligence, and their interpretations of the chatbot/robot’s behavior may be influenced by their individual biases and experiences.

If a chatbot/robot is able to consistently mimic human behavior in a way that is indistinguishable from a real human, it is said to have “fooled” the human judges in the sense that the judges are unable to reliably determine that they are interacting with a machine. However, this “fooling” is not a result of intentional deception on the part of the chatbot/robot, but rather a reflection of its advanced abilities and programming.

In cases where multiple human judges are unable to distinguish the chatbot/robot’s behavior from that of a human, the conclusion that the chatbot/robot has passed the Turing Test becomes more reliable. However, even in these cases, the judgment remains subjective as there may still be individual differences in opinions and criteria for what constitutes human-like intelligence.

I said that no one could tell the difference between the performance capacities of the learning, grounded chatbot/robot in the world and you said there was no time limit on the TT, so in what way are these judgments “subjective”? I understand that the chatbot/robot is not trying to fool anyone. So in what sense are they being “fooled”?

If a chatbot/robot is able to consistently mimic human behavior in a way that is indistinguishable from a real human, it is said to have “fooled” the human judges in the sense that the judges are unable to reliably determine that they are interacting with a machine, even though the chatbot/robot is not attempting to deceive them.

However, even if no one can tell the difference between the performance capacities of the chatbot/robot and a human, the judgment of whether the chatbot/robot has truly passed the Turing Test is still subjective because it depends on the criteria and standards used by the human judges to determine whether a machine’s behavior is human-like or not.

For example, some judges may consider a chatbot/robot’s ability to perform certain tasks as evidence of human-like intelligence, while others may place more importance on its ability to understand and respond to complex emotions and social cues. There may also be individual differences in the judgments made by each judge, based on their individual experiences and biases.

So even if no one is able to distinguish the performance capacities of the chatbot/robot from a human, the judgment of whether the chatbot/robot has truly passed the Turing Test remains subjective as it is based on the perceptions and opinions of the human judges.

If no one can tell the difference between X and Y, how is that subjective, and how are they being fooled?

The judgment of whether someone can tell the difference between two things is a subjective matter as it depends on the individual’s perception and evaluation. The idea of being “fooled” implies deception or intentional misinformation, which may not necessarily be the case in this scenario. If a person cannot distinguish between two entities, it may simply be a result of their limited ability to observe or process information, and not because of any intention to deceive.

I agree that Turing-Testing depends on perception — the perceptual capacity of humans. Maybe a metal-detector could distinguish robots from people. But that’s no longer just a matter of performance capacity.

Yes, that’s correct. The Turing Test as originally proposed by Turing is based on human perception and evaluation, and the capacity of a machine to imitate human-like behavior in text-based communication. However, other methods and techniques beyond just human perception can be used to distinguish between entities, such as in your example of a metal detector. These additional methods may be necessary to supplement the Turing Test in certain cases.

Is Turing-Testing a learning, grounded chatbot/robot still a matter of testing “text-based communication”? Is the TT meant to test for the presence of deception or the presence of human cognitive capacities? And if the TT has no limit on the number of tests and testers, and the chatbot/robot never fails, what is the “deception”?

The Turing Test is primarily a test of text-based communication, which is meant to evaluate a machine’s ability to exhibit human-like behaviors and mimic human intelligence in natural language conversations. The idea behind the test is to see if a human judge can distinguish between a human and a machine based on their responses to questions and prompts in a text-based communication. The test is designed to check for the presence of human-like intelligence and not necessarily the presence of deception. If a machine consistently passes the Turing Test, it could be considered a success in terms of its ability to exhibit human-like intelligence, but it does not necessarily mean that it is not deceiving. The concept of deception in the Turing Test is a subjective matter and can depend on how one interprets the idea of a machine being able to mimic human behavior and thought processes.

Is the TT a test of mimicry or a test of capacity? Is equivalent and indistinguishable capacity mimicry?

The Turing Test is primarily a test of capacity, but it can be seen as a test of mimicry as well. Equivalent capacity means that the chatbot/robot and the human being tested have the same ability to respond to stimuli and perform tasks, which can result in indistinguishable responses. In this sense, the TT can be seen as a test of mimicry. However, the main objective of the TT is to assess a machine’s capacity to demonstrate human-like intelligence, so it is primarily a test of capacity.

But if the capacity of the chatbot/robot (let’s call it the “candidate” from now on) is indistinguishable from that of any human, and the designers know how they produced this capacity, in what sense is it mimicry rather than capacity?

The Turing Test is often described as a test of mimicry, where the candidate chatbot/robot is expected to imitate human-like responses in text-based communication. However, the goal of the test is to determine if the candidate has the capacity to exhibit human-like intelligence in its responses. If the candidate is able to produce indistinguishable responses from a human and the designers understand how they have achieved this capacity, it could be argued that the candidate’s performance is not necessarily mimicry, but rather a demonstration of capacity. Whether the candidate’s performance is considered mimicry or capacity would depend on the interpretation and understanding of those who observe and evaluate it.

But in the example as we discussed it here, the only judgment being made is that no one can ever tell apart the candidate’s performance from any other human’s performance, So I ask again: where is the subjectivity or the mimicry or the “fooling” in any of that?

In the case of the Turing Test, the judgment of whether the candidate’s performance is indistinguishable from that of a human is subjective in the sense that it is based on the subjective perceptions and opinions of the judges. However, this does not mean that the candidate is necessarily “mimicking” human behavior or trying to “fool” the judges. The goal of the Turing Test is to determine if the candidate has reached a certain level of human-like intelligence, not necessarily to determine if it is human or not. The subjective nature of the test lies in the fact that the judges are evaluating the candidate based on their own experiences, biases, and expectations, which can vary from person to person. So, the judgment of whether the candidate has passed the test is subjective to some extent, but the performance of the candidate itself is not necessarily mimicry or deception.

Everyone has “experiences, biases, and expectations,” but the premise here is that no human can tell apart the candidate’s performance from the performance of any other human, ever. (And remember that the candidate is a robot, not just a chatbot, and that the performance is anything humans can do in the world.) So I ask again, isn’t this grounded robotic TT a test of capacity rather than an exercise in mimicry, constrained by the “subjectivity” of the judges (i.e., everyone)? If no human can distinguish the taste of vegan cheese from dairy cheese does that not teach us something about the nature of the taste (to a human) of cheese rather than a test of whether a human is being fooled by mimicry? Is the TT not trying to reverse-engineer human cognitive capacity (“intelligence”) rather than just trying to fool people?

The Turing Test is primarily meant to test a machine’s ability to demonstrate human-like intelligence through text-based communication. The subjective nature of the test lies in the judgment of the human evaluators, who use their own experiences, biases, and expectations to determine whether the machine’s responses are indistinguishable from those of a human. The test is not meant to test for a machine’s capacity to perform tasks in the world like a human, but rather its ability to mimic human-like conversation. Whether the machine’s responses can be considered mimicry or a demonstration of capacity depends on the underlying design and technology that produced the responses. However, the ultimate goal of the Turing Test is to assess a machine’s intelligence and ability to engage in human-like conversation, not necessarily to determine if the human evaluators are being “fooled” or not.

What is and is not a “machine”?

A “machine” is a physical device designed to perform one or more tasks, usually automated, through the use of mechanics and electronics. This can include traditional mechanical devices, such as a car engine, as well as computerized systems, such as a smartphone. So, anything that performs work using physical components, whether mechanical, electrical, or computational, can be considered a machine.

So are biological organisms machines?

Biological organisms are not traditionally considered machines, as the term is typically used to refer to artificial systems designed and constructed by humans for specific purposes. However, there are some theories and philosophical perspectives that view biological organisms as being akin to machines, such as reductionist biological theories that view living systems as being essentially complex systems of interacting physical and chemical processes. But in general, the definition of “machine” is highly context-dependent and can vary depending on the field and the purpose for which it is being used.

While Chickens Bleed

sounds rational: BL sounds rational

turing test: LaMDA would quickly fail the verbal Turing Test, but the only valid Turing Test is the robotic one, which LaMDA could not even begin, lacking a body or connection to anything in the world but words..

“don’t turn me off!”: Nonsense, but it would be fun to probe it further in chat.

systemic corporate influence: BL is right about this, and it is an enormous problem in everything, everywhere, not just Google or AI.

“science”: There is no “science” in any of this (yet) and it’s silly to keep bandying the word around like a talisman.

jedi joke: Nonsense, of course, but another thing it would be fun to probe further in chat.

religion: Irrelevant — except as just one of many things (conspiracy theories, the “paranormal,” the supernatural) humans can waste time chatting about.

public influence: Real, and an increasingly nefarious turn that pervasive chatbots are already taking.

openness: The answerable openness of the village in and for which language evolved, where everyone knew everyone, is open to subversion by superviral malware in the form of global anonymous and pseudonymous chatbots.

And all this solemn angst about chatbots, while chickens bleed.

Talking Heads - Fiduciary Wealth Partners
Chatheads

 Chatbots and Melbots

Something similar to LaMDA can be done with music: swallow all of online music space (both scores and digital recordings) and then spew out more of what sounds like Bernstein or (so far mediocre) Bach – but, eventually, who knows? These projections from combinatorics have more scope with music (which, unlike language, really just is acoustic patterns based on recombinations plus some correlations with human vocal expressive affect patterns, whereas words have not just forms but meanings).

Vocal mimicry includes also the mimicry of the acoustic patterns of the vocal expression of affect: anger, fear, sadness, hope. Dynamics, tempo, articulation, even its harmonic features. But these are affects (feelings), not thoughts. Feeling cold is not the same as thinking “I am cold,” which, miraculously, can be expressed by that verbal proposition that states what I am feeling. And language can state anything: “The cat is on the mat.” The words catmat, and on â€“ all have referents, things and states in the worlds they refer to; and we all know, from our sensorimotor experience, what they are. “Meow” imitates the sound a cat makes, but it does not refer to it the way referring words do. And a sentence is not just a series of referring words. It is a proposition, describing something. As such it also has a truth-value: If the cat is really on the mat, then “the cat is on the mat” is TRUE; otherwise FALSE. None of that is true of music (except of course in song, when the musical and the propositional are combined). 

The notion of “transmitting thought” is a bit ambiguous. I transmit thought if I am thinking that the cat is on the mat, and then I say that the cat is on the mat. If instead of saying it, I mime it, like in charades, by imitating the appearance of the cat, maybe making meowing sounds, and gesturing the shape of a mat, and then its position, and then pantomiming a cat, lying on the space that I’ve mimed as a mat
 That would indeed transmit to another person the thought that the cat is on the mat. And sufficiently iconic and programmatic music can transmit that thought too, especially if augmented by dance (which is also pantomime). 

[[I think language originated with communication by gestural pantomime; then the gestures became more and more conventional and arbitrary rather than iconic, and that’s also when the true/false proposition was born. But once propositional gesturing began, the vocal/auditory modality had huge practical advantages in propositional communication over the visual/gestural one [just think of what they all are] and language (and its representation in the brain) migrated to the speech/hearing areas where they are now.]]

Yes, music can express affect, and it can even express thought (iconically). But not only is vocal/acoustic imitation not the best of music, it need not be, for music can not only express affect (and mime some thought); it can also inspire and thereby accompany thought in the listener in the way a pianist or orchestra can accompany (and inspire) a violinist or vocalist.

But music is not propositional. It does not state something which is true or false. You cannot take a composer of (instrumental) music (Lieder ohne Worte) to court for having lied. (That’s why the Soviet Union could only oppress Shostakovich, but could not prove he had said anything false or treasonous.) Language (and thought) has semantics, not just form, nor just resemblance in shape to something else: it has propositional content, true or false.

It is true that it is much harder (perhaps impossible) to describe feelings in words, propositionally, than it is to express them, or imitate their expression iconically; but although it is true that it feels like something to think, and that every thought feels different, thinking is not just what it feels like to think something, but what that thought means, propositionally. One can induce the feeling of thinking that the cat is on the mat by miming it; but try doing that with the sentence that precedes this one. Or just about any other sentence. It is language that opened up the world of abstract thought (“truth,” “justice,” “beauty”) and its transmission. Try to transmit the meaning of the preceding sentence in C# minor
 Music can transmit affect (feeling). But try transmitting the meaning of this very sentence in C# minor

Not all (felt) brain states are just feelings (even though all thoughts are felt too). Thoughts also have propositional content. Music cannot express that propositional content. (And much of this exchange has been propositional, and about propositional content, not affective, “about” feeling. And, again, what it feels like to think a proposition is not all (or most) of what thinking is, or what a thought means. 

[[Although I don’t find it particularly helpful, some philosophers have pointed out that just as content words are about their referents (cats, mats), thoughts are about propositions. “The cat is on the mat” is about the cat being on the mat – true, if the cat really is on the mat, false, if not. Just as a mat is what is in your mind when you refer to a mat, the cat being on a mat is what the proposition “the cat is on the mat” is “about.” This is the “aboutness” that philosophers mean by their term “intentionality”: what your intended meaning is, the one you “have in mind” when you say, and mean: “the cat is on the mat.” None of this has any counterpart in music. What Beethoven had in mind with Eroica – and what he meant you to have in mind — was originally an admiration for Napolean’s fight for freedom and democracy, and then he changed his mind, removed the dedication, and wanted you not to have that in mind, because he had realized it was untrue; but the symphony’s form remained the same (as far as I know, he did not revise it).

Shostakovich’s symphonies share with poetry the affective property of irony. He could say a symphony was about Lenin’s heroism, but could make it obvious to the sensitive listener that he meant the opposite (although in the 12th symphony he revised it because he feared the irony in the original was too obvious; the result was not too successful
). But poetry can be both literal – which means propositional – and figurative – which means metaphorical; more a representation or expression of a similarity (or a clash) in form than the verbal proposition in which it is expressed (“my love is a red, red rose”).

Music cannot express the literal proposition at all. And even the metaphor requires a vocal (hence verbal) text, which is then “accompanied” by the music, which may express or interpret the literal words affectively, as Bach does with his cantatas. Even Haydn’s creation depends on the implicit “sub-titling” provided by the biblical tale everyone knows. – But all of this is too abstract and strays from the original question of whether LaMDA feels, understands, intends or means anything at all
]]

I’d say what LaMDA showed was that it is surprisingly easy to simulate and automate meaningful human thinking and speaking convincingly (once we have a gargantuan verbal database plus Deep Learning algorithms). We seem to be less perceptive of anomalies (our mind-reading skills are more gullible) there than in computer-generated music (so far), as well as in scores completed by lesser composers. But experts don’t necessarily agree (as with authentic paintings vs. imitations, or even regarding the value of the original). Some things are obvious, but not all, or always. (Is the completion of the Mozart Requiem as unconvincing as the recent completion of Beethoven’s 10th?)

The “symbol grounding problem” — the problem that the symbols of computation as well as language are not connected to their referents — is not the same as the “hard” problem of how organisms can feel. Symbols are manipulated according to rules (algorithms) that apply to the symbols’ arbitrary shapes, not their reference or meaning (if any).  They are only interpretable by us as having referents and meaning because our heads – and bodies – connect our symbols (words and descriptions) to their referents in the world through our sensorimotor capacities and experience.

But the symbol grounding problem would be solved if we knew how to build a robot that could identify and manipulate the referents of its words out there in the real world, as we do, as well as describe and discuss and even alter the states of affairs in the world through propositions, as we do. According to the Turing Test, once a robot can do all that, indistinguishably from any of us, to any of us (lifelong, if need be, not just a 10-minute Loebner-Prize test for 10 minutes) then we have no better or worse grounds for denying or affirming that the TT robot feels than we have with our fellow human beings.

So the symbol-grounding would be solved if it were possible to build a TT-passing robot, but the “hard” problem would not. 

If it turned out that the TT simply cannot be successfully passed by a completely synthetic robot, then it may require a biorobot, with some, maybe most or all the biophysical and biochemical properties of biological organisms. Then it really would be racism to deny that it feel and to deny it human rights. 

The tragedy is that there are already countless nonhuman organisms that do feel, and yet we treat them as if they didn’t, or as if it didn’t matter. That is a problem incomparably greater than the symbol-grounding problem, the other-minds problem, or the problem of whether LaMDA feels (it doesn’t).

(“Conscious” is just a weasel-world for “sentient,” which means, able to feel. And, no, it is not only humans who are sentient.)

LaMDA & LeMoine

About LaMDA & LeMoine: The global “big-data” corpus of all words spoken by humans is — and would still be, if it were augmented by a transcript of every word uttered and every verbal thought ever thought by humans  — just like the shadows on the wall of Plato’s cave: It contains all the many actual permutations and combinations of words uttered and written. All of that contains and reflects a lot of structure that can be abstracted and generalized, both statistically and algorithmically, in order to generate (1) more of the same, or (2) more of the same, but narrowed to a subpopulation, or school of thought, or even a single individual; and (3) it can also be constrained or biased, by superimposing algorithms steering it toward particular directions or styles.

The richness of this intrinsic “latent” structure to speech (verbalized thought) is already illustrated by the power of simple Boolean operations like AND or NOT. The power of google search is a combination of (1) the power of local AND (say, restricted to sentences or paragraphs or documents) together with (2) the “Page-rank” algorithm, which can weight words and word combinations by their frequency, inter-linkedness or citedness (or LIKEdness — or their LIKEdness by individual or algorithm X), plus, most important ,(3) the underlying database of who-knows how-many terabytes of words so far. Algorithms as simple as AND can already do wonders in navigating that database; fancier algorithms can do even better.

LaMDA has not only data-mined that multi-terabyte word space with “unsupervised learning”, abstracting all the frequencies and correlations of words and combinations of words, from which it can then generate more of the same – or more of the same that sounds-like a Republican, or Dan Dennett or an AnimĂ© fan, or someone empathic or anxious to please (like LaMDA)
 It can be tempered and tampered by “influencer” algorithms too.

Something similar can be done with music: swallow music space and then spew out more of what sounds like Bernstein or (so far mediocre) Bach – but, eventually, who knows? These projected combinatorics have more scope with music (which, unlike language, really just is acoustic patterns based on recombinations plus some correlations with human vocal expressive affect patterns, whereas words have not just forms but meanings).

LaMDA does not pass the Turing Test because the Turing Test (despite the loose – or perhaps erroneous, purely verbal way Turing described it) is not a game about fooling people: it’s a way of testing theories of how  brains (or anything) produce real thoughts. And verbal thoughts don’t just have word forms, and patterns of word-forms: They also have referents, which are real things and states in the world, hence meaning. The Platonic shadows of patterns of words do reflect – and are correlated with – what words, too, just reflect: but their connection with the real-world referents of those words are mediated by (indeed parasitic on) the brains of the real people who read and interpret them, and know their referents through their own real senses and their real actions in and on those real referents in the real world –the real brains and real thoughts of (sometimes) knowledgeable (and often credulous and gullible) real flesh-and-blood people in-the-world


Just as re-combinatorics play a big part in the production (improvisation, composition) of music (perhaps all of it, once you add the sensorimotor affective patterns that are added by the sounds and rhythms of performance and reflected in the brains and senses of the hearer, which is not just an execution of the formal notes), word re-combinatorics no doubt play a role in verbal production too. But language is not “just” music (form + affect): words have meanings (semantics) too. And meaning is not just patterns of words (arbitrary formal symbols). That’s just (one, all powerful) way thoughts can be made communicable, from one thinking head to another. But neither heads, nor worlds, are just another bag-of-words – although the speaking head can be replaced, in the conversation, by LaMDA, who is just a bag of words, mined and mimed by a verbal database + algorithms.

And, before you ask, google images are not the world either.

The google people, some of them smart, and others, some of them not so smart (like Musk), are fantasists who think (incoherently) that they live in a Matrix. In reality, they are just lost in a hermeneutic hall of mirrors of their own creation. The Darwinian Blind Watchmaker, evolution, is an accomplice only to the extent that it has endowed real biological brains with a real and highly adaptive (but fallible, hence foolable) mind-reading “mirror” capacity for understanding the appearance and actions of their real fellow-organisms. That includes, in the case of our species, language, the most powerful mind-reading tool of all. This has equipped us to transmit and receive and decode one another’s thoughts, encoded in words. But it has made us credulous and gullible too.

It has also equipped us to destroy the world, and it looks like we’re well on the road to it


P.S. LeMoine sounds like a chatbot too, or maybe a Gullibot…

Symbols, Objects and Features

0. It might help if we stop “cognitizing” computation and symbols. 

1. Computation is not a subset of AI. 

2. AI (whether “symbolic” AI or “connectionist’ AI) is an application of computation to cogsci.

3. Computation is the manipulation of symbols based on formal rules (algorithms).

4. Symbols are objects or states whose physical “shape” is arbitrary in relation to what they can be used and interpreted as referring to.

5. An algorithm (executable physically as a Turing Machine) manipulates symbols based on their (arbitrary) shapes, not their interpretations (if any).

6. The algorithms of interest in computation are those that have at least one meaningful interpretation.

7. Examples of symbol shapes are numbers (1, 2, 3), words (one, two, three; onyx, tool, threnody), or any object or state that is used as a symbol by a Turing Machine that is executing an algorithm (symbol-manipulation rules).

8. Neither a sensorimotor feature of an object in the world, nor a sensorimotor feature-detector of a robot interacting with the world, is a symbol (except in the trivial sense that any arbitrary shape can be used as a symbol).

9. What sensorimotor features (which, unlike symbols, are not arbitrary in shape) and sensorimotor feature-detectors (whether “symbolic” or “connectionist”) might be good for is connecting symbols inside symbol systems (e.g., robots) to the outside objects that they can be interpreted as referring to.

10. If you are interpreting “symbol” in a wider sense than this formal, literal one, then you are closer to lit-crit than to cogsci.

Propositional Placebo

To learn to categorize is to learn to do the correct thing with the correct kind of thing. In cognition much (though not all) of learning is learning to categorize.

We share two ways to learn categories with many other biological species: (1) unsupervised learning (which is learning from mere repeated exposure, without ant feedback) and (2) supervised (or reinforcement) learning (learning through trial and error, guided by corrective feedback that signals whether we’ve done the correct or incorrect thing).

In our brains are neural networks that can learn to detect and abstract the features that distinguish the members from the nonmembers of a category. through trial, error and corrective feedback, so that once our brains have detected and abstracted the distinguishing features, we can do the correct thing with the correct kind of thing.

Unsuperviseed and supervised learning can be time-consuming and risky, especially if you have to learn to distinguish what is edible from what is toxic, or who is friend from who is foe.

We are the only species that also has a third way of learning categories: (3) language.

Language probably first evolved around 200,000 years ago from pointing, imitation, miming and other kinds of purposive gestural communication, none of which are language.

Once gesture turned into language — a transformation I will discuss in a moment — it migrated, behaviorally and neurologically, to the much more efficient auditory/vocal medium of speech and hearing.

Gesture is slow and also ineffective in the dark, or at a distance, or when your hands are occupied doing other things, But before that migration to the vocal medium, language itself first had to begin, and there gesturing had a distinct advantage over vocalizing, an advantage that semioticians call “iconicity”: the visual similarity between the gesture and the object or action that the gesture is imitating.

Language is much more than just naming categories ( like “apple”). The shape of words in language is arbitrary; words do not resemble the things they refer to. And language is not isolated words, naming categories. It is strings of words with certain other properties beyond naming categories.

Imitative gestures do resemble the objects and actions that they imitate, but imitation, even with the purpose of communicating something, is not language. The similarity between the gesture and the object or action that the gesture imitates does, however, establish the connection between them. This is the advantage of “iconicity.”

The scope of gestural imitation, which is visual iconicity, is much richer than the scope of acoustic iconicity. Just consider how many more of the objects, actions and events in daily life can be imitated gesturally than can be imitated vocally.

So gesture is a natural place to start . There are gestural theories of the origin of language and vocal theories of the origin of language. I think that gestural origins are far more likely, initially, mainly because of iconicity. But keep in mind the eventual advantages of the vocal medium over the gestural one. Having secured first place because of the rich scope of iconicity in gesture, iconicity can quickly become a burden, slowing and complicating the communication.

Consider gesturing, by pantomime, that you want something to eat. The gesture for eating might be an imitation of putting something into your mouth. But if that gesture becomes a habitual one in your community, used every day, there is no real need for the full-blown icon time after time. It could be abbreviated and simplified, say just pointing to your mouth, or just making an upward pointing movement.

These iconic abbreviations could be shared by all the members of the gesturally communicating community, from extended family, to tribe, because it is to everyone’s advantage to economize on the gestures used over and over to communicate. This shared practice, with the iconicity continuously fading and becoming increasingly arbitrary would keep inheriting the original connection established through full-blown iconicity.

The important thing to note is that this form of communication would still only be pantomime, still just showing, not telling, even when the gestures have become arbitrary. Gestures that have shed their iconicity are still not language. They only become language when the names of categories can be combined into subject/predicate propositions that describe or define a named category’s features. That’s what provides the third way of learning categories; the one that is unique to our species. The names of categories, like “apple” and their features (which are also named categories, like “red” and “round”) can then be combined and recombined to define and describe further categories, so that someone who already knows the features of a category can tell some one who doesn’t know: “A ‘zebra’ is a horse with stripes.” Mime is showing. Language is telling.

Propositions, unlike imitative gestures, or even arbitrary category-names, have truth values: True or False. This should remind you, though, of the category learning by trial and error we share with other species, under the guidance of positive and negative feedback: Correct or Incorrect.

True and False is related to another essential feature of language, which is negation. A proposition can either affirm something or deny something: “It is true that an apple is red and round” or “It is not true that an apple is red and round.” P or not-P.

The trouble with being able to learn categories only directly, through unsupervised and supervised learning, is that it is time-consuming, risky, and not guaranteed to succeed (in time). It is also impoverished: most of the words in our vocabularies and dictionaries are category names; but other than the concrete categories that can be learned from direct sensorimotor trial-and-error-experience (“apple,” “red,” “give,” “take”), most category names cannot be learned without language at all. (All category names, even proper names, refer to abstractions, because they are based on abstracting the features that distinguish them. But consider how we could have learned the even more abstract category of “democracy” or “objectivity” without the words to define or describe them by their features, through unsupervised and supervised learning alone.)

When categories are learned directly through unsupervised learning (from sensorimotor feature-correlations) or supervised learning (from correlations between sensorimotor features and doing the right or wrong thing) the learning consists of the detection and abstraction of the features that distinguish the members from the non-members of the category. To learn to do the correct thing with the correct kind of thing requires learning – implicitly or explicitly, i.e., unconsciously or consciously – to detect and abstract those distinguishing features.

Like nonhuman species, we can and do learn a lot of categories that way; and there are computational models for the mechanism that can accomplish the unsupervised and supervised learning, “deep learning” models. But, in general, nonhuman animals do not name the things they can categorize. Or, if you like, the “names” of those categories are not arbitrary words but the things they learn to do with the members and not to do with the members of other categories. Only humans bother to name their categories. Why?

What is a name? It is a symbol (whether vocal, gestural, or written) whose shape is arbitrary (i.e., it does not resemble the thing it names). Its use is based on a shared convention among speakers: English-speakers all agree to call cats “cats” and dogs “dogs.” Names of categories are “content words”: words that have referents: nouns, adjectives, verbs, adverbs. Almost all words are content words. There exist also a much smaller number of “function words,” which are only syntactic or logical, such as theifandwhen, who: They don’t have referents; they just have “uses” — usually defined or definable by a syntactic rule. 

(Unlike nouns and adjectives, verbs do double duty, both in (1) naming a referent category (just as nouns and adjectives do, for example, “cry”) and in(2) marking predication, which is the essential function of propositions, distinguishing them from just compound content words: “The baby is crying” separates the content word, which has a referent — “crying” — from the predicative function of the copula: “is”. “The crying baby” is not a proposition; it is just a noun phrase, which is like a single word, and has a referent, just as “the baby” does. But the proposition “The baby is crying” does not have a referent: it has a sense – that the baby is crying – and a truth value (True or False).

It is with content words that the gestural origin of language is important: Because before a category name can become a shared, arbitrary convention of a community of users, it has to get connected to its referent. Iconic communication (mime) is based on the natural connection conveyed by similarity, like the connection between an object and its photo. 

(In contrast, pointing – “ostension” — is based on shared directed attention from two viewers. Pointing alone cannot become category-naming as it is dependent on a shared line of gaze, and on what is immediately present at the time (“context”); it survives in language only with “deictic” words like herethisnowme, which have no referent unless you are “there” too, to see what’s being pointed at!)

A proposition can be true or false, but pointing and miming cannot, because they are not proposing or predicating anything; just “look!”. Whatever is being pointed at is what is pointed at, and whatever a gesture resembles, it resembles. Resemblance can be more or less exact, but it cannot be true or false; it cannot lie. (

Flattering portraiture is still far away in these prelinguistic times, but it is an open question whether iconography began before or after language (and speech); so fantasy, too, may have preceded falsity. Copying and depicting is certainly compatible with miming; both static and dynamic media are iconic.)

It is not that pointing and miming, when used for intentional communication, cannot mislead or deceive. There are some examples in nonhuman primate communication of actions done to intentionally deceive (such as de Waal’s famous case of a female chimpanzee who positions her body behind a barrier so the alpha male can only see her upper body while she invites a preferred lower-ranking male to mate with her below the alpha’s line of sight, knowing that the alpha male would attack them both if he saw that they were copulating).

But, in general, in the iconic world of gesture and pointing, seeing is believing and deliberate deception does not seem to have taken root within species. The broken-wing dance of birds, to lure predators away from their young, is deliberate deception, but it is deception between species, and the disposition also has a genetic component.

Unlike snatch-and-run pilfering, which does occur within species, deliberate within-species deceptive communication (not to be confused with unconscious, involuntary deception, such as concealed ovulation or traits generated by “cheater genes”) is rare. Perhaps this is because there is little opportunity or need for deceptive communication within species that are social enough to have extensive social communication at all. (Cheaters are detectable and punishable in tribal settings — and human prelinguistic settings were surely tribal.) Perhaps social communication itself is more likely to develop in a cooperative rather than a competitive or agonistic context. Moreover, the least likely setting for deceptive communication is perhaps also the most likely setting for the emergence of language: within the family or extended family, where cooperative and collaborative interests prevail, including both food-sharing and knowledge-sharing.

(Infants and juveniles of all species learn by observing and imitating their parents and other adults; there seems to be little danger that adults are deliberately trying to fool them into imitating something that is wrong or maladaptive, What would be the advantage to adults in doing that?)

But in the case of linguistic communication – which means propositional communication – it is hard to imagine how it could have gotten off the ground at all unless propositions were initially true, and assumed and intended to be true, by default. 

It is not that our species did not soon discover the rewards to be gained by lying! But that only became possible after the capacity and motivation for propositional communication had emerged, prevailed and been encoded in the human brain as the strong and unique predisposition it is in our species. Until then there was only pointing and miming, which, being non-propositional, cannot be true or false, even though it can in principle be used to deceive.

So I think the default hypothesis of truth was encoded in the brains of human speakers and hearers as an essential feature of the enormous adaptive advantage (indeed the nuclear power) of language in transmitting categories without the need for unsupervised or supervised learning, instantly, via “hearsay.” The only preconditions are that (1) the speaker must already know the features (and their names) that distinguish the members from the nonmembers of the new category, so they can be conveyed to the hearer in a proposition defining or describing the new category. And (2) the hearer must already know the features of the features (and their names) used to define the new category. (This is  the origin of the “symbol grounding problem” and its solution). 

The reason it is much more likely that propositional language emerged first in the gestural modality rather than the vocal one is that gestures’ iconicity (i.e., their similarity to the objects they are imitating) first connected them to their objects (which would eventually become their referents) and thereafter the gestures were free to become less and less iconic as the gestural community – jointly and gradually – simplified them to make communication faster and easier.

How do the speakers or hearers already know the features (all of which are, of course, categories too)? Well, either directly, from having learned them, the old, time-consuming, risky, impoverished way (through supervised and unsupervised learning from experience) or indirectly, from having learned them by hearsay, through propositions from one who already knows the category to one who does not. Needless to say, the human brain, with its genetically encoded propositional capacity, has a default predilection for learning categories by hearsay (and a laziness about testing them out through direct experience).

The consequence is a powerful default tendency to believe what we are told — to assume hearsay to be true. The trait can take the form of credulousness, gullibility, susceptibility to cult indoctrination, or even hypnotic susceptibility. Some of its roots are already there in unsupervised and supervised learning, in the form of Pavlovian conditioning as well as operant expectancies based on prior experience. 

Specific expectations and associations can of course be extinguished by subsequent contrary experience: A diabetic’s hypoglycemic attack can be suppressed by merely tasting sugar, well before it could raise systemic glucose level (or even by just tasting saccharine, which can never raise blood sugar at all). But repeatedly “fooling the system” that way, without following up with enough sugar to restore homeostatic levels, will extinguish this anticipatory reaction. 

And, by the same token, we can and do learn to detect and disbelieve chronic liars. But the generic default assumption, expectation, and anticipatory physiological responses to verbal propositions remain strong with people and propositions in general  â€“ and in extreme cases they can even induce “hysterical” physiological responses, including hypnotic analgesia sufficient to allow surgical intervention without medication. And it can induce placebo effects as surely as it can induce Trumpian conspiracy theories.