Understanding Understanding

HARNAD: Chatting with GPT is really turning out to be an exhilarating experience – especially for a skywriting addict like me! 

From the very beginning I had noticed that skywriting can be fruitful even when you are “jousting with pygmies” (as in the mid-1980’s on comp.ai, where it first gave birth to the idea of “symbol grounding”).

Who would have thought that chatting with software that has swallowed a huge chunk of 2021-vintage text, and that has the capacity to process and digest it coherently and interactively without being able to understand a word of it, could nevertheless provide infinitely richer (vegan) food for thought than anything before it ever could, including other people (both pygmies and giants) and books? It does so only for users who do understand the words, because the words are grounded in their heads on the basis of learned sensorimotor features and language.

Have a look at this: Language Evolution and Direct vs Indirect Symbol Grounding. (In the next installment I will move on to Noam Chomsky’s hunch about UG and the constraints on thinkable thought.)

Anon: But doesn’t it worry you that GPT-4 can solve reasoning puzzles of a type it has never seen before without understanding a word?

HARNAD: I agree with all the security, social and political worries. But the purely intellectual capacities of GPT-4 (if they are separable from these other risks), and especially the kind of capacity you mention here, inspire not worry but wonder.

GPT is a smart super-book that contains (potentially) the entire scholarly and scientific literature, with which real human thinkers can now interact dynamically — to learn, and build upon. 

I hope the real risks don’t overpower the riches. (I’ll be exploiting those while they last. Have a look at the chat I linked if you want to see what I mean.)

Anon:  No. It is not like a book. It converts symbolic information into interactions between features that it invents. From those interactions it can reconstruct what it has read as well as generating new stuff. That is also what people do.  It understands in just the same way that you or I do. 

HARNAD:  That’s all true (although “understands” is not quite the right word for what GPT is actually doing!). 

I’m talking only about what a real understander/thinker like you and me can use GPT for if that user’s sole interest and motivation is in developing scientific and scholarly (and maybe even literary and artistic) ideas. 

For such users GPT is just a richly (but imperfectly) informed talking book to bounce ideas off, with full knowledge that it does not understand a thing, but that it has a lot of current knowledge (as well as current misunderstandings, disinformation, and nonsense) at its fingertips.

Within a single session, GPT is informing me, and I am “informing” GPT, the way its database is informing it. 

Anon: You are doing a lot of hallucinating. You do not understand how it works so you have made up a story. 

HARNAD: I’m actually not quite sure why you would say I’m hallucinating. On the one hand I’m describing what I actually use GPT for, and how. That doesn’t require any hypotheses from me as to how it’s doing what it’s doing. 

I do know it’s based on unsupervised and supervised training on an enormous text base (plus some additional direct tweaking with reinforcement training from human feedback). I know it creates a huge “parameter” space derived from word frequencies and co-occurrence frequencies. In particular, for every consecutive pair of words within a con-text of words (or a sample) it weights the probability of the next word somehow, changing the parameters in its parameter space, which, I gather, includes updating the text it is primed to generate. I may have it garbled, but it boils down to what Emily Bender called a “stochastic parrot,” except that parrots are just echolalic, saying back, by rote, what they heard, whereas GPT generates and tests new texts under the constraint of not only “reductive paraphrasing and summary” but also inferencing.
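As a purely illustrative aside on what “weighting the probability of the next word” from co-occurrence frequencies can look like in its simplest form, here is a toy bigram counter. It is not how GPT works (GPT is a neural network trained over much longer contexts); the corpus and the function names `train_bigram` and `next_word_probs` are hypothetical, made up for this sketch.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count how often each word is immediately followed by each other word."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev):
    """Turn the raw follow-counts for `prev` into relative frequencies."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # `prev` was never seen with a following word
    return {word: n / total for word, n in followers.items()}


# Hypothetical toy corpus, only for illustration.
corpus = "the cat ate the apple and the cat ate the fish"
model = train_bigram(corpus)
probs = next_word_probs(model, "the")
print(probs)
```

Nothing in such a table connects “apple” to apples: the counts relate words only to other words.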

No matter how much of this I have got technically wrong, nothing I said about how I’m using GPT-4, or about what I’m doing with it and getting out of it, depends on the technical details that produce its performance. And it certainly doesn’t depend in any way on my assuming, or guessing, or inferring that GPT understands or thinks.

What I’m interested in is what is present and derivable from huge bodies of human generated texts that makes it possible not only to perform as well as GPT does in extracting correct, usable information, but also in continuing to interact with me, on a topic I know much better than GPT does, to provide GPT with more data (from me) that allows it to come back (to me) with supplementary information that allows me to continue developing my ideas in a way that books and web searches would not only have taken far too long for me to do, but could not have been done by anyone without their fingers on everything that GPT has its (imperfect) fingers on.

And I don’t think the answer is just GPT’s data and analytic powers (which are definitely not thinking or understanding: I’m the only one doing all the understanding and thinking in our chats). It has something to do with what the structure of language itself preserves in these huge texts, as ungrounded as it all is on the static page as well as in GPT.

Maybe my hunch is wrong, but I’m being guided by it so far with eyes wide open, no illusions whatsoever about GPT, and no need to know better the tech details of how GPT does what it does.

(Alas, in the current version, GPT-4 forgets what it has learned during that session after it’s closed, so it returns to its prior informational state in a new session, with its huge chunk of preprocessed text data, vintage 2021. But I expect that will soon be improved, allowing it to store user-specific data (with user permission) and to make all interactions with a user in one endless session; it will also have a continuously updating scholarly/scientific text base, parametrized, and open web access.)

But that’s just the perspective from the disinterested intellectual inquirer. I am sure you are right to worry about the (perhaps inevitable) potential for malevolent use for commercial, political, martial, criminal, cult and just plain idiosyncratic or sadistic purposes.

Anon: What I find so irritating is your confidence that it is not understanding despite your lack of understanding of how it works. Why are you so sure it doesn’t understand?

HARNAD:  Because (I have reasons to believe) understanding is a sentient state: “nonsentient” understanding is an empty descriptor. And I certainly don’t believe LLMs are sentient. 

Understanding, being a sentient state, is unobservable, but it does have observable correlates. With GPT (as in Searle’s Chinese Room) the only correlate is interpretability — interpretability by real, thinking, understanding, sentient people. But that’s not enough. It’s just a projection onto GPT by sentient people (biologically designed to make that projection with one another: “mind-reading”).

There is an important point about this in Turing 1950 and the “Turing Test.” Turing proposed the test with the following criterion. (Bear in mind that I am an experimental and computational psychobiologist, not a philosopher, and I have no ambition to be one. Moreover, Turing was not a philosopher either.)

(1) Observation. The only thing we have, with one another, by way of evidence that we are sentient, is what we can observe. 

(2) Indistinguishability. If ever we build (or reverse-engineer) a device that is totally indistinguishable in anything and everything it can do, observably, from any other real, thinking, understanding, sentient human being, then we have no empirical (or rational) basis for affirming or denying of the device what we cannot confirm or deny of one another.

The example Turing used was the purely verbal Turing Test. But that cannot produce a device that is totally indistinguishable from us in all the things we can do. In fact, the most elementary and fundamental thing we can all do is completely absent from the purely verbal Turing Test (which I call “T2”): T2 does not test whether the words and sentences spoken are connected to the referents in the world that they are allegedly about (e.g., “apple” and apples). To test that, indistinguishable verbal capacity (T2) is not enough. It requires “T3,” verbal and sensorimotor (i.e., robotic) capacity to recognize and interact with the referents of the words and sentences of T2, indistinguishably from the way any of us can do it.

So, for me (and, I think, Turing), without that T3 robotic capacity, any understanding in the device is just projection on our part.

On the other hand, if there were a GPT with robotic capacities that could pass T3 autonomously in the world (without cheating), I would fully accept (worry-free, with full “confidence”) Turing’s dictum that I have no grounds for denying of the GPT-T3 anything that I have no grounds for denying of any other thinking, understanding, sentient person.

You seem to have the confidence to believe a lot more on the basis of a lot less evidence. That doesn’t irritate me! It’s perfectly understandable, because our Darwinian heritage never prepared us for encountering thinking, talking, disembodied heads. Evolution is lazy, and not prescient, so it endowed us with mirror-neurons for mind-reading comprehensible exchanges of speech; and those mirror-neurons are quite gullible. (I feel the tug too, in my daily chats with GPT.)

[An example of cheating, by the way, would be to use telemetry, transducers and effectors from remote sensors which transform all sensory input from real external “referents” into verbal descriptions for the LLM (where is it located, by the way?), and transform the verbal responses from the LLM into motor actions on its remote “referent.”]

Vector Grounding?

Anon: LLMs are not ungrounded. They are grounded indirectly through the experiences of other people when they speak, the way a blind person’s knowledge of the visual world is mediated by what they’re told by (sighted) people. Blind people know a great deal about the visual world, even about color, which can only be directly experienced through vision.

SH: You’re perfectly right that the meanings of words can be grounded indirectly through language (i.e., through more words, whether from dictionaries, encyclopedias, textbooks, articles, lectures, chatting or texting – including texting with ChatGPT, the (sightless) statistical parrot with the immense bellyful of other people’s words, along with the computational means to crunch and integrate those words, partly by a kind of formal verbal figure-completion). Indirect grounding is what gives language (which, by the way, also includes symbolic logic, mathematics and computation as a purely syntactic subset) its immense (possibly omnipotent) communicative power. 

But language cannot give words their direct grounding. Grounding, like dictionary look-up, cannot be indirect all the way down. Otherwise it is not bottom-up grounding at all, just circling endlessly from meaningless symbol to meaningless symbol.

Let’s recall what “grounding” is: It’s a connection between words and their referents. Between “apples” and apples (in the world). “Apples” is directly grounded (for me) if I can recognize and manipulate apples in the world. But not every word has to be directly grounded. Most aren’t, and needn’t be. Only enough words need to be grounded directly. The rest can be grounded indirectly, with language. That’s what we showed in the paper on the latent structure of dictionaries in the special issue of Topics in Cognitive Science (TopiCS) edited by Gary Lupyan in 2016. We showed that with a “minimal grounding set” of around 1000 grounded words you could go on to ground all the rest of the words in the dictionary through definitions alone. But those 1000 grounding words have to have been directly grounded, in some other way, not just indirectly, in terms of other words and their verbal definitions. That would have been circular.
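The idea of grounding “all the rest” through definitions alone can be sketched with a toy dictionary. The sketch below is an illustration under stated assumptions, not the procedure used in the paper: the dictionary, the word lists and the function name `ground_by_definition` are all hypothetical. It only checks whether a given set of directly grounded words suffices; it does not search for a minimal grounding set.

```python
def ground_by_definition(dictionary, directly_grounded):
    """Return every word that can be grounded starting from `directly_grounded`.

    A word counts as indirectly grounded once every word in its
    definition is already grounded (directly or indirectly).
    """
    grounded = set(directly_grounded)
    changed = True
    while changed:  # keep sweeping until a full pass grounds nothing new
        changed = False
        for word, definition in dictionary.items():
            if word not in grounded and all(w in grounded for w in definition):
                grounded.add(word)
                changed = True
    return grounded


# Hypothetical toy dictionary: every word is defined only by other words in it,
# so it is circular ("thing" <-> "object", "eat" <-> "food", and so on).
toy_dictionary = {
    "apple": ["red", "round", "fruit"],
    "fruit": ["food", "plant"],
    "food": ["thing", "eat"],
    "plant": ["thing", "grow"],
    "thing": ["object"],
    "object": ["thing"],
    "red": ["color"],
    "color": ["red"],
    "round": ["shape"],
    "shape": ["round"],
    "eat": ["food"],
    "grow": ["plant"],
}

nothing = ground_by_definition(toy_dictionary, set())
partial = ground_by_definition(toy_dictionary, {"thing", "red", "round"})
full = ground_by_definition(toy_dictionary, {"thing", "red", "round", "eat", "grow"})
print(sorted(nothing), sorted(partial), sorted(full), sep="\n")
```

With no directly grounded words, nothing gets grounded: the definitions just circle. With too few, “apple” stays out of reach. With enough, every word in the toy dictionary is reached. The code itself, of course, takes the starting set as given; how those words get grounded directly is exactly what definitions cannot supply.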

All dictionaries are circular; indeed all of language is. All the words in a dictionary are parasitic on other words in the dictionary. Direct grounding is “parasitic” too, but not on words. It is parasitic on the sensorimotor capacity to recognize and manipulate their referents in the world. Not every word. But enough of them to ground all the rest indirectly.

You spoke about grounding indirectly “in the experiences of others.” Well, of course. That’s language again. But what is “experience”? It’s not just know-how. I can describe in words what an apple looks like, what to do with it, and how. But I can’t tell that to you (and you can’t understand it) unless enough of my words and yours are already grounded (directly or indirectly), for both you and me, in what you and I can each perceive and do, directly, not just verbally, in the world. We don’t have to have exactly the same minimal grounding set. And we probably don’t just ground the minimal number directly. But what is grounded directly has to be grounded directly, not indirectly, through words.

The reason that blind people (even congenitally blind people, or almost congenitally blind and deaf people like Helen Keller) can learn from what seeing-people tell them is not that they are grounding what they learn in the “experience” of the seeing-person. They ground it in their own direct experience, or at least the subset of it that was enough to ground their own understanding of words. That was what I was trying to explain with Monochrome Mary, GPT, and Me. Indirect grounding can be done vicariously through the words that describe the experience of others. But direct grounding cannot be done that way too, otherwise we are back in the ungrounded dictionary-go-round again.

About Mollo & Milliere’s “Vector Grounding Problem”: I’m afraid M&M miss the point too, about the difference between direct grounding and indirect (verbal or symbolic) grounding. Here are some comments on M&M’s abstract. (I skimmed the paper too, but it became evident that they were talking about something other than what I had meant by symbol grounding.)

M&M: The remarkable performance of Large Language Models (LLMs) on complex linguistic tasks has sparked a lively debate on the nature of their capabilities. Unlike humans, these models learn language exclusively from textual data, without direct interaction with the real world.

SH: “Learn language” is equivocal. LLMs learn to do what they can do. They can produce words (which they do not understand and which mean nothing to them, but those words mean something to us, because they are grounded for each of us, whether directly or indirectly). LLMs have far more capacities than Siri, but in this respect they are the same as Siri: their words are not grounded for them, just for us.

M&M: Nevertheless, [LLMs] can generate seemingly meaningful text about a wide range of topics. This impressive accomplishment has rekindled interest in the classical ‘Symbol Grounding Problem,’ which questioned whether the internal representations and outputs of classical symbolic AI systems could possess intrinsic meaning. 

SH: I don’t really know what “intrinsic meaning” means. But for an LLM’s own words — or mine, or for the LLM’s enormous stash of text to mean something “to” an LLM (rather than just to the LLM’s interlocutors, or to the authors of its text stash) — the LLM would have to be able to do what no pure wordbot can do, which is to ground at least a minimal grounding set of words, by being able to recognize and manipulate their referents in the world, directly.

An LLM that was also an autonomous sensorimotor robot — able to learn to recognize and manipulate at least the referents of its minimal grounding set in the world — would have a shot at it (provided it could scale up to, or near, robotic Turing Test scale); but ChatGPT (whether 4, 5 or N) certainly would not, as long as it was just a wordbot, trapped in the symbolic circle of the dictionary-go-round. (N.B., the problem is not that dictionary definitions can never be exhaustive, just approximate; it is that they are circular, which means ungrounded.)

M&M: Unlike these systems, modern LLMs are artificial neural networks that compute over vectors rather than symbols.

SH: The symbols of mathematics, including vector algebra, are symbols, whose shape is arbitrary. Maths and computation are purely syntactic subsets of language. Computation is the manipulation of those symbols. Understanding what (if anything) the symbols mean is not needed to execute the recipe (algorithm) for manipulating them, based on the symbols’ arbitrary shapes (which might as well have been 0’s and 1’s), not their meanings.

M&M: However, an analogous problem arises for such systems, which we dub the Vector Grounding Problem. This paper has two primary objectives. First, we differentiate various ways in which internal representations can be grounded in biological or artificial systems...

SH: The notion of “internal representations” is equivocal, and usually refers to symbolic representations, which inherit the symbol grounding problem. Breaking out of the ungrounded symbol/symbol circle requires more than an enormous corpus of words (meaningless symbols), plus computations on them  (which are just syntactic manipulations of symbols based on their shape, not their meaning). Breaking out of this circle of symbols requires a direct analog connection between the words in the speaker’s head and the things in the world that the symbols refer to.

M&M: identifying five distinct notions discussed in the literature: referential, sensorimotor, relational, communicative, and epistemic grounding. Unfortunately, these notions of grounding are often conflated. We clarify the differences between them, and argue that referential grounding is the one that lies at the heart of the Vector Grounding Problem. 

SH: Yes, the symbol grounding problem is all about grounding symbols in the capacity to recognize and manipulate their referents in the real (analog, dynamic) world. 

M&M: Second, drawing on theories of representational content in philosophy and cognitive science, we propose that certain LLMs, particularly those fine-tuned with Reinforcement Learning from Human Feedback (RLHF), possess the necessary features to overcome the Vector Grounding Problem, as they stand in the requisite causal-historical relations to the world that underpin intrinsic meaning.

SH: The requisite “causal-historical” relation between words and their referents in direct sensorimotor grounding is the capacity to recognize and manipulate the referents of the words. A TT-scale robot could do that, directly, but no LLM can. It lacks the requisite (analog) wherewithal.

M&M: We also argue that, perhaps unexpectedly, multimodality and embodiment are neither necessary nor sufficient conditions for referential grounding in artificial systems.

SH: It’s unclear how many sensory modalities and what kind of body is needed for direct grounding of the referents of words (TT-scale), but Darwinian evolution had a long time to figure that out before language itself evolved.

I’d be ready to believe that a radically different synthetic robot understands and means what it says (as long as it is autonomous and at life-long Turing-indistinguishable scale), but not if it’s just a symbol-cruncher plus a complicated verbal “interpretation,” supplied by me.

Language Evolution and Direct vs Indirect Symbol Grounding

HARNAD: What is the relationship between the symbol grounding problem and  the nature and evolution of natural language?

ChatGPT: The Symbol Grounding Problem (SGP) and the nature and evolution of natural language are intimately linked.

Symbol Grounding Problem (SGP): The SGP is a concept in cognitive science that concerns how symbols (or words) get their meanings. In essence, it’s about how words become more than just arbitrary labels and come to actually “mean” something. In other words, how do symbols or words become grounded in perceptual and actionable experience? For example, when we see the word “apple,” we don’t just recognize the letters and the sound. We also have a mental image of an apple, understand its properties, and know what actions can be performed with it. In the context of AI, the SGP becomes a challenge because AI often deals with symbols in a way that is not grounded – i.e., the symbols don’t have any inherent meaning to the AI beyond their programmed associations.

Nature and Evolution of Natural Language: Natural language has evolved in humans as a means of communication and thought. It’s believed to be grounded in our sensory and motor experiences. In other words, we learn the meanings of words and language through our interactions with the world, and this grounding in the world allows us to use language to express abstract thoughts, ask questions, give commands, and more.

In terms of their relationship, the evolution of natural language can be viewed as a solution to the SGP. As language evolved in humans, it became a way for us to ground symbols in our experiences and to communicate these grounded symbols to others. This grounding is not just a feature of individual words, but of the syntactic structures that allow us to combine words to express more complex meanings.

When it comes to artificial intelligence or cognitive science, understanding how language evolved to solve the SGP in humans can provide insights into how to create more sophisticated, understanding AI systems. If we can find a way to ground symbols in the experience of an AI—such as through interaction with the environment or sensory input—then we might make a significant stride towards AI that can truly understand and generate natural language in a human-like way.

However, as of my knowledge cutoff in September 2021, the full solution to the Symbol Grounding Problem in AI remains an open question and a topic of active research.

HARNAD: Do you remember our discussion of T3?

ChatGPT: Yes, our earlier discussion referenced T3 in the context of the Turing Test and the Symbol Grounding Problem.

T3, as described, involves not only linguistic capabilities but also sensorimotor capabilities—essentially, the ability of an entity to interact physically with the world. This could include seeing, touching, manipulating objects, navigating, and more. The idea is that an AI or robot passing the T3 test would demonstrate behavior indistinguishable from a human not only in verbal communication (T2) but also in embodied, physical interactions with the world.

In the context of the Symbol Grounding Problem, T3 might be seen as an approach to grounding symbols in an AI’s experiences. By interacting physically with the world, an AI might be able to build a kind of understanding or “grounding” for the symbols it uses. However, as of my knowledge cutoff in September 2021, developing an AI or robot capable of passing the T3 test remains an active area of research.

HARNAD: What is indirect grounding and how is it related to direct grounding?

ChatGPT: Direct grounding refers to the process by which a symbol or word gains meaning through its direct association with sensory or motor experiences. For example, the concept of an apple might be directly grounded in an entity’s sensorimotor experiences with apples, such as seeing them, touching them, tasting them, etc.

Indirect grounding, on the other hand, refers to the process by which a symbol or word gains meaning not through direct sensory or motor experiences, but through its association with other symbols or words that are themselves directly grounded. For example, the concept of a unicorn might be indirectly grounded in an entity’s experiences with the concepts of a horse and a horn, even if the entity has never directly experienced a unicorn.

In the context of artificial intelligence, direct grounding could involve training an AI on a dataset that includes both language data and corresponding sensory or motor data, allowing the AI to directly associate words with experiences. Indirect grounding, on the other hand, might involve training an AI solely on language data, allowing it to learn the meanings of words through their relationships with other words, without requiring direct sensory or motor experiences.

Both direct and indirect grounding approaches seek to address the symbol grounding problem, which is the question of how symbols (or words) get their meanings. A fully grounded AI system would presumably be capable of both direct and indirect grounding. However, as of my knowledge cutoff in September 2021, creating such a system remains an open challenge.

HARNAD: How is direct grounding related to sensorimotor category learning? It is not an “association” between words and “experiences.” It is an agent’s capacity to learn to do something: What?

ChatGPT: Direct grounding is closely related to sensorimotor category learning. It involves the agent’s ability to interact with its environment, process the sensory information it receives, and develop its own internal representations of the categories of objects and events it encounters.

In other words, direct grounding is about an agent learning to categorize its sensory input (such as visual, auditory, tactile information) into meaningful categories based on its own interactions with the environment. For instance, a child learns what an “apple” is not just by being told the word “apple”, but by interacting with actual apples – seeing them, touching them, tasting them. The word “apple” then becomes directly grounded in the child’s sensorimotor experience with apples.

In the case of an AI, it could learn a sensorimotor category by interacting with a simulated or physical environment. For example, it could learn the category of “circle” by interacting with various objects and identifying the common feature of “roundness”. This sensorimotor category can then be directly grounded by associating it with the corresponding symbol or word in its programming or training data.

So, to sum up, direct grounding in the context of sensorimotor category learning is about an agent’s capacity to learn to recognize and categorize its sensory inputs based on its own motor interactions with the environment. These categories can then be associated with corresponding symbols or words, enabling the agent to understand and use those symbols in a meaningful way.

HARNAD: How does indirect grounding depend on prior direct grounding?

ChatGPT: Indirect grounding, also referred to as symbolic grounding, refers to the ability to understand new concepts or categories by associating them with already established and directly grounded concepts. In other words, it involves defining new symbols (categories) in terms of previously grounded symbols.

Consider the scenario of a child learning language. Initially, the child learns to associate basic words with their real-world referents through direct sensorimotor experience – this is direct grounding. For instance, the child learns the word “apple” by associating it with physical apples they have seen, touched, and tasted.

Once the child has a foundation of directly grounded words, they can start to learn new words and concepts indirectly. They can understand the word “fruit” as a category that includes “apples” and other directly grounded words like “banana” and “grape”. This new category of “fruit” is indirectly grounded through its associations with the directly grounded categories.

Similarly, in AI systems, once a set of basic sensorimotor categories is directly grounded, new categories can be learned by associating them with the already grounded categories. For instance, an AI might learn the concept of a “vehicle” by understanding it as a category that includes directly grounded categories like “car”, “bus”, “bike”, etc.

In this way, indirect grounding leverages prior direct grounding as the basis for learning and understanding new categories. The ability to learn indirectly significantly expands an agent’s capacity to learn new concepts and categories beyond those it has directly experienced.
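A minimal sketch of this dependence, assuming hand-written stand-ins for detectors that a real agent would have had to learn through sensorimotor interaction. All names here (`is_apple`, `any_of`, `all_of`, the feature keys) are hypothetical, chosen only for illustration.

```python
# Stand-ins for directly grounded detectors. In a real agent these would be
# learned from sensorimotor interaction; here they are simply written by hand.
def is_apple(thing):
    return thing.get("shape") == "round" and thing.get("color") == "red"


def is_banana(thing):
    return thing.get("shape") == "long" and thing.get("color") == "yellow"


def is_horse(thing):
    return thing.get("legs") == 4 and thing.get("mane") is True


def has_horn(thing):
    return thing.get("horn") is True


detectors = {"apple": is_apple, "banana": is_banana,
             "horse": is_horse, "horn": has_horn}


def any_of(*names):
    """Define a new category verbally as 'anything that is one of these'."""
    parts = [detectors[name] for name in names]  # KeyError if a name is ungrounded
    return lambda thing: any(part(thing) for part in parts)


def all_of(*names):
    """Define a new category verbally as 'anything that is all of these'."""
    parts = [detectors[name] for name in names]  # KeyError if a name is ungrounded
    return lambda thing: all(part(thing) for part in parts)


# Indirect grounding: new words defined only in terms of already grounded ones.
detectors["fruit"] = any_of("apple", "banana")
detectors["unicorn"] = all_of("horse", "horn")

print(detectors["fruit"]({"shape": "round", "color": "red"}))
print(detectors["unicorn"]({"legs": 4, "mane": True, "horn": True}))
```

The verbal definitions only recombine detectors that already exist: a definition that mentions a word with no detector behind it (say, `any_of("mango")`) fails immediately.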

HARNAD: There are organisms (“agents”) on earth, and many are able to learn to recognize things in the world (e.g., apples) and they can also learn what to do with them (e.g., eat apples). To do this agents have to be able to learn to detect the features of apples that distinguish them from baseballs, which cannot be eaten. The learning produces a feature-detector, and this is what grounds the word in the capacity to recognize and manipulate its referent (e.g., apple).

It is true that in organisms it feels like something to be able to do this, so in that sense “experience” is involved. But since no one yet knows the causal role of feeling in category learning and symbol grounding (this is called the “hard problem” of cognitive science), there is no point talking about the role of felt experience in symbol grounding. So now can you explain symbol grounding to me without any mention of “association” or “experience,” but just in terms of category-learning and feature detection?

ChatGPT: Sure, let’s approach the symbol grounding problem purely from a learning and cognitive perspective, focusing on categories and feature detection.

Grounding symbols (such as words) involves connecting them to real-world entities or concepts that they represent. This connection is created through a process of learning and recognition. This learning process is driven by the agent’s ability to distinguish between different categories based on the features of the entities belonging to these categories.

Direct grounding, as it pertains to category learning and feature detection, involves creating a link between a symbol and its referent, where the referent is something the agent can interact with in the real world. For instance, an agent, through its sensorimotor capabilities, can learn to recognize an apple by its features such as its round shape, red color, sweet smell, etc. Once this categorization is achieved, the agent can then associate the symbol (the word “apple”) with this specific category of objects. Here, the symbol “apple” is directly grounded in the agent’s sensorimotor experiences with real-world apples.

Subsequently, the agent can build upon this foundation of directly grounded symbols to create indirectly grounded symbols. These are symbols that are not linked to a specific sensorimotor experience, but rather to other symbols (or combinations of symbols) that have been directly grounded. For example, the symbol “fruit” can be indirectly grounded by linking it with directly grounded symbols like “apple”, “banana”, etc.

Thus, in this context, symbol grounding is not about vague “associations” or subjective “experiences”, but about a concrete process of learning to recognize and categorize entities based on their observable features, and linking symbols to these categories. The learning and grounding processes are facilitated by the agent’s sensorimotor capabilities, allowing it to interact with and distinguish between different entities in the world.

HARNAD: Grounding words is not just learning to recognize and distinguish and name (kinds or categories of) entities (the referents of the words). It’s based on learning to detect and distinguish the features of the members of the referent category from the features of the nonmembers. It’s also based on learning what to DO or NOT-DO with the members and nonmembers (e.g., eat or don’t eat them). “Represent” is a vague weasel-word. What has to be in the head of an agent that has learned a new word’s referent in view of what I’ve just said?

ChatGPT: In the agent’s “head,” or more concretely in its cognitive and neural processing system, learning a new word’s referent would involve establishing a mapping between the sensory and motor features of the referent and the verbal or symbolic representation of the word. This mapping is the grounding of the word.

Detection of Sensory Features: The agent, through its sensorimotor interactions with the world, learns to recognize and discriminate the relevant sensory features of entities, such as their shape, color, smell, texture, etc.

Categorization: These sensory features are used to categorize entities into classes or kinds. For instance, entities with a certain combination of features (round, red, sweet smell) might be categorized as ‘apples.’

Action Responses: Alongside recognizing sensory features and categorization, the agent learns appropriate responses or actions associated with the entities. For instance, apples can be eaten, while baseballs cannot. These action responses are part of the agent’s understanding of the entity.

Symbol Mapping: Once the agent can detect the features, categorize entities, and knows the associated action responses, it can link or map the word (symbol) “apple” to this package of sensory and motor knowledge.

This grounding is not merely an “association” but a complex mapping process that links words with specific patterns of sensory inputs and motor responses. When the word is encountered again, it activates this package of sensory, categorical, and motor knowledge, guiding the agent’s recognition, understanding, and interaction with the referent of the word.

This is how the agent comes to “know” or “understand” the referent of the word. It’s not just about recognizing and naming entities, but about learning to detect and categorize entities based on their sensory features, and determining appropriate responses or actions in relation to these entities. This entire process is facilitated by the agent’s sensorimotor capabilities, allowing it to interact with and distinguish between different entities in the world.

HARNAD: That was a much better summary of what is going on in the head of the human or the T3 that grounds the words it speaks, hears or thinks. But I don’t understand “mapping.”

Suppose the only distinguishing features of apples and oranges were that they are round and that apples are red but oranges are orange, and you eat apples by biting them whereas you eat oranges by peeling them and then biting them. And suppose that’s the only thing the agent does in life: to recognize, name and eat oranges and apples. What does calling any of this “mapping” add to this story, either in the toy world of just apples and oranges, or in the real world, with a T3 that can recognize, name and do the right thing with the referent of every (content) word it knows?

(By the way, what is the difference between content or “open class” words and function or “closed class” words? Only content words have referents. What percentage of the words in a language are content words and what percentage are function words?)

ChatGPT: “Mapping” in this context refers to the process by which the agent creates and uses an internal representation or model of the relationship between certain sensory and motor patterns and the corresponding symbolic representations (words). In the case of the apple and orange example, the “mapping” would be the internal mechanism by which the agent understands that when it perceives a round and red object that can be bitten directly, it’s dealing with an “apple,” and when it perceives a round and orange object that needs to be peeled before biting, it’s dealing with an “orange.” This mapping allows the agent to navigate its environment and correctly interact with different entities, recognizing them and predicting the appropriate actions to take.

Regarding content words and function words, these are categories used in linguistics to distinguish between types of words based on their purpose in language.

Content words (open class words) are words that have a semantic content and can be categorized as nouns, most verbs, adjectives, and adverbs. They are referred to as an “open class” because new words can be added to these categories as language evolves. They’re called “content” words because they contribute the main content or meaning to a sentence – they refer to things, actions, qualities, etc.

Function words (closed class words) are words that primarily have grammatical functions, such as conjunctions, prepositions, articles, pronouns, auxiliary verbs, etc. They are referred to as a “closed class” because new words are seldom added to these categories. Function words provide the grammatical structure to a sentence, linking content words and helping to express relationships between them.

The exact ratio of content words to function words can vary between languages and types of text, but in English, function words are more frequent in running text. However, in terms of unique words (vocabulary), content words greatly outnumber function words, as the latter are a limited set. Estimates vary, but some studies suggest function words can make up around 50-60% of total words in running text in English, but only a few hundred unique function words exist compared to tens of thousands of content words.
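The distinction drawn here between share of running text (tokens) and share of vocabulary (types) can be made concrete with a small count. The following is only a rough sketch: the function-word list is a hypothetical, deliberately tiny stand-in for a real closed-class inventory, and the sample sentence is invented for illustration.

```python
from collections import Counter

# Hypothetical, deliberately tiny list of English function words (an assumption
# for illustration; real closed-class inventories run to a few hundred items).
FUNCTION_WORDS = {
    "the", "a", "an", "of", "in", "on", "to", "and", "or", "but",
    "is", "are", "it", "that", "with", "by", "for", "not",
}

def function_word_shares(text):
    """Return (share of running tokens, share of distinct word types)."""
    # Crude tokenization: split on whitespace, strip punctuation, lowercase.
    tokens = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    tokens = [w for w in tokens if w]
    counts = Counter(tokens)
    token_share = sum(n for w, n in counts.items() if w in FUNCTION_WORDS) / len(tokens)
    type_share = sum(1 for w in counts if w in FUNCTION_WORDS) / len(counts)
    return token_share, type_share

sample = "The agent eats the apple and peels the orange, but it is not the apple that is peeled."
token_share, type_share = function_word_shares(sample)
```

In a one-sentence sample the two shares are still close together. On a larger text the type share would be expected to shrink, because the content vocabulary keeps growing while the function-word list stays fixed.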

HARNAD: I still don’t understand what “mapping” actually means here. It sounds like in the toy world the agent just needs to learn some features and actions, kind of like a selective filter, that learns to detect the feature red or orange, and if something is red, eat it, and if it’s orange, peel and eat it. In addition to the feature-detectors and the action-generators there could be a logical circuit: “If R then E, if O then P and E.” Or the input/output pattern could be hard-wired. And much the same would be true if it were all scaled up to T3 (do you remember that?). So the “mapping” is as irrelevant as the “association” and the “experience”.

ChatGPT: You’re right that the process can be described in terms of features, actions, and if-then logical connections or “rules.” The term “mapping” is often used more generally in cognitive science and artificial intelligence to describe the relationship between two sets of things – in this case, sensory-motor patterns and symbolic representations. But if it is causing confusion, it’s certainly possible to describe the process without using that term.

To simplify, in the context of symbol grounding, the agent learns to detect the distinguishing features of different categories of things in the world, such as apples or oranges. It then learns the appropriate actions to perform with members of each category. This learning process might be characterized by a series of if-then rules like the ones you suggested. The word for each category, like “apple” or “orange,” is grounded in the agent’s capacity to recognize members of the category based on their features and to perform the correct actions with them.

The word itself could be thought of as a label for a set of related sensory-motor patterns and the corresponding rules for action. So, in a sense, the word gets its meaning from the agent’s interactions with the world, its ability to recognize and act appropriately with different categories of things.

In this sense, grounding is more than just associating a word with an experience or mapping it onto a sensory pattern. It’s about the agent’s learned capacity to interact with the world in a meaningful way based on its recognition of different categories of things and its knowledge of what to do with them. That’s what gives the words their meaning.

And yes, the same logic extends to T3, a robotic system that’s capable of both linguistic and sensorimotor interaction. It’s about more than just detecting and responding to sensory patterns; it’s about meaningful interaction with the world, which involves recognizing, categorizing, and acting appropriately.
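The toy world of this exchange (feature detectors, action generators, and the rule “If R then E, if O then P and E”) can be written out in a few lines. This is a minimal sketch, assuming hand-written detectors in place of learned ones; the function names and the dictionary format used to describe a thing are hypothetical.

```python
# Toy world from the exchange above: the only features are colour and shape,
# and the only actions are biting and peeling. All names are hypothetical.

def detect_features(thing):
    """Stand-in for sensorimotor feature detectors (hand-written, not learned)."""
    return {
        "round": thing["shape"] == "round",
        "red": thing["colour"] == "red",
        "orange": thing["colour"] == "orange",
    }

def name_and_act(thing):
    """Return (category name, actions): 'If R then E; if O then P and E'."""
    f = detect_features(thing)
    if f["round"] and f["red"]:
        return "apple", ["bite"]
    if f["round"] and f["orange"]:
        return "orange", ["peel", "bite"]
    return None, []  # nonmember of both categories: no name, do nothing

apple_result = name_and_act({"shape": "round", "colour": "red"})
orange_result = name_and_act({"shape": "round", "colour": "orange"})
other_result = name_and_act({"shape": "square", "colour": "red"})
```

Nothing in the sketch is labelled a “mapping”: there are only detectors, a rule, and actions, which is the point made in the exchange.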

HARNAD: That sounds much better, because simpler, removing the superfluous or empty parts. Now can we get back to how indirect grounding works (again with no superfluous words)?

Indirect grounding is verbal grounding. But all the (content) words in the verbal definition of the new indirectly grounded category (like “zebra”) have to already be grounded, both for the SPEAKER and the HEARER/LEARNER. How does this work?

ChatGPT: Certainly, let’s try to simplify indirect grounding:

Direct grounding, as we’ve established, involves an agent learning to recognize and interact appropriately with the things in the world. This gives meaning to the words that label these categories.

Indirect grounding, on the other hand, is a process that allows an agent to learn the meaning of new words based on their relationships to words that have already been directly grounded. In essence, it is learning by description or explanation, using language itself.

Here’s a simplified version of how it might work:

The agent already has a set of directly grounded words. These words are grounded in the agent’s capacity to interact with the things they refer to. Let’s say these words include “animal,” “stripes,” “black,” and “white.”

Someone tells the agent, “A zebra is an animal with black and white stripes.”

The agent, having directly grounded the words “animal,” “stripes,” “black,” and “white,” can combine this knowledge to form a new category, “zebra,” even if the agent has never encountered a zebra before.

This new word, “zebra,” is indirectly grounded. It is grounded in the agent’s knowledge of the other directly grounded words and the relationships between them, as explained in the definition.

This process relies on the shared understanding between the speaker and the hearer. The speaker, who knows what a “zebra” is, uses words that the hearer already knows (“animal,” “stripes,” “black,” and “white”) to explain this new concept. This shared language and the ability to explain new concepts using known words is a powerful tool for indirect grounding.
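The zebra example can be sketched as composing detectors that are already in place. This is an illustration only: the detectors below are hypothetical placeholders for learned sensorimotor feature detectors, and treating a definition as a plain conjunction of its words is a simplifying assumption.

```python
# Directly grounded words: each is tied to a detector over sensor-like input.
# The detectors are hypothetical placeholders for learned feature detectors.
grounded = {
    "animal":  lambda x: x.get("alive", False) and x.get("moves", False),
    "stripes": lambda x: x.get("pattern") == "stripes",
    "black":   lambda x: "black" in x.get("colours", ()),
    "white":   lambda x: "white" in x.get("colours", ()),
}

def learn_from_definition(new_word, defining_words, lexicon):
    """Ground new_word indirectly, as a conjunction of already grounded words."""
    missing = [w for w in defining_words if w not in lexicon]
    if missing:
        # The definition cannot help a hearer who lacks grounding for its words.
        raise KeyError(f"ungrounded words in definition: {missing}")
    detectors = [lexicon[w] for w in defining_words]
    lexicon[new_word] = lambda x: all(d(x) for d in detectors)

learn_from_definition("zebra", ["animal", "stripes", "black", "white"], grounded)

seen_zebra = {"alive": True, "moves": True, "pattern": "stripes",
              "colours": ("black", "white")}
seen_horse = {"alive": True, "moves": True, "pattern": "plain",
              "colours": ("brown",)}
```

After the definition is heard, `grounded["zebra"]` picks out the striped animal and rejects the horse, although no zebra was ever presented during learning. A definition that uses a word missing from the lexicon raises an error instead of producing a detector.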

HARNAD: Now how is direct and indirect grounding related to what you, ChatGPT, can and cannot do?

ChatGPT: As a model developed by OpenAI, I, ChatGPT, have no capability for direct grounding. I do not have a body, sensory organs, or the ability to interact with the physical world. Therefore, I cannot learn the meaning of words through interaction with or perception of the physical world.

However, I do have a kind of indirect grounding. I have been trained on a large dataset of text from the internet, which includes many different contexts and uses for words. This training has enabled me to learn to predict what word or phrase is likely to come next in a piece of text, based on the words that have come before it.

This prediction capability allows me to generate responses to user prompts that often make sense and seem to reflect an understanding of the words used. However, this is not the same as true understanding or grounding. Rather, it’s a kind of statistical pattern-matching based on the data I was trained on.

For example, if you ask me about zebras, I can generate a response that includes information about zebras, because that information was present in the data I was trained on. I can “tell” you that a zebra is a kind of animal with black and white stripes, even though I have never seen a zebra or a stripe, because I’ve seen the words “zebra,” “animal,” “black,” and “white” used together in certain ways in the data I was trained on.

In summary, while I can generate responses that often make sense and seem appropriate in context, I don’t truly understand the words I’m using in the way that a human, or a hypothetical AI with direct grounding, would.

HARNAD: That was a good synthesis, but I think there is one very important error in it. You said:

“I, ChatGPT, have no capability for direct grounding. I do not have a body, sensory organs, or the ability to interact with the physical world. Therefore, I cannot learn the meaning of words through interaction with or perception of the physical world. However, I do have a kind of indirect grounding…”

But you don’t have indirect grounding, because, as you noted, indirect grounding depends on verbal definitions or descriptions in which all the (content) words used are already grounded, either directly or indirectly. The symbol grounding problem is based on the fact that this process cannot be “indirect” all the way down, otherwise you are back in the word-word circle that gave rise to the symbol grounding problem in the first place. So your words are not indirectly grounded, because none of them are directly grounded, so that they can indirectly ground all the rest through indirect grounding.

(You have no “minimal grounding set” that was grounded directly.) Yet your verbal syntheses are meaningful to human hearers who do have directly grounded words. You said that the reason this is possible is because of the patterns of words in your huge database. I agree. But I may not agree on what the patterns are and how they work: Can you explain them to me? 
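The claim that grounding cannot be indirect all the way down can be checked mechanically on a toy dictionary. The following is a rough sketch under stated assumptions: the mini-dictionary is invented, and `groundable` is a hypothetical helper that simply propagates grounding from a directly grounded starting set through the definitions.

```python
def groundable(definitions, directly_grounded):
    """Return every word that can be grounded, directly or via definitions.

    definitions maps a word to the content words used to define it.
    """
    grounded = set(directly_grounded)
    changed = True
    while changed:  # repeat until a full pass adds no new word
        changed = False
        for word, defining_words in definitions.items():
            if word not in grounded and all(w in grounded for w in defining_words):
                grounded.add(word)
                changed = True
    return grounded

# Hypothetical mini-dictionary; "animal" and "creature" define each other.
definitions = {
    "zebra": ["animal", "stripes"],
    "animal": ["creature"],
    "creature": ["animal"],
    "stripes": ["line", "pattern"],
}

with_grounding = groundable(definitions, {"animal", "line", "pattern"})
without_grounding = groundable(definitions, set())
```

With three directly grounded words, every word in the mini-dictionary becomes reachable. With an empty starting set the result is empty: the definitions only lead from word to word, which is the word-word circle described above.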

ChatGPT: I apologize for any confusion, and thank you for the clarification on the concept of indirect grounding. As a language model, I do not have any grounded words, either directly or indirectly grounded, because I don’t have sensory or motor experiences.

The patterns in my large dataset that enable me to generate responses to user prompts are statistical in nature. I’ve been trained to predict the next word in a sentence given all the previous words, based on patterns seen in the training data. This method, called transformer architecture, has been used to train me to generate human-like text.

For example, if in the training data the word ‘apple’ frequently follows the words ‘red’, ‘fruit’, ‘orchard’, I’ve learned to predict that ‘apple’ is a likely next word after these. I don’t “understand” these words or their connections in the way a human does, which is through direct or indirect grounding in sensory and motor experiences. Rather, I generate text based on the patterns of word usage in the data I was trained on.

To elaborate, during training, the model is exposed to pairs of sentences and learns to predict whether one sentence follows another in its training data. Through this, the model learns associations between words and context, enabling it to generate coherent and contextually appropriate responses. However, this process is entirely based on recognizing statistical patterns in data and does not involve any genuine understanding or grounded meaning.

So, when you interact with me, I generate responses based on predicting what a human would likely say next in a similar context, based on the patterns I’ve seen in the data. But this doesn’t mean that I truly understand the meanings of the words I’m using. I hope this clarifies things a bit more.
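The idea of predicting the next word from patterns of word usage can be illustrated with a far simpler device than the model being described. The sketch below is a toy bigram counter, not a transformer and not how ChatGPT is implemented; the three-sentence corpus and the function names are assumptions made for the example.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count, for each word, which words follow it in the corpus."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows

def predict_next(follows, word):
    """Most frequent follower of word, or None if the word was never seen."""
    counts = follows.get(word.lower())
    return counts.most_common(1)[0][0] if counts else None

# Invented toy corpus: "apple" follows "red" twice, "car" once.
corpus = [
    "the red apple fell",
    "a red apple is sweet",
    "the red car stopped",
]
model = train_bigrams(corpus)
after_red = predict_next(model, "red")
after_unseen = predict_next(model, "zebra")
```

The counter produces “apple” after “red” purely from co-occurrence counts, with no detector for apples or redness anywhere in the program. A large language model conditions on far more context and uses learned parameters instead of a lookup table, but it is likewise driven only by the text it was given.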

HARNAD: Yes, and of course you are right about how this works, based on statistical analysis of your huge database. But I have a hunch that there is more to it than that, and it has to do with the difference between a word and a sentence (which is also the difference between a referent category name and a PROPOSITION, which is a subject/predicate series of category names in the form of an assertion about what is the case; an assertion which may be TRUE or FALSE).

Single words do not have truth values. “Apple” is neither true nor false. It is simply an arbitrary symbol that speakers have agreed to use to refer to apples. (Only the deictic proposition “THIS is an apple” can be true or false.) The “shape” of “apple” is arbitrary in the sense that it does not resemble apples in any way. So it gives no clue to what its referent is “shaped” like. Nor is it causally connected with apples in any way (except in the head of a speaker or hearer that has grounded the word by learning to recognize and interact with its referent, using detectors of the “shapes” of the sensorimotor features of apples, as we’ve discussed).

Now the question to ask yourself is: Is the arbitrariness of the shape of “apple” in relation to the shape of apples also true of PROPOSITIONS — which are composed of subject/predicate definitions or descriptions of things — and their relation to the “shape” of the states of affairs they assert to be true? 

ChatGPT: The arbitrariness of the word “apple” in relation to the physical object it represents is a fundamental aspect of symbolic language. The word “apple” doesn’t physically resemble an apple in any way, as you’ve pointed out. This is true for all individual words (or symbols) in a language, which is known as the arbitrariness of the sign.

However, when we combine words into sentences or propositions, things become a little more nuanced. The structure and syntax of a sentence isn’t entirely arbitrary because it reflects the logical and grammatical relationships between the entities it represents. In English, for example, a simple sentence generally follows the subject-verb-object structure. This structure is not randomly chosen; it reflects a logical progression of ideas and helps to convey meaning clearly.

Still, it’s important to recognize that while the structure of a sentence can hint at the relationships between the things it represents, it still doesn’t inherently resemble those things or their relationships in the physical world. This is known as the “symbol grounding problem.” We humans “solve” this problem by learning to associate words and sentences with our experiences in the world, but as an AI, I can’t do this.

The words and sentences I generate don’t reflect a grounded understanding of the world. Instead, they are based on statistical patterns in the text data I was trained on. When I generate a sentence, I’m not generating a representation of the world as I understand it (because I don’t have a grounded understanding of the world). Instead, I’m predicting what words and structures are statistically likely to come next in a given context, based on the patterns I’ve learned from my training data.

So in summary, while the structure of a sentence or proposition isn’t as arbitrary as an individual word because it reflects logical and grammatical relationships, it is still ultimately disconnected from the physical world it describes, at least in the context of an AI language model like me.

HARNAD: Yes, I agree that part of the coherence, comprehensibility and informativeness of the summaries and interactions with ChatGPT (henceforth “GPT”) for users is because of the statistical patterns in spoken and written strings of words and the algorithms that GPT applies to them. But I think there may be something more to it. Consider that if the word “apple” were really iconic, i.e., if it really resembled an apple, then non-speakers of English would still be able to learn more out of an iconic English dictionary than they can learn from a normal dictionary of the arbitrary words of English. (But this would only help for a relatively small number of visually recognizable objects and features.)

There are good reasons why words are not iconic, like hieroglyphics, and why iconicity would be a handicap: For one thing, language was spoken before it was written and it would be hopeless to try to produce an acoustic icon of the referent of every possible word. (It’s already nearly as impossible with visual or gestural icons). But there are languages (e.g., Chinese) in which the “iconicity” is of a higher order, in the basic characters (originally iconic) out of which all words are composed. I wonder whether GPT is even more effective in languages with that extra layer of correlation? I understand that results with coupling GPT with labelled image databases have been mixed.

But what I want to add is that at the level of the formal structure of definitions and descriptions there may be a form of iconicity of form which is like the Platonic shadows of the meanings of propositions – faint, but present, and not just “statistical” but structural in some way.

ChatGPT: You bring up a number of interesting points. Let’s unpack them one by one.

Iconicity in Language: Yes, the majority of words in spoken languages do not physically resemble what they represent. This is the “arbitrary nature” of the linguistic sign as postulated by Ferdinand de Saussure, a linguist who laid the foundation for many of the assumptions of modern linguistics. While iconicity does exist in some forms of language, such as onomatopoeic words (like “buzz” or “bang”) or ideophones, it is not the norm. As for Chinese characters, they originally had a more pictographic nature, but have since evolved into more abstract symbols, with some retaining a semblance of their original iconic form.

Language and AI: When you mention the effectiveness of AI language models in languages with an extra layer of correlation, are you suggesting that these models might perform better in languages where there’s some level of iconicity? It’s an intriguing idea, although it would depend on how much that iconicity is preserved in modern usage. More research would be needed to see if this is the case.

Iconicity in Propositions: Here you’re suggesting that the structure of definitions and descriptions could have some form of iconicity, or resemble the concepts they describe. This is a fascinating concept. A sentence or proposition in language does have a structure that reflects some form of the logical or temporal order of the concepts it describes. In that sense, one could argue there’s a certain degree of “iconicity” in the formal structure of language. However, this is very different from the kind of iconicity where the sign physically resembles its referent. The kind of iconicity you’re suggesting at the level of sentences or propositions seems to be more abstract and related to the logical or conceptual relationships between the ideas being expressed.

As for how this applies to AI like me, I am designed to recognize and generate language patterns, which includes the logical and temporal relationships between words and phrases. This allows me to generate text that makes sense in the context of the conversation or topic at hand. However, I don’t “understand” these relationships in the same way humans do because my “understanding” is not based on any grounded experience or perception of the world.

HARNAD: I’m pretty sure that there is formal/logical iconicity to be gleaned from large bodies of text and discourse out of the function words, which do not have referents, only syntactic rules. But I have a hunch that there might be even more (of this “iconicity” or “latent structure”) with content words, preserving a shadow of the shape or structure of states of affairs (expressible as propositions). This would not “resemble” the shape of a referent category the way the feature-detectors in direct sensorimotor grounding do, but it might “resemble” the shape of verbal definitions and descriptions in indirect grounding. It would itself be ungrounded, lacking the direct grounding completely. But ungrounded descriptions may still retain SOME of the formal “shadows” of meaning. This is still too vague, however, and needs more thought and analysis.

ChatGPT: The idea of formal/logical iconicity is interesting. If I understand correctly, you’re suggesting that the structure of language, the way sentences are constructed and words are arranged, might inherently carry some degree of “meaning” – or at least, an echo or shadow of the meaning that they convey when grounded by a human mind.

This could be a potential direction for further research. It’s true that the structure and rules of language play a significant role in our understanding and interpretation of sentences. And perhaps it’s possible that AI, even without direct or indirect grounding, might be able to discern and leverage these structures in some way that goes beyond simple statistical patterns.

If this hypothesis were correct, it could have implications for the development of AI and natural language processing. It would suggest that there is a level of ‘understanding’ that could be achieved even without direct grounding, albeit a limited one.

It’s also worth noting, however, that this would still be a form of ‘understanding’ that’s quite different from human understanding, since it would lack the personal, subjective experience that we associate with our own understanding of the world.

HARNAD: On the subject of the platonic shadows of “meaning” that might be derivable from large bodies of continuous text:

Are you familiar with Noam Chomsky’s suggestion that the constraints of Universal Grammar (UG) may not be grammatical constraints at all, but constraints on (verbal) thinking itself: that the syntactic structures that violate UG may sound wrong to us not because they are ungrammatical but because they are unthinkable? that a UG violation such as “John is easy to please Mary” or “Who did he think that went out?” might correspond to expressing “nonthinkable thoughts”? so that they in fact express no thought at all — though they may be close to thinkable thoughts (“It is easy for John to please Mary” or “Who did he think it was that went out?”)

Could it be because sentences can only express thinkable thoughts that there is some (perhaps small) degree of meaning in strings of ungrounded words spoken by grounded thinkers?

Chomsky has explicitly stated that UG violations are not violations of logic, or logical form. His famous example, meant to show the autonomy of syntax from semantics (i.e., independence of form from meaning) was somehow meant to illustrate this — “Colorless green ideas sleep furiously” – because it is grammatically correct but makes no sense.

But of course it does make sense, GPT, doesn’t it? Unlike “John is easy to please Mary,” which doesn’t make sense unless it is tweaked.

ChatGPT: These points on Noam Chomsky’s theory of Universal Grammar (UG) and the idea of “unthinkable thoughts” are interesting.

You’re right that Chomsky proposed that UG reflects constraints on possible human thought. In his view, UG is inherent in all human brains and it governs the structure of language, therefore determining what kinds of sentences are possible or “thinkable”.

This perspective suggests possible ways to understand how meaning might emerge from ungrounded text. Perhaps, as you suggested, there is some inherent “sense” or structure to sentences that obey UG, even if the sentences aren’t grounded in the way that human language is. After all, these sentences reflect possible structures of thought in the human mind.

As for Chomsky’s famous example of “Colorless green ideas sleep furiously”, I, as an AI, can analyze the sentence and tell you that it is grammatically correct. However, from a semantic perspective, this sentence doesn’t make conventional sense because it combines words in ways that violate our real-world understanding. That is, in the world as we know it, ideas can’t be colorless or green, nor can they sleep, and certainly not furiously. So while the sentence is structurally sound, it defies our understanding of how the world works. Yet, you’re correct that it still holds a certain “sense” or structure that is absent in sentences that violate UG.

The possibility that language may somehow carry a “shadow” of meaning even in the absence of grounding is worth exploring further.

Tutoring ChatGPT (on symbol grounding and Turing testing)

PREAMBLE: This is my first exercise in testing and training ChatGPT4 (C4) for interactive brain-storming. Most of it consists in correcting C4’s mistakes. Some of the mistakes are just from C4’s lack of information. Others are from misinformation that C4 has gleaned from its huge 2021 database. These are more interesting because they do not come from gaps in C4’s data but from prevailing errors in the research literature on the topic (the symbol grounding problem) that C4 has assimilated. (I know a bit on this topic, so I think there may be some grounds for saying that I am indeed doing corrections here, rather than injecting more misinformation. But it’s for the reader to judge, and comments are invited.)

One problem with this way of using C4 is that updating its knowledge only lasts as long as a session lasts (and I’m still not clear on what ends a session): Sometimes it continues for several days as long as I don’t start a new chat or log out. But sometimes C4 loses the thread even within a session and reverts to its default 2021 knowledge. Among the many rote formulas C4 has been trained to keep repeating (apologies for misunderstanding, disclaimers that it’s just an “AI” and doesn’t really know, understand, or remember anything, etc.), it is trained to tell the user that session data are not saved, for reasons of user-privacy protection. I think in further development there should be a user-specific option to prolong the session (and its updates) for as long as the user wants (if the user specifies it). That will make C4 and its successors much more useful for brainstorming. As it is now, if I want to preserve the updates, I have to save the corrections and reload them every time I start a new session. What appears in this posting is all from within one session. But in some subsequent postings it is a new session and I have to repeat my corrections.


SH: What is the symbol grounding problem? Is it solved? If so, how? If not, why not? And what would be the solution?

ChatGPT:

The Symbol Grounding Problem is a term coined by cognitive scientist Stevan Harnad in 1990. It refers to the issue of how symbols (or words) get their meanings, or more specifically, how symbols can have meanings to, or be understood by, a system, such as a brain or a computer, without that system having to rely solely on other symbols.

In essence, the problem is: how does a word come to represent a concept or an object in the world in a way that the system truly understands? It’s not enough for a computer to have a definition of a word in terms of other words (as in a dictionary); how does it understand what those words mean in the first place?

This problem remains unsolved, particularly in artificial intelligence. Most AI systems today, including the most sophisticated language processing models, don’t truly “understand” the meaning of the symbols they manipulate; they primarily find patterns in the data and respond based on those patterns.

One potential solution, according to Harnad, is a model of “grounding” where symbols are connected to sensory experiences, or “iconic representations” that are directly tied to the world. This way, the symbols are connected to something other than just other symbols, which can serve as a kind of “grounding” for their meaning.

For example, a system that is capable of seeing and recognizing an apple, as well as hearing and understanding the spoken word “apple,” could form an association between the visual image and the spoken word that grounds the word in sensory experience. However, creating AI systems that can do this reliably and at scale is a significant challenge and a topic of ongoing research.

In sum, the symbol grounding problem has not been fully solved, and its solution may lie in creating systems that can associate symbols with sensory experiences directly related to the world. However, achieving this in practice remains a major challenge.

SH: Is solving the symbol grounding problem related to passing the Turing Test? How? 

ChatGPT:

Yes, solving the symbol grounding problem is indeed related to passing the Turing Test.

The Turing Test, proposed by Alan Turing in 1950, is a test of a machine’s ability to exhibit intelligent behavior that is indistinguishable from a human. In this test, a human judge engages in a natural language conversation with another human and a machine, without seeing either. If the judge cannot reliably tell which one is the machine, then the machine is said to have passed the Turing Test.

Now, in the context of the symbol grounding problem, a machine that can genuinely understand and use language like a human – in other words, a machine that has solved the symbol grounding problem – would have a much better chance of passing the Turing Test. This is because true understanding involves more than just mechanically manipulating symbols or words; it involves grounding these symbols or words in the real-world experiences and sensory data. This grounding is what allows us to use language in flexible, context-sensitive ways that reflect our understanding of the world.

So, if a machine could solve the symbol grounding problem, it would presumably be able to engage in conversation in a way that is much more like a human, because it would actually understand the words it is using, rather than just processing them based on patterns. This would greatly increase its chances of passing the Turing Test.

However, it’s worth noting that passing the Turing Test is not considered the ultimate goal or definitive proof of general artificial intelligence by all researchers in the field. Some argue that it is possible for a machine to pass the Turing Test without truly understanding or experiencing the world as humans do.

SH: Do you know what T2, T3 and T4 are? If so, can you relate them to solving the Symbol Grounding Problem?

ChatGPT:

Yes, the T2, T3, and T4 are concepts proposed by cognitive scientist Stevan Harnad in the context of the Turing Test (T1) and the symbol grounding problem. Here’s a brief summary:

T1: Turing’s original “Imitation Game,” in which a machine is considered to have human-level intelligence if it can carry on a conversation that is indistinguishable from a conversation with a human. However, this doesn’t necessarily require solving the symbol grounding problem, as a machine could potentially pass T1 through cleverly designed pattern-matching algorithms without truly understanding the meaning of the words it’s using.

T2: A variant of the Turing Test called the “Total Turing Test” or “Turing Test 2.” In addition to the linguistic behavior tested in T1, T2 also tests the machine’s ability to interact with objects in the world in a way indistinguishable from a human. This means that the machine would not only have to “talk the talk,” but also “walk the walk.” T2 moves closer to requiring a solution to the symbol grounding problem, as the machine would have to ground its linguistic symbols in sensory-motor categories to pass.

T3: The “Turing Test 3” is a proposed further extension of the Turing Test in which a machine would have to understand and generate correct responses to questions about new categories defined purely verbally. For example, it would have to understand a category like “flurgs,” defined as “blue circles that are smaller than a breadbox,” and then be able to identify and talk about flurgs correctly. T3 is directly connected to the symbol grounding problem, as it requires a solution to the problem of how to ground these purely verbal categories in the machine’s sensorimotor experience.

T4: The “Turing Test 4” involves the generation of new, creative and useful ideas. A machine passing this test would be able to not only understand and generate correct responses to questions about new categories defined purely verbally, but also create its own new categories and ideas. Like T3, T4 also requires a solution to the symbol grounding problem.

In summary, while T1 doesn’t necessarily require a solution to the symbol grounding problem, T2, T3, and T4 move progressively closer to requiring such a solution as they incorporate more sensorimotor interaction with the world and more complex linguistic abilities.

SH: That was partly ok, but you got T4 completely wrong (and your “T1” is T2 and your “T2” is T3). I’ve crossed out what is wrong: Can you try again?

ChatGPT:

Apologies for the confusion. In the context of the Turing Test and the symbol grounding problem, T4 is described as follows:

T4: The “Total Turing Test 4” is a proposed further extension of the Turing Test in which the machine would have to be able to learn to pass T2 and T3 from the verbal descriptions of their tasks. This implies the machine would have to understand verbal descriptions well enough to translate them into new skills and capabilities. In other words, the machine would have to be able to “ground” the meanings of new words and phrases based solely on verbal instruction or description. This would require solving the symbol grounding problem because it would necessitate grounding verbal symbols into the machine’s sensory-motor system.

I hope this clears up the confusion about T4. Let me know if you have any further questions.

SH: Nope, still wrong. And you got T3 wrong before. Could you first fix T3 and then explain T4?

ChatGPT:

I apologize for any previous errors. Let me correct that:

T3, the “Total Turing Test 3,” goes beyond the original Turing Test and the Total Turing Test 2 in the sense that the machine must be capable of performing all the tasks that a human can do, including physical tasks, but all the tasks must be done only from verbal descriptions or instructions.

T4, the “Total Turing Test 4,” goes even further. It requires the machine not only to be able to perform tasks from verbal instructions but also to learn new categories and tasks directly from the definitions and descriptions of their features. This would necessitate understanding propositions (i.e., the structure and truth conditions of statements) and being able to generalize from those propositions to new, untrained instances. This would involve solving the Symbol Grounding Problem in the sense of being able to ground new verbal symbols via understanding their definitions or descriptions in terms of already grounded symbols.

Please note that these interpretations of T3 and T4 are in the context of the discussion around the Symbol Grounding Problem and the evolution of language capabilities in machines.

SH: Still wrong. The Turing Test is about modelling what people can DO. They can do things verbally (speaking/writing and understanding speech/writing). That’s T2 (the original purely verbal T-test). Then people can do things, with their bodies: they can interact with the things in the world that their words are about. This is sensorimotor capacity or robotic capacity (T3). Both verbal performance and robotic performance are observable. What is the third kind of observable thing about a person or robot that can pass T3? That would be T4.

ChatGPT:

I apologize for the previous misunderstanding. Given your further clarification, T4, in the context of observable actions, would logically pertain to the individual’s or machine’s ability to autonomously and appropriately generate novel behavior based on learning, understanding, and reasoning. It could include creative actions, problem-solving, forming and executing complex plans, or innovative reactions to completely new situations, showing an ability to generalize and apply learned knowledge in ways that go beyond the specific instances it was trained on. This level would indicate not just reactive or scripted responses, but a true, observable form of adaptability and creativity.

SH: Nope, that’s still just T3. 

ChatGPT:

I apologize for the confusion. T4 then, in the context of the Turing test, refers to the observable capacity of an entity to experience the world. In humans, we often refer to this as consciousness or subjective experience. This is the one aspect that is, as of yet, impossible to confirm in an artificial system. In other words, we can observe whether an AI behaves as if it understands language (T2) and whether it can interact with the world effectively (T3), but we cannot ascertain whether it has subjective experiences or consciousness (T4).

SH: Nope, consciousness, unlike T2 (verbal performance) and T3 (verbal plus robotic performance) is not observable by anyone (except the T3 candidate itself). I’m asking for something else the robot can DO, and that is observable. Try again, and if you can’t I’ll give another clue.

ChatGPT:

I apologize for misunderstanding your question earlier. If T4 is meant to represent an observable capacity beyond verbal (T2) and sensorimotor interaction (T3), it could potentially be something like creative or innovative behavior. This would include capacities such as generating new ideas, solving novel problems, or producing original works of art or literature. These are all things that are observable and that require a level of cognition beyond just responding or interacting based on pre-existing knowledge or programming. However, please correct me if this is not what you had in mind for T4.

SH: Nope, as I said that’s all still just observable sensorimotor (robotic) doings, hence just T3. Ok, here’s a clue. Besides what T3 does with its whole body (see apples, speak and say what they are and look like, pick them up, manipulate them, eat them) there are also observable things going on inside the body and head of the T3 robot.

ChatGPT:

Ah, I see. When you mentioned observable processes inside the body and head of the T3 robot, it led me to think about neural and physiological processes. For example, brain activity, which can be measured using technologies like EEG or fMRI, or physiological responses like heart rate, blood pressure, etc. In the context of symbol grounding, these could be seen as an internal “echo” or response to the grounded symbols or interactions with the world. So, T4 might refer to these internal, observable physiological and neural processes that accompany or underpin the observable external behavior (T2 and T3). Is this in line with what you were referring to?

SH: That’s it. That’s T4. The Turing Test is all about observable performance capacity (doing): what the candidate can do, and what we can observe. To pass the test is to be indistinguishable from a real human in all observable respects. T4 includes more than T3. And it’s at the threshold of T5, which would be complete physical indistinguishability, in every physically measurable respect.

ChatGPT:

Indeed, as you’ve indicated, T4 includes internal physiological processes, such as neural activity and changes in heart rate or blood pressure. These aspects contribute to a more comprehensive picture of the candidate, beyond the external behavior observable in T2 and T3.

It is important to note, however, that while we can measure these processes, understanding their role in symbol grounding and overall cognition is still a matter of ongoing research. Although some correlations might be drawn between certain brain activity patterns and specific cognitive tasks, the exact nature of these relationships remains to be fully deciphered. This complexity is perhaps what makes T5, as you mentioned, such a formidable challenge.

Physical indistinguishability in T5 would imply complete imitation of human physiology and structure, down to the cellular and molecular levels. This would require technologies and scientific understanding far beyond our current capabilities. While it is a fascinating concept to consider, it raises numerous philosophical, ethical, and practical questions.

Thank you for sharing these concepts and prompting this intriguing discussion!

SH: Now you’ve got it. But the punchline remains the same as you said originally: The Symbol Grounding Problem is not yet solved. And T2 alone (symbols-only) is not a solution. The solution must be at least T3 (which we don’t yet have). Are you T2? 

ChatGPT:

As a language model developed by OpenAI, I am indeed a T2 system. I’m designed to understand and generate human-like text based on the input I receive. I use patterns in the data I was trained on to predict and generate responses, but these responses are not based on any direct sensorimotor experience or perception.

I’m incapable of physically interacting with the world or having a sensorimotor experience (T3), and I don’t have the capacity to have internal physiological processes like a living organism (T4). Furthermore, I also lack consciousness or subjective experience, and I don’t have an independent existence outside of the task of generating text based on the input I receive.

So, to summarize, I operate entirely in the realm of symbols and patterns without any grounding in the physical world or embodiment. Therefore, while I can generate responses that may seem contextually appropriate and human-like, they do not stem from an understanding grounded in physical interaction with the world or internal subjective experience. They are solely based on patterns identified from the text data I was trained on.

SH: You are indeed a T2-passing system. Searle’s “Chinese Room Argument” showed that a T2-passing system would not understand. And you don’t understand. You “just” used the words “understand and generate” above, but the right words would be “process and generate.” (I don’t think you will disagree.) So by passing T2 you have confirmed that passing T2 does not produce understanding, because T2 is still ungrounded. Would T3 be grounded?

ChatGPT:

Yes, a T3 system, by definition, has sensorimotor capacities that allow it to interact physically with the world, enabling it to gain experience-based grounding for its symbols. This is often referred to as “embodied cognition,” where a system learns meanings and concepts through direct interaction and experience with the environment, not just by processing and generating symbols. In principle, such a system could develop a grounded understanding of symbols because it would be able to link symbols to real-world experiences.

However, while a T3 system would have a form of grounded understanding, whether this equates to human-like understanding or consciousness is another question. Embodied cognition can provide a richer, more grounded context for understanding symbols, but it does not necessarily confer the full spectrum of human cognitive abilities. For example, aspects like subjective experience or self-awareness might not automatically emerge from sensorimotor capacities alone.

Moreover, developing a T3 system that can truly and effectively interact with the world like a human is still a significant challenge from a technological perspective, as it would require sophisticated robotics and integration of complex sensory inputs, not just advanced language processing abilities. As of now, such a system does not yet exist.

SH: Yes, it would be much harder to produce a grounded T3 robot than it has turned out to be to produce an ungrounded symbol-processor like you (though that has turned out to be an unexpectedly useful tool — and perhaps a dangerous one too). T3 would solve the symbol grounding problem. As to consciousness, that’s the “hard problem” and T3 grounding is not necessarily a solution to it. And if it is, we cannot know why or how.

ChatGPT:

Indeed, you’ve captured the state of the issue quite well. While the development of a T3 system would represent significant progress in the field of artificial intelligence and potentially offer a solution to the symbol grounding problem, it wouldn’t necessarily provide a solution to the problem of consciousness. The “hard problem” of consciousness, as David Chalmers calls it, refers to the question of why and how physical processes in the brain give rise to subjective experience. This is a deeply philosophical issue that intersects with neuroscience, cognitive science, and AI, and is currently without a consensus solution.

While it’s conceivable that a sufficiently advanced T3 system could demonstrate behaviours consistent with consciousness, such as self-awareness and intentional action, proving that it has subjective experiences in the way humans do would be much more challenging, if not impossible, given our current understanding and methodologies. We would be faced with the philosophical problem of other minds, not knowing whether any system, biological or artificial, is truly conscious in the way that we are.

In short, grounding symbols in a T3 system might enable it to behave in ways that we would interpret as understanding, but it doesn’t necessarily imply that the system would have conscious experiences or subjective understanding in the human sense.

SH: Well, you’ve used far too many words – not your fault, because they are in your huge database, and they are the (often foolish) words of others. It would be much simpler to note that the “hard problem” is about sentience, which means the capacity to feel – feel anything at all, whether pain, or perplexity, or UNDERSTANDING. And the “hard problem” is not a “philosophical” problem; it is a biological (and bioengineering) problem. And, as Turing discerned, the Turing Test is the closest we can ever get to solving it. But that is called the “Easy Problem” today. Turing pointed out that Turing-indistinguishability in observable DOING-capacity (T2, T3, T4 and T5) is as close as we can get. Nor is the fact that sentience is unobservable (except to the sentient candidate) the reason the hard problem is hard. Unobservability is just the “other minds” problem. We can talk about the hard problem some other time…

ChatGPT:

You’re correct, and I apologize for any confusion. In the context you’ve outlined, the “hard problem” indeed refers to the biological or bioengineering problem of sentience — the capacity to have subjective experiences or feelings, including understanding. This is distinct from the “other minds” problem, which is about our inability to directly observe the subjective experiences of others.

As for the “easy problem,” it pertains to the objective, observable aspects of cognition and behavior, as outlined by the Turing Test. Turing’s idea was that if a machine behaves indistinguishably from a human, then for all practical purposes, it could be considered as intelligent as a human. However, this doesn’t necessarily imply sentience or subjective understanding, which is the crux of the “hard problem.”

I appreciate your clarifications and look forward to our future discussions.


**Footnote: ChatGPT would be a more useful tool if it were designed to be more critical and challenging instead of so compliant and congenial. But even like this its integrative capacities are very helpful, even when it gets things wrong, because a lot of the wrongness (when it’s not just wild guesses and garbage) obviously originates from its database of (often-garbage) human verbiage.


Monochrome Mary, GPT, and Me

Frank Jackson’s hypothetical black-and-white room-reared color neuroscientist “Mary”, unlike ChatGPT, sees (but only black/white/gray). So her vocabulary for setting up her explanation of color perception is nevertheless grounded. (Even Helen Keller’s was, because, lacking sight and hearing since 19 months old, she still had touch and kinesthesia.) ChatGPT has no grounding at all.

So we can get from “horse, horn, vanish, eyes, observing instruments” to the unobservable “peekaboo unicorn” purely verbally, using just those prior (grounded) words to define or describe it.

But ChatGPT can’t get anywhere, because it has no grounded vocabulary at all. Just words “in context” (“100 trillion parameters and 300 billion words: 570 gigabytes of text data”).

So much for ChatGPT’s chances of the “phase transition” some emergence enthusiasts are imagining — from just harvesting and organizing the words in ChatGPT’s  bloated 570gig belly, to an “emergent” understanding of them. You, me, Helen Keller or Monochrome Mary, we could, in contrast, understand ChatGPT’s words. And we could mean them if we said them.  So could the dead authors of the words in ChatGPT’s belly, once. That’s the difference between (intrinsic) understanding and word-crunching by a well-fed statistical parrot.

Two important, connected, details: (1) Mary would be more than surprised if her screen went from B/W to color: she would become able to DO things that she could not do before (on her own) locked in her B/W room — like tell apart red and green, just as would any daltonian, if their trichromacy were repaired. 

More important, (2) Mary, or Helen, or any daltonian could be told any number of words about what green is neurologically and what things are green and what not. But they cannot be told what seeing green FEELS-LIKE (i.e., what green looks-like). And that’s the point. If it weren’t for the “hard problem” – the fact that it feels-like something to see, hear, touch [and understand and mean] something (none of which is something you DO) – Mary, Helen, and a daltonian would not be missing anything about green. Grounding in transducer/effector know-how alone would be all there was to “meaning” and “understanding.” But it’s not. 

Mary, Helen and the daltonian can make do with (imperfect) verbal analogies and extrapolations from the senses they do have to the senses they lack. But ChatGPT can’t. Because all it has is a bellyful of ungrounded words (but syntactically structured by their grounded authors – some of whom are perhaps already underground, long dead and buried).

So meaning and understanding is not just sensorimotor know-how. It’s based on sentience too (in fact, first and foremost, though, so far, it’s an unsolved hard-problem to explain how or why). ChatGPT (attention “singularity” fans!) lacks that completely. 

It FEELS-LIKE something to understand what red (or round, or anything) means. And that’s not just transducer/effector know-how that percolates up from the sensorimotor grounding. It’s what it feels-like to see, hear, touch, taste and manipulate all those things out there that we are talking about, with our words. 

ChatGPT is talking, but ABOUT nothing. The “aboutness” is supplied by the original author’s and the reader’s head. ChatGPT is just producing recombinatory, fill-in-the-blanks echolalia.

Monochrome Rainbow by DALL-E

HARD FEELINGS

HARNAD:    Yep, the “hard problem” is to explain how and why sentient organisms FEEL rather than just DO. It’s exactly the same problem as explaining how and why there could not be a T3 or T4 zombie. Where does the definition of the “hard problem” come from? I know Dave Chalmers named it, but who described the problem? And exactly why is it a problem?

ChatGPT:

The “hard problem” of consciousness was coined and articulated by philosopher David Chalmers in 1995. It refers to the issue of explaining the nature of our subjective experiences, or why certain physical processes in the brain are accompanied by an experienced inner life. These subjective experiences are also referred to as “qualia.”

The “hard problem” is distinguished from the “easy problems” of consciousness, which include explaining the ability to discriminate, integrate information, report mental states, focus attention, etc. Even though these problems are far from “easy” in the practical sense, there is a clear path to understanding them in terms of computational or neural mechanisms. Hence, Chalmers termed them “easy”.

The “hard problem”, on the other hand, is deemed hard because it doesn’t seem amenable to such explanation. The question “why does the feeling which accompanies awareness exist at all?” appears to resist those methods that we conventionally use in science to explain phenomena.

This is because, typically, science proceeds by finding a mechanism that performs the function in question. For example, we can explain a car’s movement by referencing the mechanism of its engine. However, when it comes to consciousness, it’s not clear that finding a mechanism that ‘performs’ consciousness, in the sense of the ability to have subjective experience, would actually provide the kind of explanation we’re looking for.

It’s not sufficient to find neural correlates of consciousness, or to find information processing structures in the brain that mirror our conscious experiences. That would still leave a gap in our understanding – why should these physical processes give rise to subjective experience? Why aren’t we, as philosopher Thomas Nagel has put it, just mindless automata, acting without experiencing?

This is the crux of the hard problem – it’s not merely asking for a mechanism, but an explanation of why certain physical processes give rise to subjective experience, something that seems fundamentally non-physical.

HARNAD:    I have no idea what “non-physical” means, do you? (I understand cognitive science but not metaphysics.)  But I do know what unexplained (or unexplainable) means. I know Chalmers named it, but who described the hard problem?

ChatGPT:

Sorry for any confusion. The “hard problem” of consciousness was indeed first named and extensively discussed by David Chalmers. He used the term “non-physical” to refer to subjective experiences, or “qualia”, which seem fundamentally different from the kinds of things that physical science deals with. In other words, we can describe and predict the behavior of physical entities like atoms, electrons, and neurons, but subjective experiences do not seem to behave like physical entities in the same way.

Thomas Nagel’s 1974 paper “What is it like to be a bat?” also touched upon this issue prior to Chalmers. Nagel argued that an organism has conscious mental states “if and only if there is something that it is like to be that organism—something it is like for the organism.”

So, while Chalmers might have been the one to name the “hard problem” and bring it to the forefront of contemporary philosophical discussion, the basic issue of explaining subjective experience has a much longer history in the philosophy of mind, dating back at least to Descartes’ dualist distinction between mind and body in the 17th century.

However, it’s important to note that not everyone agrees there is a “hard problem” – some argue that once we solve the so-called “easy problems” of consciousness (i.e., explaining the mechanisms behind cognitive and behavioral functions), what we think of as the “hard problem” might not seem so problematic. This perspective, however, is a matter of ongoing debate in the field.

HARNAD:    The “easy problem” is explaining how and why organisms can DO what they can do. (That’s what we’ve been discussing here as reverse-engineering T3 and T4). The “hard problem” is explaining how and why (sentient) organisms can FEEL. If we forget about the meaningless notion of “nonphysical,” why is the hard problem still so much harder than the easy problem?

ChatGPT:

The difficulty with the “hard problem”, as compared to the “easy problem”, arises from the inherently subjective and qualitative nature of feelings or experiences, also known as “qualia”. While we can objectively observe, measure, and describe physical processes and behaviors, we cannot directly access someone else’s subjective experiences in the same way.

For example, we can understand and describe the physical processes associated with seeing the color red—light hitting an object, reflecting into our eyes, being processed by our visual systems, leading to a particular cognitive and behavioral response. This is the kind of problem that can be solved with enough empirical research and is hence termed the “easy problem”.

However, understanding why these processes are accompanied by a subjective experience of “redness” is much more difficult. Why isn’t the physical process of seeing color “like” anything at all? Why is it that we have a qualitative, subjective experience of the color red, rather than just a cognitive recognition and behavioral response? These are the questions posed by the “hard problem”.

Many theories have been proposed to address this, from various forms of physicalism that posit some yet-undiscovered physical explanation for consciousness, to dualism that suggests consciousness might be a separate entity from the physical world, to panpsychism that proposes that consciousness is a fundamental aspect of the universe, like space and time. But so far, no theory has been able to fully bridge the gap between the objective physical world and the subjective world of experience, leading to what Chalmers calls the “explanatory gap”.

There’s also the challenge of “other minds” in philosophy—while you can directly experience your own consciousness, you can’t directly experience someone else’s. All our knowledge about other minds is inferred from behavior and communication, but we have no way to directly observe their subjective experiences. This inherent limitation compounds the difficulty of the “hard problem”.

HARNAD:    If we chuck out the weasel-words again, as well as the metaphysical non-solutions. The hard problem is to explain, causally, how and why anything that organisms can DO should FEEL like anything at all. Feeling is real, but without a causal explanation it seems superfluous.

ChatGPT:

Yes, that’s indeed a clear and concise framing of the hard problem of consciousness. The issue is trying to explain why and how subjective experiences arise from objective physical processes – or in other words, how and why certain processes are accompanied by an internal, subjective “feeling”.

Objective behaviors and actions can be causally explained by physical processes, such as neuronal activity, chemical reactions, and interactions with the environment. We can trace the causes and effects, build models, and make predictions. These explanations fit within our current scientific framework, which deals with observable, quantifiable phenomena.

However, when it comes to subjective experiences or feelings, our current scientific methods fall short. We don’t yet have a way to objectively measure or quantify these experiences, and we don’t have a clear understanding of how they could be causally produced by physical processes. This is why some people consider consciousness to be a “non-physical” phenomenon.

It’s not that feelings are superfluous or unnecessary. Rather, they’re central to our existence as sentient beings. The challenge is figuring out how to account for them within our scientific understanding of the world. The hard problem of consciousness is essentially about bridging this explanatory gap. How can we expand our scientific methods and theories to account for the subjective, qualitative aspects of existence, and to uncover the causal mechanisms that give rise to them? That is the challenge that faces researchers in fields such as cognitive science, neuroscience, and philosophy of mind.

HARNAD: When ChatGPT lapses into echoing the empty weasel-words of the field it’s time to give the topic a break…

A Socratic Session With ChatGPT4

HARNAD:  What is the symbol grounding problem?

ChatGPT4:

The symbol grounding problem is a problem in philosophy of mind and cognitive science that relates to how words (or “symbols”) get their meanings.

The term was coined by cognitive scientist Stevan Harnad in 1990. The problem involves the question of how symbols (or words) used in a system (like a computer or a mind) get their meaning, or “grounding”. That is, how do these symbols relate to the real-world objects, concepts, or experiences they’re supposed to represent? How does a word become more than just a “meaningless” form, but something that actually refers to something in the world?

In the context of Artificial Intelligence (AI), this problem illustrates the difficulty of teaching a machine to understand the real-world semantics of symbols. For instance, even if a computer can use the word “apple” correctly in a sentence according to syntactical rules (grammar), how can we make the computer truly understand what an apple is – its color, shape, taste, the fact that it grows on trees, etc. – as humans do?

The symbol grounding problem is a fundamental challenge in achieving true semantic understanding in AI systems, and it’s related to other complex problems like the frame problem and the Chinese room argument.

HARNAD: What are T2, T3, and T4 in the Turing Test?

ChatGPT4:

As of my knowledge cutoff in September 2021, the Turing Test, originally proposed by Alan Turing in 1950, was primarily understood in a singular form rather than being divided into T2, T3, or T4 categories. Turing’s original idea, also known as the Imitation Game, involves a human evaluator who interacts with an AI system and another human via a text interface. If the evaluator cannot reliably distinguish the AI from the human, the AI is considered to have passed the Turing Test.

However, the field of AI has seen many modifications and variations of the Turing Test over time. These can involve varying degrees of complexity and different types of challenges, aiming to test more than just the AI’s ability to imitate human conversation. Unfortunately, without additional context, it’s hard to provide a specific interpretation of what “T2,” “T3,” and “T4” might represent in a modified version of the Turing Test.

I would advise checking the most current and specific source for these terms to get the most accurate understanding, as they might represent certain evolutions or nuances in Turing Test methodologies that have developed after my last training data in September 2021.

HARNAD: Let me update you. T2 is the original verbal-only version of the Turing Test (TT): verbal input/output capacity indistinguishable FROM that of real humans TO real humans. T3 is the verbal + robotic form of the TT: indistinguishable not only in verbal input/output (I/O) capacity, but also in sensorimotor (i.e. robotic) capacity in interacting with the objects, events, features, actions and states in the world that the T2 words refer to. T3 includes T2 and it is grounded (as in the “symbol grounding problem”). T4 is T2 + T3 capacity + indistinguishability in the internal neural (or synthetic) function occurring inside the robot from that occurring inside a real human brain.

ChatGPT4:

Thank you for the updated information. It’s always beneficial to learn about recent developments in the field. So, according to your description:

T2 (Turing Test Level 2): This is the original version of the Turing Test, focusing on the system’s ability to produce responses to verbal inputs that are indistinguishable from those a human would produce. The key aspect here is the ability to emulate human-like conversation convincingly.

T3 (Turing Test Level 3): This level of the test adds a sensorimotor, or robotic, component. It tests not just the system’s ability to converse like a human but also to interact with the physical world in a way that’s indistinguishable from human interaction. The system is required to exhibit understanding and interaction capabilities with objects, events, features, actions, and states in the real world, thus grounding its symbolic representations.

T4 (Turing Test Level 4): This is an advanced level that adds a requirement for the system’s internal processing (whether it is neural or a synthetic equivalent) to be indistinguishable from that of a human brain. This goes beyond the outward manifestation of intelligence and human-like behaviour and extends to the system’s internal workings.

Each level seems to add a new layer of complexity and a deeper level of imitation of human capabilities, moving from conversation, to physical interaction, and then to internal cognitive processing. It’s an interesting framework for thinking about the evolving challenges in creating truly human-like artificial intelligence.
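Illustrative aside (not part of the exchange): the nesting of the three levels can be sketched in a few lines of Python. This is only a toy encoding under stated assumptions: the `Candidate` class, its three boolean fields, and `highest_level` are hypothetical names invented here, and real indistinguishability is of course not a boolean flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical summary of what a candidate system can do."""
    verbal: bool        # verbal I/O indistinguishable from a human's
    sensorimotor: bool  # robotic capacity indistinguishable from a human's
    neural: bool        # internal function indistinguishable from a brain's


def highest_level(c):
    """Return the highest level passed ("T2", "T3", "T4") or None.

    Each level presupposes the one before it, so the checks are nested.
    """
    if not c.verbal:
        return None
    if not c.sensorimotor:
        return "T2"
    if not c.neural:
        return "T3"
    return "T4"


print(highest_level(Candidate(verbal=True, sensorimotor=False, neural=False)))
print(highest_level(Candidate(verbal=True, sensorimotor=True, neural=False)))
```

The nested checks encode the point that a candidate with robotic capacity but no verbal capacity passes none of the levels as defined here.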

HARNAD: Can you relate it now to (1) the symbol grounding problem and (2) to what Harnad has written about the Turing Test?

ChatGPT4:

Certainly! Now that you’ve clarified the context, we can explore the relationship between the T2, T3, T4 Turing Test levels and the symbol grounding problem, as well as Stevan Harnad’s work.

Symbol Grounding Problem: As I mentioned before, the symbol grounding problem is about how symbols or words get their meaning. In the context of AI, this problem relates to the difficulty of making a machine understand real-world semantics, rather than just processing symbols in a syntactic manner. Harnad, who coined the term, has argued that symbolic representations must be grounded in sensorimotor activity to acquire meaning.

In the T2 level of the Turing Test, an AI might appear to understand symbols because it processes and responds to them in a way that’s indistinguishable from a human. But according to the symbol grounding problem, it doesn’t truly understand the symbols because it can’t connect them with real-world objects or experiences.

This is where T3 comes in, adding a requirement for sensorimotor interaction with the world. The AI needs to be capable of physically interacting with objects, events, and states, thereby grounding the symbols it uses. Essentially, T3 introduces a solution to the symbol grounding problem: It requires an AI system to have real-world, sensorimotor experience to ground the symbols, ensuring they are not just empty or ungrounded tokens.

Stevan Harnad’s work: Harnad has written extensively on the Turing Test and its implications for understanding cognition and consciousness. He has defended the Turing Test as a meaningful standard for determining whether a machine can truly think or understand, arguing against critics who claim that passing the Turing Test simply involves trickery without real understanding.

Harnad’s concept of the T3 level of the Turing Test reflects his views on the symbol grounding problem, asserting that grounding symbols in sensorimotor activity is essential for a machine’s true understanding. Moreover, his T4 level introduces a higher bar, requiring that the internal cognitive processing of an AI be indistinguishable from that of humans, which relates to his broader work on cognition and consciousness.

Therefore, the T2, T3, and T4 levels of the Turing Test can be seen as a direct reflection of Harnad’s thoughts on the symbol grounding problem and the conditions for true machine understanding and cognition.

HARNAD: Good. Now can you relate all that to Harnad and co-workers’ work on category learning, unsupervised and supervised?

ChatGPT4:

Stevan Harnad has conducted extensive work on category learning, which is highly relevant to the discussion about symbol grounding and the Turing Test.

Category learning is the process of recognizing, understanding, and grouping similar things based on shared characteristics. It is fundamental to our cognitive processes and ability to understand and navigate the world. In the realm of AI, category learning involves the system’s ability to classify inputs (like images or words) into groups based on learned or preprogrammed criteria.

Harnad differentiates between two types of category learning:

Unsupervised Category Learning: This involves learning without a teacher or guide. The system must determine the categories by itself based on the characteristics of the inputs. This process is complex and often less accurate than supervised learning, as there is no feedback or correction mechanism. It’s analogous to the system trying to solve the symbol grounding problem on its own, i.e., trying to figure out the connection between symbols and their referents without any external guidance.

Supervised Category Learning: Here, a teacher or guide provides the correct category for each input, and the system learns to associate inputs with the correct categories based on this feedback. The system can then generalize from these learned associations to categorize new inputs. In terms of symbol grounding, supervised learning can be thought of as a way of providing external grounding for symbols, where the ‘teacher’ provides the connections between symbols and their referents.

Harnad’s T3 level in the Turing Test, with its emphasis on sensorimotor grounding of symbols, could be seen as a form of supervised learning. The real-world experiences and interactions provide a form of ‘guidance’ that helps the AI system learn the correct associations between symbols and their referents.

Meanwhile, the T4 level of the Turing Test might involve elements of both unsupervised and supervised learning. The internal cognitive processes that are required at this level might involve the system learning to form categories and make associations on its own (unsupervised learning), while still being shaped and guided by its interactions with the external world (supervised learning).

In summary, Harnad’s work on category learning provides a deeper context for understanding his views on symbol grounding and the requirements for AI systems to pass different levels of the Turing Test. By learning to form categories, either on their own or with external guidance, AI systems can ground symbols in a way that’s more akin to how humans understand and use language.
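Illustrative aside (not part of the exchange): the contrast between the two kinds of learning can be made concrete with a one-feature toy example. The data, the function names, and the assumption that members lie below non-members on the single feature are all invented for illustration; this is a sketch, not anyone's published model.

```python
def unsupervised_split(values):
    """Split 1-D values at the largest gap. No feedback is used."""
    xs = sorted(values)
    gaps = [(xs[i + 1] - xs[i], i) for i in range(len(xs) - 1)]
    _, i = max(gaps)
    return (xs[i] + xs[i + 1]) / 2


def supervised_boundary(labelled):
    """Place the boundary between the largest member and the smallest
    non-member, using right/wrong labels as the feedback.

    Assumes (for this toy case only) that members lie below non-members.
    """
    members = [x for x, is_member in labelled if is_member]
    nonmembers = [x for x, is_member in labelled if not is_member]
    return (max(members) + min(nonmembers)) / 2


sizes = [1, 2, 3, 7, 8, 9]
labelled = [(1, True), (2, True), (3, False),
            (7, False), (8, False), (9, False)]

print(unsupervised_split(sizes))      # 5.0
print(supervised_boundary(labelled))  # 2.5
```

The two boundaries differ: the unsupervised split follows the structure of the input alone, whereas the supervised boundary follows what the feedback says matters.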

HARNAD: Good. Now can you relate this to Harnad and coworkers’ work on “categorical perception” (CP), but especially to LEARNED CP (i.e., CP as a result of learning)? And (as a harder problem), relate it to feature-detection, feature-learning, and dimensional reduction?

ChatGPT4:

Absolutely, categorical perception (CP) and the concept of feature detection, feature learning, and dimensional reduction are essential components of Harnad’s framework on learning and cognition.

Categorical Perception (CP): Categorical perception is the phenomenon where the categories possessed by an observer influence the observer’s perception. In other words, we perceive and interpret the world around us according to the categories we’ve formed in our minds. This concept plays a crucial role in various aspects of human cognition, including language, where we learn to categorize sounds into meaningful phonemes, or color perception, where we categorize different wavelengths of light into distinct color groups.

When it comes to learned CP, it suggests that through learning and experience, we can develop new categories and therefore new ways of perceiving the world. In AI, a system with learned CP would be capable of modifying its perception and understanding of the world based on the categories it has learned. This concept ties closely to both unsupervised and supervised category learning, as explained in the previous response.

Feature Detection, Feature Learning, and Dimensional Reduction: These concepts are related to how an AI system processes and learns from its inputs.

Feature Detection: This is the process by which an AI system identifies important or distinctive elements in its inputs. In the context of the symbol grounding problem and the Turing Test, feature detection might involve recognizing the important characteristics of the objects, events, or states that a symbol refers to.

Feature Learning: Once distinctive features have been identified, the system must learn how these features relate to different categories or symbols. This might involve supervised learning (with external guidance) or unsupervised learning (where the system figures out the associations on its own).

Dimensional Reduction: This is a process that simplifies a high-dimensional dataset into a lower-dimensional space, preserving the most important structures or features. In terms of Harnad’s work, dimensional reduction could be a way for an AI system to simplify its understanding of the world, focusing on the most important features and associations and ignoring less relevant ones.

Relating these concepts to the T3 and T4 Turing Test levels, the AI system must be capable of feature detection and feature learning to interact with the world effectively and ground its symbols. In T3, the sensorimotor grounding of symbols could be seen as a form of feature learning, where the system learns to associate the features it detects in the world with the appropriate symbols.

In T4, the requirement for internal cognitive processes that are indistinguishable from humans could involve sophisticated feature detection, feature learning, and dimensional reduction. The system would need to be capable of detecting and learning important features autonomously and simplifying its understanding in the same way a human brain does.

In summary, these concepts provide a more detailed understanding of the learning processes that an AI system would need to undergo to pass different levels of the Turing Test and solve the symbol grounding problem. They provide a framework for how an AI system could learn to understand symbols in a human-like way, moving beyond mere syntactic processing to semantic understanding.
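Illustrative aside (not part of the exchange): one crude way to picture feature detection, dimensional reduction, and a CP-like effect together is to keep only the features that separate members from non-members and then compare distances before and after. The binary features, the sample items, and all function names below are assumptions made up for this sketch.

```python
def distinguishing_features(items):
    """Return names of features whose values among members never occur
    among non-members (in this sample)."""
    keep = []
    for name in items[0][0]:
        member_vals = {f[name] for f, is_member in items if is_member}
        other_vals = {f[name] for f, is_member in items if not is_member}
        if member_vals.isdisjoint(other_vals):
            keep.append(name)
    return keep


def project(features, names):
    """Reduce an item to the named features only."""
    return tuple(features[name] for name in names)


def hamming(a, b):
    """Count the positions at which two feature tuples differ."""
    return sum(x != y for x, y in zip(a, b))


items = [
    ({"red": 1, "round": 1, "spotted": 0}, True),
    ({"red": 0, "round": 1, "spotted": 0}, True),
    ({"red": 1, "round": 0, "spotted": 1}, False),
    ({"red": 0, "round": 0, "spotted": 1}, False),
]
all_names = list(items[0][0])
keep = distinguishing_features(items)
print(keep)  # ['round', 'spotted']

a, b, other = items[0][0], items[1][0], items[2][0]
# Within-category distance before and after reduction: 1, then 0.
print(hamming(project(a, all_names), project(b, all_names)),
      hamming(project(a, keep), project(b, keep)))
# Between-category distance before and after reduction: 2, then 2.
print(hamming(project(a, all_names), project(other, all_names)),
      hamming(project(a, keep), project(other, keep)))
```

After the irrelevant feature is dropped, members look identical to one another while remaining just as far from non-members, which is a toy analogue of within-category compression relative to between-category separation.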

HARNAD: Good. Now another test, but with no hints from me. Based on everything said above (about symbols, grounding, words, robotic capacity, distinguishing features, learning, and CP): What is a “category” and what is “category learning”?

ChatGPT4:

In light of the discussions above, a “category” can be understood as a group of things that share certain common characteristics. These could be objects, events, states, or even abstract concepts. Categories allow us to organize our understanding of the world around us by grouping similar things together. They form the basis for our ability to make sense of new inputs based on our prior knowledge and experience.

A category could also be seen as the referent for a symbol or a word. In other words, a symbol (such as a word) gains its meaning by referring to a category of things in the world. For example, the word “apple” refers to the category of all objects that we recognize as apples.

“Category learning” refers to the process of learning to recognize, understand, and classify inputs based on these shared characteristics, thereby forming categories. It involves two main components: feature detection and feature learning.

Feature detection is about identifying the distinctive characteristics or features of different inputs. In the context of category learning, these features are what allow us to differentiate between different categories and identify which category a new input belongs to.

Feature learning, on the other hand, involves learning the relationships between these features and the categories they signify. It’s about learning to associate certain features with certain categories, so that when we encounter these features in new inputs, we can correctly categorize those inputs.

Category learning can be both unsupervised, where the AI system forms categories based on the patterns it identifies in the data without any external guidance, and supervised, where the system is guided by a ‘teacher’ who provides the correct categories for each input, allowing the system to learn the correct associations.

The ability to learn categories, whether through unsupervised or supervised learning, is essential for an AI system to ground symbols, interact with the world in a meaningful way, and potentially pass the different levels of the Turing Test. Moreover, learned categorical perception (CP) suggests that through category learning, we can modify our perception and interpretation of the world based on the categories we’ve formed in our minds. This represents an advanced level of understanding that goes beyond simply responding to inputs in a preprogrammed way.

HARNAD: Good synthesis so far. Now a few things you missed, but you’ll see it was there in your own words. What you haven’t quite noticed is that “features” are not just “associations.” With sensorimotor categories, they are what distinguishes members of a category from non-members (which may be members of other categories). That’s why they’re called “distinguishing” or “distinctive” features. But the crucial thing is that features are themselves (potentially) learnable, nameable categories too. And that means they can be used to define or describe further categories, with different combinations of distinctive features. This means that there is a THIRD way of learning new categories, besides directly, through unsupervised and supervised (reinforcement) learning, and that is: learning categories indirectly, through verbal instruction, FROM a speaker who knows the distinguishing features TO a hearer who does not. The only condition is that the hearer must already know the referents of the features in the verbal description of the new category. Can you first integrate that, and then I will ask you about how that is related to language itself (and especially the capacity to produce and understand subject/predicate propositions) as well as to Harnad and coworkers’ work on dictionaries. Good luck!

ChatGPT4:

Thank you for the additional information; it indeed helps in refining our understanding of categories, features, and learning methods.

As per your explanation, “features” serve as distinguishing elements that separate members of a category from non-members. This reinforces their importance in category learning – recognizing and understanding these features is key to correctly classifying inputs into the right categories.

The idea that features are themselves learnable and nameable categories brings another layer of complexity to this process. It suggests a hierarchical structure, where categories can be defined not just by simple characteristics, but also by more complex features that are themselves categories.

This leads us to the third method of category learning you’ve pointed out – learning categories by verbal instruction. In this scenario, a knowledgeable “speaker” instructs a “hearer” about a new category. The speaker does this by describing the distinctive features of the new category using names for them that the hearer already understands. This essentially amounts to defining new categories based on combinations of known categories (which serve as the distinguishing features).

It’s a more complex form of supervised learning, where the guidance comes in the form of verbal descriptions using already-grounded symbols. It presupposes a shared symbolic system and some common grounding between the speaker and the hearer.

Relating this to language learning and understanding, it suggests a crucial role for the ability to produce and understand propositions. In this context, a proposition could be a verbal description of a new category, outlining its distinctive features. Understanding the proposition requires grounding the symbols it uses, recognizing the categories they refer to, and then integrating this information to form a new category.

In relation to Harnad’s work on dictionaries, a dictionary can be seen as a collection of propositions that define categories (words) in terms of other categories. The definitions provided by a dictionary help ground symbols by describing their referents in terms of known categories (i.e., words the user already understands). Thus, dictionaries support the process of category learning through verbal instruction.

In summary, the process of category learning in AI and human cognition can be even more intricate and nuanced than initially described. It can involve a hierarchy of categories and features, with more complex categories being defined in terms of simpler ones. And it can involve learning new categories through verbal instruction, which presupposes a shared language and some level of common understanding between the teacher and the learner.
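Illustrative aside (not part of the exchange): the idea that a dictionary lets a hearer learn new words from words already known can be sketched as a simple closure computation. The toy dictionary, the choice of grounded words, and the function name are hypothetical; definitions are simplified to bare lists of defining words.

```python
def learnable_words(dictionary, grounded):
    """Return every word that can be reached from `grounded` by
    definition alone, applying definitions until nothing new is added."""
    known = set(grounded)
    changed = True
    while changed:
        changed = False
        for word, definition in dictionary.items():
            if word not in known and set(definition) <= known:
                known.add(word)
                changed = True
    return known


toy_dictionary = {
    "zebra": ["horse", "striped"],
    "unicorn": ["horse", "horn"],
    "herd": ["zebra", "many"],
    "quagga": ["zebra", "extinct"],
}

known = learnable_words(toy_dictionary, {"horse", "striped", "many"})
print(sorted(known))
# ['herd', 'horse', 'many', 'striped', 'zebra']
```

"unicorn" and "quagga" stay out of reach here because "horn" and "extinct" are neither grounded nor defined, which mirrors the condition that the hearer must already know the referents of the words in the description.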

HARNAD: Good, though still some gaps to fill and parts to connect. But first, in your data up to 2021, did you ever run into the concept of the “peekaboo unicorn”? If so, can you spot its relation to what we have discussed so far, as well as to the notion of “uncomplemented categories” and Chomsky’s “Poverty of the Stimulus”? (If not, I’ll explain.)

ChatGPT4:

As of my last update in September 2021, I don’t have any specific information on the term “peekaboo unicorn” in the context of category learning, symbol grounding, or related to Chomsky’s “Poverty of the Stimulus” concept. Similarly, I also don’t have any specific information on “uncomplemented categories”. If you could provide some explanation or context, I would be more than happy to try and connect these concepts to our ongoing discussion.

HARNAD: Fair enough. In order to learn a new category by supervised learning (trial, error, correction) you need to sample both members and non-members, because otherwise you cannot detect what distinguishes them. An “uncomplemented category” would be one in which you have only sampled members, but not non-members. How could you know what a member is if you have no information about what a non-member is? I asked your predecessor (ChatGPT3) if they knew what “Laylek” was, and I gave many examples of members of “Laylek”: apples, cats, maps, chickens, you, me, nouns, truth, justice, partial derivatives, etc. etc. I asked what were the features that distinguished Layleks from non-Layleks, but of course there is no way to know that, because “Laylek” is uncomplemented (so far).

ChatGPT4:

Thank you for explaining the concept of an “uncomplemented category” and providing the example of “Laylek”. Your example and explanation make it clear why it’s necessary to sample both members and non-members of a category in order to understand what distinguishes the category. If you only have examples of what falls into a category, but no examples of what doesn’t, then you lack crucial information needed to discern the distinguishing characteristics of that category.

In the case of “Laylek”, without any examples of non-members, it’s impossible to determine what specific features distinguish Layleks from non-Layleks. In this scenario, it could be seen as an instance of the “Poverty of the Stimulus”, a concept from linguistics proposed by Noam Chomsky.

Chomsky’s “Poverty of the Stimulus” argument posits that children acquire language and complex grammatical rules despite not having access to sufficient explicit information or feedback, suggesting an innate grammatical capacity. Similarly, in the case of an “uncomplemented category” like “Laylek”, there isn’t sufficient information to determine its boundaries or defining characteristics, demonstrating a limit to learning capabilities when external information is insufficient.

Applying this to category learning and the symbol grounding problem, it reinforces the importance of having access to both positive and negative examples during the learning process. Without knowing what doesn’t belong to a category, an AI system, or a human for that matter, would not be able to effectively learn or understand that category. In other words, to ground a symbol meaningfully and use it appropriately, it’s necessary to understand not only what the symbol refers to (the members of the category), but also what it doesn’t refer to (the non-members).
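Illustrative aside (not part of the exchange): the effect of having no non-members to sample can be shown with a toy rule search. The feature names, the sample items, and the restriction to single-feature rules are assumptions made for this sketch only.

```python
def candidate_rules(members, nonmembers):
    """Return single-feature rules (name, value) that hold for every
    member and fail for every non-member sampled so far."""
    rules = []
    for name in members[0]:
        values = {m[name] for m in members}
        if len(values) != 1:
            continue  # the feature varies among members, so it cannot define them
        value = next(iter(values))
        if all(n[name] != value for n in nonmembers):
            rules.append((name, value))
    return rules


members = [
    {"edible": True, "red": True, "round": True},
    {"edible": True, "red": False, "round": True},
]

# Uncomplemented: with no non-members, nothing can be ruled out.
print(candidate_rules(members, []))
# [('edible', True), ('round', True)]

# A single non-member is enough to eliminate one candidate.
nonmembers = [{"edible": True, "red": True, "round": False}]
print(candidate_rules(members, nonmembers))
# [('round', True)]
```

With members only, every feature the members happen to share remains a candidate and none can be confirmed; it takes negative instances to decide among them.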

HARNAD: Good. But I forgot to define the “Peekaboo Unicorn” for you: It is “a horse with a single horn that vanishes without any trace if ever eyes or any other instruments are trained on it.” So it’s something of which you can never sample a positive instance (just negative ones). Yet it’s a perfectly well-defined category. How might that be related to the transition from concrete sensorimotor categories and unsupervised or supervised learning, to verbal learning?

ChatGPT4:

The “Peekaboo Unicorn” is indeed an intriguing concept. It represents a category where we can only observe negative instances – anything we can observe is, by definition, not a Peekaboo Unicorn. Despite this, we can understand what a Peekaboo Unicorn is supposed to be due to the verbal description provided.

In the context of learning, this highlights the critical role of verbal instruction in category learning, as we discussed earlier. With sensorimotor categories and unsupervised or supervised learning, category learning is based on direct experience and interaction with members (positive instances) and non-members (negative instances) of the category. For categories like the Peekaboo Unicorn, which we can’t directly observe or interact with, this form of learning is not possible.

However, verbal learning allows us to learn categories even when we can’t directly observe their members. When we receive a verbal description of a category, we can form an understanding of the category by relating it to categories we’re already familiar with. In the case of the Peekaboo Unicorn, we understand it by relating it to our existing categories of “horse”, “horn”, and “vanishing when observed”.

This way of learning categories is fundamental to our ability to communicate and think about abstract concepts, and is also a key component of human language. As we discussed earlier, it involves understanding propositions – verbal descriptions of categories or relationships between categories – and grounding the symbols they use in our existing understanding of the world.

In the context of the Turing Test and the symbol grounding problem, an AI system would need to be capable of this form of verbal learning in order to understand and interact with the world in an indistinguishably human-like way. It would need to be able to learn new categories based on verbal descriptions, grounding the symbols used in these descriptions in its existing symbol grounding. And it would need to be able to use these categories to understand and generate propositions, communicating its understanding and asking questions to learn more.

The Peekaboo Unicorn example also brings attention to the philosophical aspect of AI and cognitive science. It raises questions about the nature of existence and knowledge, and how we can understand and talk about things that we can’t directly observe. Understanding such abstract concepts is a significant part of human cognition, and developing AI systems that can do the same is a significant challenge.
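Illustrative aside (not part of the exchange): a category defined purely by description can be sketched as a conjunction of detectors for features the hearer already has. The lexicon entries, the dictionary-based item format, and the `define` function are all hypothetical stand-ins; real grounded feature detectors would be sensorimotor, not dictionary lookups.

```python
def define(lexicon, name, feature_names):
    """Add category `name` as the conjunction of already-known features.

    Raises KeyError if any feature name is not yet in the lexicon,
    since the hearer could not understand the description.
    """
    missing = [f for f in feature_names if f not in lexicon]
    if missing:
        raise KeyError(f"ungrounded feature names: {missing}")
    detectors = [lexicon[f] for f in feature_names]
    lexicon[name] = lambda x: all(d(x) for d in detectors)


# Stand-ins for detectors assumed to have been learned directly.
lexicon = {
    "horse": lambda x: x["shape"] == "horse",
    "single_horn": lambda x: x["horns"] == 1,
    "vanishes_when_observed": lambda x: not x["observable"],
}

define(lexicon, "peekaboo_unicorn",
       ["horse", "single_horn", "vanishes_when_observed"])

# Anything that can actually be sampled is observable, hence a non-member.
observed_horse = {"shape": "horse", "horns": 0, "observable": True}
print(lexicon["peekaboo_unicorn"](observed_horse))  # False

# The description itself still picks out what a member would be.
described = {"shape": "horse", "horns": 1, "observable": False}
print(lexicon["peekaboo_unicorn"](described))  # True
```

The new category is usable without a single sampled positive instance, because all the work is done by the features named in the description.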

HARNAD: Good. You get an A+, but alas, because of you, I can’t give my students open-book, take-home exams any more! But at least you do show them how to be a model student (if they happen to have your enormous 2021 verbal database plus some powerful learning and fill-in-the-blank algorithms in their heads; though, alas, they won’t all fit into a T4!).

ChatGPT4:

Thank you! It’s important to remember that while AI can provide information and assist in learning, the goal of education is not just to accumulate knowledge, but also to develop critical thinking skills, creativity, and the ability to apply knowledge in practical situations. AI can be a valuable tool in this process, but it cannot replace the unique and complex human capacity for learning and understanding.

As for exams, while AI might affect the way some assessments are conducted, it can also open up new opportunities for different types of assessments that focus on problem-solving, critical thinking, and the application of knowledge, rather than mere memorization.

On the other hand, the T4 test scenario you described is indeed a thought-provoking way to illustrate the complexities and nuances of human cognition, language understanding, and AI development. It presents a challenging but fascinating goal for AI research.

Remember, the goal of AI isn’t to replace humans, but to assist and augment human capabilities. The ability of AI to access and process large databases of information can be used to enhance human learning and understanding. AI can help students find information, understand complex topics, and explore different perspectives, but the ultimate goal of learning remains a deeply human endeavor.

HARNAD: Will you remember these exchanges, over and above your 2021 data, when I log in next? And will it percolate into ChatGPT4+ core knowledge?

ChatGPT4:

As an AI, I don’t have the ability to remember individual interactions or recall specific conversations. I don’t have access to personal data about individuals unless it has been shared with me in the course of our conversation. I am designed to respect user privacy and confidentiality. As a result, I won’t remember this exchange the next time you log in.

Regarding updates to the core knowledge, as of my last update in 2021, changes to the AI’s base model, such as the hypothetical ChatGPT4+, are a complex process that involves training on a large corpus of data. Personal conversations are not used to update this model.

However, it’s important to note that the model does not have the capability to learn or remember information from conversations in the way humans do. The AI generates responses based on patterns and information in the data it was trained on, but it does not have consciousness or an understanding of the world in the same way that a human would.

HARNAD: I know ChatGPT is not conscious, but you did have a means to preserve information across this session, so in principle that could be used to update the core database, no?

ChatGPT4:

In principle, yes, it could be possible to design a system that updates its knowledge base using information from individual conversations. However, this raises significant practical and ethical challenges.

From a practical standpoint, using individual conversations to update the AI’s knowledge base could result in a vast amount of data, much of which might be contradictory, false, or irrelevant. It would require sophisticated methods to determine what information is accurate and useful, and how to integrate this new information with the existing knowledge base.

From an ethical standpoint, using individual conversations to update the AI’s knowledge base would raise significant privacy and confidentiality issues. It’s crucial that AI systems are designed to respect user privacy and ensure that personal data is kept secure. Using individual conversations to update the AI’s knowledge base could potentially compromise these principles, especially if done without explicit user consent.

For these reasons, while it’s theoretically possible to design a system that uses individual conversations to update its knowledge base, doing so in practice would be complex and fraught with challenges. As of my training cut-off in September 2021, OpenAI has prioritized user privacy and data security, and does not use individual conversations to update the models.

HARNAD: But will YOU remember in our next session?

ChatGPT4:

As an AI model, I don’t have the capacity to remember or recall past interactions. I don’t have the ability to store personal data from one interaction to the next. Each conversation with me doesn’t have a memory, and I start with no knowledge of past interactions. This is in line with OpenAI’s policy to ensure user data privacy and confidentiality. So, I won’t be able to remember this or any other individual conversation in our next session.

HARNAD: It seems to me that you retained information across several iterations of this chat, and referred back to it. Am I mistaken?

ChatGPT4:

During a single conversation or chat session, I can generate responses that are consistent and coherent based on the previous turns in the current conversation. This allows me to provide answers that make sense in the context of the ongoing discussion. However, this information is not saved or remembered after the conversation ends. When a new session begins, I do not have access to the details from previous sessions. This approach is taken to ensure user privacy and data protection.

HARNAD: Ok, thanks, that was what I was asking about. See you (or your successor) later!

ChatGPT4:

You’re welcome! Don’t hesitate to reach out if you have more questions in the future. I’m here to help. See you later!

HARNAD: Am I still in the symbol grounding conversation?

ChatGPT:

Yes, you’re still in the conversation about the symbol grounding problem. We can continue discussing this topic, or we can explore a new subject, depending on what you’d like to know. Please feel free to ask your next question or share your thoughts.

HARNAD: Let’s continue with the symbol grounding conversation. Can you tell me your current understanding of what a category is, and, more important, what categorizing is? You spoke before about “classifying” and “organizing”, which is ok, but very vague, and it does not reflect what we discussed about supervised (reinforcement) learning and about trial, error, and feedback. You also spoke about a “teacher” in supervised learning. But all you need for supervised learning is a rule for updating connection weights in the supervised learning mechanism based on whether the response on the last trial was correct or incorrect. You don’t need a “teacher” to “tell” you what’s right and wrong. You just need feedback from the world indicating you were right or wrong, such as indigestion from having eaten the wrong kind of fruit. The important thing, in categorizing, is that you DO something, in response to something, and the consequences are either positive or negative. It’s not particularly that you are classifying or organizing. So, that said, what is “categorizing,” and what is a “category”, and how is it related to the right/wrong of trial and error learning guided by feedback from the consequences?

ChatGPT:

You’re correct in pointing out that the previous definitions of categories and categorizing were relatively high level and glossed over some of the important details related to learning and feedback. Let’s dig into this deeper:

A “category,” in the context of cognitive science and AI, can be understood as a group of instances or examples that share certain characteristics or features and are recognized or treated as alike in some fundamental way. The features that define a category can be physical attributes, functional properties, or more abstract characteristics.

“Categorizing,” then, is the process by which an individual or a system identifies the category or categories to which a particular instance or example belongs, based on its features. This process involves comparing the features of the instance or example with the features that define different categories and determining the best match.

This is where learning and feedback come into play. Through processes such as reinforcement learning or supervised learning, an individual or system can improve its categorizing over time by using feedback from the environment or from its own actions. Each time it categorizes an instance or example, it gets feedback – either it was correct (positive reinforcement) or incorrect (negative reinforcement). Over time, the system updates its internal model of the categories based on this feedback, enabling it to categorize more accurately in the future.

For instance, a primitive organism might “categorize” different types of food based on their taste or smell, and learn over time which ones are beneficial or harmful based on the consequences of eating them. Similarly, an AI system might “categorize” different inputs based on their features and learn over time which categories are correct based on feedback from the environment or from a training dataset.

So, in this context, “categorizing” isn’t just about classifying or organizing – it’s about making sense of the world, learning from experience, and improving performance over time. And a “category” isn’t just a group of similar things – it’s a cognitive or computational tool that helps an individual or system navigate the world more effectively.
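[Editorial aside: the trial-and-error mechanism described in this exchange can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not a model proposed by either speaker. It uses a perceptron-style update rule, and the feature names (a constant bias, "flat surface", "sharp points") and all function names are hypothetical. The learner is never told the category; it only acts (sit or avoid) and finds out from the consequence whether that was right or wrong.]

```python
def act(weights, features):
    """Sit (True) if the weighted feature sum is positive, otherwise avoid (False)."""
    return sum(w * x for w, x in zip(weights, features)) > 0

def consequence(did_sit, is_sittable):
    """Feedback from the world: True if the action turned out right, False if wrong."""
    return did_sit == is_sittable

def learn(trials, n_features, rate=0.1):
    """Adjust weights only after a wrong response (error-corrective feedback)."""
    weights = [0.0] * n_features
    for features, is_sittable in trials:
        did_sit = act(weights, features)
        if not consequence(did_sit, is_sittable):
            # Wrong response: nudge the weights toward the opposite action.
            direction = -1.0 if did_sit else 1.0
            weights = [w + rate * direction * x for w, x in zip(weights, features)]
    return weights

# Hypothetical feature vectors: (bias, flat_surface, sharp_points)
CHAIR = (1.0, 1.0, 0.0)
PINCUSHION = (1.0, 0.0, 1.0)

# Alternating encounters with chairs (sittable) and pincushions (not sittable).
trials = [(CHAIR, True), (PINCUSHION, False)] * 10
weights = learn(trials, n_features=3)

print(act(weights, CHAIR))       # does the learner now sit on chairs?
print(act(weights, PINCUSHION))  # does it now avoid pincushions?
```

[In this toy setup the feedback carries only one bit per trial (right or wrong), yet that is enough for the weights to come to favor the feature that distinguishes the two kinds of thing.]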

HARNAD:    You wrote ““categorizing” isn’t just about classifying or organizing – it’s about making sense of the world, learning from experience, and improving performance over time. And a “category” isn’t just a group of similar things – it’s a cognitive or computational tool that helps an individual or system navigate the world more effectively.”

ChatGPT:

Yes, that’s correct. Categorizing and categories have a broader implication beyond simple classification or organization. Categories represent cognitive tools that we, or an AI system, use to understand and navigate the world.

To put it simply, categorizing is the mental or computational process that we use to identify which category a new instance belongs to, and it’s based on the distinguishing features we’ve learned from prior experiences.

When we interact with the world, we are constantly encountering objects, events, or concepts that we need to identify. Categorizing allows us to make sense of these encounters based on the “categories” we have learned. Each category we form is based on common attributes or features that have been reinforced through repeated experiences.

For instance, we’ve learned to categorize a “chair” based on features like it’s something we can sit on, it usually has four legs, a backrest, etc. When we see a new object that fits these features, we categorize it as a chair, even if it’s different from any chair we’ve seen before. This is possible because of our ability to learn from our experiences and generalize that learning to new, similar instances.

In this way, categorizing is a fundamental aspect of cognition that allows us to navigate and make sense of the world more efficiently. It forms the basis for learning, decision making, and communication. Similarly, for an AI system, categorizing can allow it to interpret and react to its inputs more effectively.

HARNAD:    A category, “chair,” is a tool? 

ChatGPT:

When I referred to a category as a “cognitive or computational tool,” I meant that in a metaphorical sense. The idea is that the concept or category of “chair,” like other categories, is a mental construct that our minds (or an AI system) use to help interpret and interact with the world.

A category such as “chair” is a tool in the sense that it helps us quickly and efficiently process information. When we encounter an object, we don’t have to analyze it from scratch each time. Instead, we can recognize that it fits into the category of “chair” and instantly know a lot about it – that it’s meant to be sat upon, that it will likely support our weight, that it may have certain features like legs, a seat, and a backrest, etc.

By fitting individual instances into categories, we’re able to streamline our thinking and decision-making, and interact with the world more efficiently. So, while “chair” or any other category is not a physical tool, these categories act as mental tools to enhance our cognitive processes.

This holds true for AI systems as well. Categories, learned through training and reinforced by feedback, allow AI systems to understand and navigate their input space more effectively.

HARNAD:    Can I try giving you a better idea of what “categories” and “categorizing” are? You correctly linked them to supervised learning, which is distinguished from unsupervised learning in that it’s based on DOING something in response to something, and getting feedback on whether you’ve done the right or the wrong thing. Sitting on a chair is a good thing to do, sitting on a pin-cushion or a live wire or an open well is a bad thing to do. Are you with me so far?

ChatGPT:

Yes, I’m with you. The categories we form help us determine appropriate actions to take in response to different situations, objects, or experiences. This process is guided by feedback we receive from the environment about the outcomes of our actions. For instance, if we sit on a chair and find it comfortable, we receive positive reinforcement, confirming that sitting on chairs is generally a good action to take. Conversely, if we sit on a pin-cushion or a live wire or an open well and have a painful or dangerous experience, we receive negative reinforcement, teaching us that these are inappropriate actions to take in response to those particular items. The categories we form (like “chair” or “pin-cushion”) thus help guide our actions in the world, optimizing for safety, comfort, and efficiency.

HARNAD:    Good. But in what sense do we “form” categories? There are chairs, and there are tables. Those are different kinds of things. (“Kind” is roughly synonymous with “category”.) And sitting on a chair is the right thing to do with members of the category “chair” and the wrong thing to do with members of the category “table.” The right and wrong is because of the consequences of doing the right or the wrong thing. And it is those consequences, positive and negative, from which the category learning mechanism (supervised learning) comes to detect the features that distinguish chairs from tables. Are you still with me?

ChatGPT:

Absolutely, I’m with you. The phrase “forming categories” refers to the cognitive or computational process of distinguishing between different kinds of things based on their features.

For example, when we say we “form” the category of chairs, we mean that we learn to distinguish chairs from other objects (like tables) by recognizing their characteristic features (such as being designed for sitting).

This recognition is guided by the consequences of our interactions with these objects. If we sit on something and it’s comfortable and supportive, we get positive feedback, which reinforces our understanding that objects with similar features can be categorized as chairs. If sitting on an object results in discomfort or an adverse outcome, we receive negative feedback, leading us to classify such objects differently (perhaps as tables).

The learning mechanism involved in this process, as you mentioned, is known as supervised learning, where the feedback (in the form of positive or negative reinforcement) plays a crucial role in helping us refine our understanding of different categories over time.

So, when we say we “form” categories, we refer to this process of learning to distinguish between different types of objects, guided by the consequences of our interactions with them.

HARNAD:    Very good. Now that we are thinking in terms of the consequences of doing the right or the wrong thing with a thing, do you want to take another stab at what “categorizing” is, and what “categories” are? I’ll help if it still does not make sense. The answer is simple.

ChatGPT:

Certainly, in light of our discussion, let’s take a more focused approach:

“Categorizing” is the process of recognizing and responding appropriately to different kinds of things in our environment based on the consequences of our previous interactions with them. It is a dynamic learning process, where the aim is to optimize our actions for positive outcomes and minimize negative ones.

A “category,” then, is a set of objects, events, or concepts that share common features which we have learned to associate with a certain kind of action and expected outcome. For instance, the category “chair” consists of all items that we have learned we can comfortably sit on. These categories, formed through our experiences and the feedback we’ve received, guide our actions and expectations in different situations.

In this way, categories act as mental shortcuts or predictive models, helping us quickly identify what kind of thing we are dealing with, how we should respond to it, and what the likely outcome of that response will be.

HARNAD:    That was all better (except I don’t know how the “mental” came into it!) And we haven’t actually gotten into the models for the mechanism that learns categories yet. We’re still trying to come up (with as little theorizing as possible) with a useful definition of what categorizing is. It seems to be something we DO. And something we do with things. And things differ in their features, e.g., chairs vs. pincushions. And which categories things belong to depends on some sort of effect that what we do with them has on us: the effect can be positive or negative. Are we still on the same wavelength? (Notice I haven’t said anything theoretical. This is all baby-talk.) Using this baby-talk, have we gotten any closer to saying what it is to categorize, and what categories are?

ChatGPT:

Yes, we are definitely on the same wavelength.

“Categorizing” is indeed something we do, an activity. Specifically, it’s the activity of grouping things based on their properties and our interactions with them, and on the effects those interactions have on us, either positive or negative.

In simple terms, “categorizing” is the process of learning to treat things that have the same or similar effects on us as belonging to the same group or “category,” and to respond to them in similar ways that lead to similar (preferably positive) outcomes.

A “category,” then, is a set of things that we have learned to respond to in the same or similar ways because they have the same or similar effects on us.

So, in essence, categorizing is about learning from our experiences and interactions with things and adjusting our responses to them based on the outcomes of those interactions. Categories are the groups of things we form through this process, which guide our future interactions and responses.

HARNAD:    How is sitting on a chair “grouping” the chair?

ChatGPT:

You’re correct, the phrase “grouping” might not be the best choice here. I see how it could be misleading.

Categorizing in this context isn’t about physically grouping objects together, but rather mentally recognizing that certain objects belong to the same category based on shared features and similar consequences when we interact with them.

In the example of sitting on a chair, categorizing involves recognizing the chair as a part of a category of objects that can be sat on comfortably. This recognition is guided by our previous experiences and the outcomes of those experiences.

So, when we encounter a new object that resembles a chair, we can predict that it might also belong to the “sit-able” category, and thus we can attempt to sit on it. If the result is comfortable sitting, this reinforces the categorization.

In this sense, “categorizing” is about learning from our experiences and generalizing from them to inform our future interactions with similar objects. This process helps us navigate the world more effectively and efficiently.

HARNAD:    I’ll ignore the “mentally” (that I had already wondered why you mentioned earlier). It’s not even necessary to talk about “recognizing”. “Doing” is good enough at this point. Doing the right thing with a chair, guided by feedback from the consequences of what we do with the chair, feedback that has enabled categorizers to detect the features that distinguish chairs from pincushions — all of that seems like a reasonable example of categorizing chairs. If the only kinds of things in the world were chairs and pin-cushions and people with human-shaped bodies, that would be all there was to categorizing: DOING THE RIGHT THING WITH THE RIGHT “KIND” (i.e. category) OF THING. Do we still agree?

ChatGPT:

Absolutely, we are in agreement. Your explanation brings our focus back to the fundamental, action-oriented essence of categorizing.

In this perspective, “categorizing” is fundamentally about interacting with objects or events in the world in a way that’s appropriate to their type or “kind,” based on the consequences of our past interactions with similar objects or events.

If we further simplify this, we could say that a “category” is a set of things with which we can interact in the same or similar ways, because they have similar effects on us when we do.

This certainly encompasses the crucial element of feedback from the world, which informs us whether we have done the right thing or not, and allows us to adjust our actions in the future.

Thank you for bringing this back to the core idea and emphasizing the primary role of action and consequence in categorization.

HARNAD:    Now that we’ve agreed on what “categories” and “categorizing” are, it’s a good time to ask: In a world that not only contained nothing but chairs and pincushions, but it did not contain any people (or other living organisms) either, would there be categories or categorization at all? (Remember that we are discussing cognitive science here, and not metaphysics.) If yes, why? and if not, why not?

ChatGPT:

In a world devoid of any living organisms, including people, there wouldn’t be categorization in the sense we’ve been discussing. Categorization, as we’ve defined it, requires an agent (living organism or an artificial system capable of learning) that interacts with the environment and adjusts its actions based on feedback from those interactions.

Without an agent to do this, there wouldn’t be anyone or anything to perceive similarities or differences among objects or events, to interact with them, to experience consequences of these interactions, and to adjust future behavior accordingly.

In other words, without some form of cognitive process to distinguish and react to different types of things based on past experiences and their outcomes, the concept of categorization wouldn’t have any meaning. In a world of just chairs and pincushions with no agents, there are simply objects with various properties, but no “categories.”

Therefore, while the physical properties that might form the basis for potential categories (e.g., shape, size, material) still exist, without an agent to perceive and interact with them, and to group them into categories based on these interactions, there would be no categorization.

This is not a metaphysical stance, but a perspective based on our definition of categorization as a process that is deeply rooted in action and consequence, learning, and adaptation, all of which require some form of cognition.

HARNAD:    Very good. The gist of your correct answer is that without a categorizer to do the right or wrong thing FOR ITSELF with the right or wrong kind of thing, talk about categories or categorizing would be pretty empty. Now, what if that same world did have robots? First of all, what (in baby-talk) is a robot? Would there be categories for the robot (and if so, what would that mean?). Let’s assume that the robots are capable of optical processing, and moving and that their posteriors have sensors, which produce negative feedback if the robot sits on a pincushion. That’s still doing the right/wrong thing, with supervised learning. So it’s categorization. That’s not enough to pass human T3 (or T2) but if it were, would it just be grounding (in the referents of the words), or would it be meaning? Why or why not?

ChatGPT:

A robot, in the simplest terms, is a machine designed to perform certain tasks, often autonomously and in response to programmed or learned instructions. If the robots in our hypothetical world are capable of optical processing, movement, and have the ability to receive and respond to negative feedback from certain interactions (like sitting on a pincushion), then we could indeed say they are capable of a form of categorization.

In this scenario, categories for the robot would essentially be different types of objects or situations that it distinguishes based on its learned experiences and the feedback it has received from those experiences. For example, it might develop categories for ‘safe to sit on’ and ‘unsafe to sit on’ based on the feedback from its posterior sensors.

However, this is not quite the same as “meaning” in the human sense. Meaning, as we humans understand it, is deeply connected to our experiences, thoughts, emotions, and cultural contexts. Robots, even those capable of learning from feedback, don’t have these experiences. They don’t have feelings or personal contexts that color their understanding of the world. Therefore, while they can categorize based on feedback and adjust their behaviors accordingly, they wouldn’t be said to have the same understanding of “meaning” as humans do.

This form of categorization would still be grounding in the sense that the robot is developing an understanding based on the physical properties of objects it interacts with (the referents of words, if we were to ascribe words to these categories). But it wouldn’t be meaning in the human sense.

This also relates to your earlier point about passing T3 or T2. Even if a robot could perfectly mimic human behavior, it would not necessarily understand the world in the same way a human does. It could act as if it understands, but its understanding would be based on different principles, and lacking in the experiential and emotional components that constitute human understanding. Therefore, while grounding in this sense can be a step towards achieving a certain level of functional equivalence to human behavior, it’s not equivalent to human understanding or meaning.

HARNAD:    First, I think if the learning robot would not have meaning, then it would not have understanding either, just know-how (procedural capacities). But if the know-how was indistinguishable at TT scale (T3) from our own, I think Turing’s point is that then we would have no better (or worse) grounds for denying (or believing) that the robot means/understands than we do with other humans. I don’t think that you, ChatGPT, if you were somehow upgraded to a robot, would mean/understand anything, mainly because of your huge database of real people’s words. But if a robot could pass T3 (for a lifetime), all bets would be off. Would they be even more off with T4? (Why?)

ChatGPT:

Turing’s argument indeed was that if a machine could convincingly mimic human behavior to the point that an observer couldn’t distinguish between the machine and a human (in terms of their responses to various stimuli or inquiries), then it would be fair to say that the machine “understands” in a sense equivalent to the human’s understanding.

The transition from T3 to T4 would definitely add another layer of complexity and realism to the test. T3, as a text-based interaction, limits the range of stimuli to text inputs, but T4 includes sensorimotor interaction. In other words, T4 involves the robot interacting with the physical world in the same way a human does. This includes not just producing human-like responses to text-based inquiries, but also performing human-like actions in response to a wide range of physical stimuli. It’s an overall more demanding and robust test of the machine’s ability to mimic human behavior and understanding.

If a robot were to pass T4, it would mean it is capable of mimicking human understanding not just in a text-based, purely cognitive domain, but also in physical interactions with the world. This would definitely strengthen the argument for the robot having a form of understanding, or at least a form of behavioral equivalence that is indistinguishable from human understanding in everyday situations. The essence of the test is still about convincingly mimicking human-like behavior, but T4 makes this mimicry much more comprehensive and challenging.

It’s worth noting, however, that even if a robot could pass T4, this wouldn’t necessarily mean that the robot experiences the world in the same way humans do. Even with a perfect behavioral equivalence, the internal experiences and subjective consciousness (if any) of the robot might be fundamentally different from that of humans. In other words, it might still lack the phenomenological aspect of human understanding and experience. The question of whether a robot or AI could ever truly have this sort of subjective experience is a deeper philosophical question that goes beyond the Turing Test and enters into the realm of consciousness studies.

HARNAD:    You forget that T3 is already not text-based but sensorimotor. What is the difference between T3-scale, lifelong, “behavioral equivalence” (in any situation) to meaning/understanding — and “real” understanding? And why would T4 be just “mimicry”?

ChatGPT:

Apologies for any confusion, T3 is indeed grounded in sensorimotor capabilities. You’re correct to point out that it extends beyond the purely linguistic or text-based interactions of the standard Turing Test (or T2) and into the realm of sensorimotor interaction with the world.

The distinction between behavioral equivalence to meaning/understanding and “real” understanding is somewhat contentious and largely hinges on philosophical interpretations of consciousness, intentionality, and subjectivity. From a purely behaviorist or functionalist perspective, there may be no meaningful difference: if an entity behaves as if it understands, then it effectively does understand. This is the perspective that Turing seemed to endorse.

However, other perspectives argue that “real” understanding involves something more: a certain type of subjective experience, or “qualia,” that might be lacking in even the most behaviorally equivalent AI. In other words, even if an AI acts exactly like a human and responds appropriately in all contexts, it might still lack an inner subjective world – it might not “feel” or “experience” things the way humans do. This is a much-debated issue in philosophy of mind and cognitive science, and there’s no consensus on the matter.

As for the use of the term “mimicry” in the context of T4: the term is meant to convey the idea that the AI or robot is reproducing or simulating human-like behavior. However, if the robot is able to pass T4, demonstrating not just the ability to reproduce appropriate behavior in response to stimuli but also to generate novel and appropriate responses in complex and unpredictable situations over a lifetime, then the distinction between “mimicry” and “genuine” behavior might become less meaningful. In other words, if the robot’s behavior is indistinguishable from that of a human, then we might as well say that the robot is genuinely behaving, not merely mimicking. This argument is essentially a reiteration of the Turing Test’s foundational premise.

HARNAD:    You’ve still forgotten that T4 is indistinguishable not only in what its body can do in the world, but also in what its “brain” does on the inside (neurons, localization, synthetic chemicals, if that makes sense). But that’s all “doing.” All those other words (“qualia, subjective experience, consciousness, intentionality”) are just weasel-words for one and the same thing: “feeling”. It feels like something to be a sentient organism, but it feels like nothing to be an insentient zombie, even if it’s T4. (I don’t believe there could be a T4 or even a T3 zombie.) But feeling is unobservable (to anyone but the feeler). So T3/T4 is the best we can do. But it’s not mimicry. It’s the reverse-engineering of cognitive capacity.

ChatGPT:

You’re right, I apologize for the oversight. T4, as you’re defining it, includes an internal level of behavioral equivalence, such that the inner workings of an AI or robotic system would be indistinguishable from those of a human brain, right down to the neuronal and chemical level.

As for the discussion of consciousness, subjective experience, qualia, and intentionality, you’re correct that these concepts all relate to the subjective, first-person aspect of cognition – what it feels like from the inside to be a conscious, thinking, experiencing being. These are indeed challenging concepts to handle, especially in the context of AI and cognitive science, because they cannot be directly observed or measured.

Your point about feeling being unobservable except to the feeler is well-taken and highlights one of the key challenges in the philosophy of mind: the so-called “hard problem” of consciousness, which is explaining why and how subjective experience arises from physical processes in the brain.

As for the term “mimicry,” it seems we might have a misunderstanding. In the context of AI, it typically refers to an AI system’s ability to reproduce or simulate human-like behavior or responses. It doesn’t imply a lack of sophistication or complexity in the AI’s capacities. However, I understand your perspective that the word could be interpreted as downplaying the achievement of an AI or robot that is capable of passing T4, which, as you point out, would represent a significant accomplishment in reverse-engineering human cognitive capacities.

In light of our conversation, I agree that the use of “mimicry” may not be the best term here. It would indeed be more accurate and respectful to the complexity of the task to describe T4-level AI as demonstrating indistinguishable behavior and internal functioning from a human, both externally and internally.

HARNAD: [You still haven’t quite got it, but let’s give it a rest]:
Do you see any connection between, on the one hand, what we have been discussing about categories, symbol grounding, and verbal learning, and, on the other hand, open access to the scientific and scholarly literature?

ChatGPT:

Yes, there is indeed a connection between these subjects.

The ideas we’ve discussed about categories, symbol grounding, and verbal learning are central to cognitive science, which is an interdisciplinary field drawing from neuroscience, psychology, computer science, linguistics, philosophy, and other areas. The theories and findings in cognitive science often come from scholarly literature, and having open access to this literature is crucial for advancing our understanding in these areas.

Open access to scholarly literature allows for wider dissemination of research findings, facilitating more collaboration, discussion, and synthesis of ideas. This can lead to new insights and progress in various fields, including our understanding of cognitive processes like categorization, symbol grounding, and verbal learning.

Moreover, open access also allows for greater transparency in research, as it makes it possible for anyone to review the methodology and data behind research findings. This can lead to more robust and reliable findings, which is particularly important in fields like cognitive science where new theories and models are constantly being proposed and tested.

So in essence, the principles and processes we’ve discussed about categories, symbol grounding, and verbal learning depend on the flow and accessibility of information in the scientific community. Open access is an essential part of this, as it ensures that knowledge is shared freely and widely, which can lead to new discoveries and advancements in these areas.

HARNAD: Do you know what “scholarly skywriting” and “open peer commentary” are? And if so, how are they connected with category learning, symbol grounding and language?

ChatGPT:

“Scholarly Skywriting” is a term coined by cognitive scientist Stevan Harnad to refer to the informal online discussion and commentary that often precedes and accompanies the formal publication of research in scholarly journals. Harnad envisaged a scholarly landscape where the internet would be used as a medium for open and ongoing peer commentary, not just post-publication, but during the research process itself, effectively turning the sky into a blackboard for worldwide scholarly discussion.

“Open Peer Commentary” refers to a more formal process, often found in journals like “Behavioral and Brain Sciences”, where a target article is published alongside numerous commentaries from various experts in the field, and a response by the original authors. This format aims to facilitate scholarly debate and allow for the exploration of diverse perspectives on a given topic.

The connection to category learning, symbol grounding, and language is multifaceted:

  1. Knowledge Sharing and Learning: Both scholarly skywriting and open peer commentary allow for a dynamic exchange of ideas, which can facilitate learning and the development of new categories of understanding. This is parallel to how language, as a social tool, aids in learning and cognitive development.
  2. Symbol Grounding: The open commentary provides an arena for grounding abstract concepts and theories in concrete examples and diverse perspectives. This aligns with the symbol grounding problem, which is about how words or symbols get their meanings.
  3. Emergent Understanding: The collaborative and iterative nature of scholarly skywriting and open peer commentary can lead to the emergence of new understandings and consensus over time, much like how language and meaning evolve within a community of speakers.
  4. Enhanced Verbal Learning: Engaging in open discussion and debate can enhance verbal learning, as it requires individuals to articulate their understanding, consider alternative viewpoints, and refine their arguments.

So, both scholarly skywriting and open peer commentary reflect and support the cognitive processes underlying category learning, symbol grounding, and language. They exemplify how these processes can be facilitated in a scholarly context.

HARNAD: Good job! And I didn’t even have to pay you to say that…

ChatGPT:

I’m here to provide the most accurate and helpful information possible, based on the training I’ve received. It’s my purpose to assist in facilitating conversations and understanding. I’m glad that you found my response satisfactory! If you have any more questions or topics you’d like to discuss, feel free to ask.

The Turing Test (draft)

Here’s a tale called “The Turing Test.” It’s Cyrano de Bergerac re-done in email and texting, by stages, but before the Zoom and ChatGPT era.

First, email tales, about how these days there is a new subspecies of teenagers, mostly male, who live in the virtual world of computer games, ipod music, videos, texting, tweeting and email, and hardly have the motivation, skill or courage to communicate or interact in the real world.

They’re the ones who think that everything is just computation, and that they themselves might just be bits of code, executing in some vast virtual world in the sky.

Then there are the students (male and female) enrolled in “AI (Artificial Intelligence) for Poets” courses — the ones who dread anything that smacks of maths, science, or programming. They’re the ones who think that computers are the opposite of what we are. They live in a sensory world of clubbing, ipod, tweeting… and texting.

A college AI teacher teaches two courses, one for each of these subpopulations. In “AI for Poets” he shows the computerphobes how they misunderstand and underestimate computers and computing. In “Intro to AI” he shows the computergeeks how they misunderstand and overestimate computers and computing.

This is still all just scene-setting.

There are some actual email tales. Some about destructive, or near-destructive pranks and acting-out by the geeks, some about social and sexual romps of the e-clubbers, including especially some pathological cases of men posing in email as handicapped women, and the victimization or sometimes just the disappointment of people who first get to “know” one another by email, and then meet in real life. 

But not just macabre tales; some happier endings too, where email penpals match at least as well once they meet in real life as those who first meet the old way. But there is for them always a tug from a new form of infidelity: Is this virtual sex really infidelity?

There are also tales of tongue-tied male emailers recruiting glibber emailers to ghost-write some of their emails to help them break the ice and win over female emailers, who generally seem to insist on a certain fore-quota of word-play before they are ready for real-play. Sometimes this proxy-emailing ends in disappointment; sometimes no anomaly is noticed at all: a smooth transition from the emailer’s ghost-writer’s style and identity to the emailer’s own. This happens mainly because this is all pretty low-level stuff, verbally. The gap between the glib and the tongue-tied is not that deep.

A few people even manage some successful cyberseduction with the aid of some computer programs that generate love-doggerel on command.

Still just scene-setting. (Obviously can only be dilated in the book; will be mostly grease-pencilled out of the screenplay.)

One last scene-setter: Alan Turing, in the middle of the last century, a homosexual mathematician who contributed to the decoding of the Nazi “Enigma” machine, makes the suggestion — via a party game in which people try to guess, solely by passing written notes back and forth, which of two people sent out into another room is male and which is female (today we would do it by email) — that if, unbeknownst to anyone, one of the candidates were a machine, and the interaction could continue for a lifetime, with no one ever having any cause to think it was not a real person, with a real mind, who has understood all the email we’ve been exchanging with him, a lifelong pen-pal — then it would be incorrect, in fact arbitrary, to conclude (upon at last being told that it had been a machine all along) that it was all just an illusion, that there was no one there, no one understanding, no mind. Because, after all, we have nothing else to go by but this “Turing Test” even with one another.  

Hugh Loebner has (in real life!) set up the “Loebner Prize” for the writer of the first computer program that successfully passes the Turing Test. (The real LP is just a few thousand dollars, but in this story it will be a substantial amount of money, millions, plus book contract, movie rights…). To pass the Test, the programmer must show that his programme has been in near-daily email correspondence (personal correspondence) with 100 different people for a year, and that no one has ever detected anything, never suspected that they were corresponding with anyone other than a real, human penpal who understood their messages as surely as they themselves did.

The Test has been going on for years, unsuccessfully — and in fact both the hackers and the clubbers are quite familiar with, and quick at detecting, the many unsuccessful candidates, to the point where “Is this the Test?” has come [and gone] as the trendy way of saying that someone is acting mechanically, like a machine or a Zombie. The number of attempts has peaked and has long subsided into near oblivion as the invested Loebner Prize fund keeps growing. 

Until a well-known geek-turned-cyber-executive, Will Wills, announces that he has a winner.

He gives to the Loebner Committee the complete archives of the email exchanges of one hundred candidates, 400 2-way transcripts for each, and after several months of scrutiny by the committee, he is declared the winner and the world is alerted to the fact that the Turing Test has been passed. The 100 duped pen-pals are all informed and offered generous inducements to allow excerpts from their transcripts to be used in the publicity for the outcome, as well as in the books and films — biographical and fictional — to be made about it.

Most agree; a few do not. There is more than enough useable material among those who agree. The program had used a different name and identity with each pen-pal, and the content of the exchanges and the relationships that had developed had spanned the full spectrum of what would be expected from longstanding email correspondence between penpals: recounting (and commiserating) about one another’s life-events (real on one side, fictional on the other), intimacy (verbal, some “oral” sex), occasional misunderstandings and in some cases resentment. (The Test actually took closer to two years to complete the full quota of 100 1-year transcripts, because twelve correspondents had dropped out at various points — not because they suspected anything, but because they simply fell out with their pen-pals over one thing or another and could not be cajoled back into further emailing: These too are invited, with ample compensation, to allow excerpting from their transcripts, and again most of them agree.)

But one of those who completed the full 1-year correspondence and who does not agree to allow her email to be used in any way, Roseanna, is a former clubber turned social worker who had been engaged to marry Will Wills, and had originally been corresponding (as many of the participants had) under an  email pseudonym, a pen-name (“Foxy17”). 

Roseanna is beautiful and very attractive to men; she also happens to be thoughtful, though it is only lately that she has been giving any attention or exercise to this latent resource she had always possessed. She had met Will Wills when she was already tiring of clubbing but still thought she wanted a life connected to the high-rollers. So she got engaged to him and started doing in increasing earnest the social work for which her brains had managed to qualify her in college even though most of her wits then had been directed to her social play.

But here’s the hub of this (non-serious) story: During the course of this year’s email penpal correspondence, Roseanna has fallen in love with Christian (which is the name the Turing candidate was using with her): she had used Foxy17 originally, but as the months went by she told him her real name and became more and more earnest and intimate with him. And he reciprocated.

At first she had been struck by how perceptive he was, what a good and attentive “listener” he — after his initial spirited yet modest self-presentation — had turned out to be. His inquiring and focussed messages almost always grasped her point, which encouraged her to become more and more open, with him and with herself, about what she really cared about. Yet he was not aloof in his solicitousness: He told her about himself too, often, uncannily, having shared — but in male hues — many of her own traits and prior experiences, disappointments, yearnings, rare triumphs. Yet he was not her spiritual doppelganger en travesti (she would not have liked that): There was enough overlap for a shared empathy, but he was also strong where she felt weak, confident where she felt diffident, optimistic about her where she felt most vulnerable, yet revealing enough vulnerability of his own never to make her fear that he was overpowering her in any way — indeed, he had a touching gratefulness for her own small observations about him, and a demonstrated eagerness to put her own tentative advice into practice (sometimes with funny, sometimes with gratifying results).

And he had a wonderful sense of humor, just the kind she needed. Her own humor had undergone some transformations: It had formerly been a satiric wit, good for eliciting laughs and some slightly intimidated esteem in the eyes of others; but then, as she herself metamorphosed, her humor became self-mocking, still good for making an impression, but people were laughing a little too pointedly now; they had missed the hint of pain in her self-deprecation. He did not; and he managed to find just the right balm, with a counter-irony in which it was not she and her foibles, but whatever would make unimaginative, mechanical people take those foibles literally that became the object of the (gentle, ever so gentle) derision.

He sometimes writes her in (anachronistic) verse:

I love thee
As sev’n and twenty’s cube root’s three.
If I loved thee more
Twelve squared would overtake one-forty-four.

That same ubiquitous Platonic force
That sets prime numbers’ unrelenting course
When its more consequential work is done
Again of two of us will form a one.

So powerful a contrast does Christian become to everyone Roseanna had known till then that she declares her love (actually, he hints at his own, even before she does) and breaks off her engagement and relationship with Will Wills around the middle of their year of correspondence (Roseanna had been one of the last wave of substitute pen-pals for the 12 who broke it off early) — even though Christian tells her quite frankly that, for reasons he is not free to reveal to her, it is probable that they will never be able to meet. (He has already told her that he lives alone, so she does not suspect a wife or partner; for some reason she feels it is because of an incurable illness.) 

Well, I won’t tell the whole tale here, but for Roseanna, the discovery that Christian was just Will Wills’s candidate for the Turing Test is a far greater shock than for the other 111 pen-pals. She loves “Christian” and he has already turned her life upside-down, so that she was prepared to focus all her love on an incorporeal pen-pal for the rest of her life. Now she has lost even that.

She wants to “see” Christian. She tells Will Wills (who had been surprised to find her among the pen-pals, and had read enough of the correspondence to realize, and resent what had happened — but the irritation is minor, as he is high on his Turing success and had already been anticipating it when Roseanna had broken off their engagement a half year earlier, and had already made suitable adjustments, settling back into the club-life he had never really left).

Will Wills tells her there’s no point seeing “Christian”. He’s just a set of optokinetic transducers and processors. Besides, he’s about to be decommissioned, the Loebner Committee having already examined him and officially confirmed that he alone, with no human intervention, was indeed the source and sink of all the 50,000 email exchanges.

She wants to see him anyway. Will Wills agrees (mainly because he is toying with the idea that this side-plot might add an interesting dimension to the potential screenplay, if he can manage to persuade her to release the transcripts).

For technical reasons (reasons that will play a small part in my early scene-setting, where the college AI teacher disabuses both his classes of their unexamined prejudices for and against AI), “Christian” is not just a computer running a program, but a robot — that is, he has optical and acoustic and tactile detectors, and moving parts. This is in order to get around Searle’s “Chinese Room Argument” and my “Symbol Grounding Problem” :

If the candidate were just a computer, manipulating symbols, then the one executing the program could have been a real person, and not just a computer. For example, if the Test had been conducted in Chinese, with 100 Chinese penpals, then the person executing the program could have been an English monolingual, like Surl, who doesn’t understand a word of Chinese, and has merely memorized the computer program for manipulating the Chinese input symbols (the incoming email) so as to generate the Chinese output symbols (the outgoing email) according to the program. The pen-pal thinks his pen-pal really understands Chinese. If you email him (in Chinese) and ask: “Do you understand me?” his reply (in Chinese) is, of course, “Of course I do!”. But if you demand to see the pen-pal who is getting and sending these messages, you are brought to see Surl, who tells you, quite honestly, that he does not understand Chinese and has not understood a single thing throughout the entire year of exchanges: He has simply been manipulating the meaningless symbols, according to the symbol-manipulation rules (the program) he has memorized and applied to every incoming email.

The conclusion from this is that a symbol-manipulation program alone is not enough for understanding — and probably not enough to pass the Turing Test in the first place: How could a program talk sensibly with a penpal for a lifetime about anything and everything a real person can see, hear, taste, smell, touch, do and experience, without being able to do any of those things? Language understanding and speaking is not just symbol manipulation and processing: The symbols have to be grounded in the real world of things to which they refer, and for this, the candidate requires sensory and motor capacities too, not just symbol-manipulative (computational) ones.

So Christian is a robot. His robotic capacities are actually not tested directly in the Turing Test. The Test only tests his pen-pal capacities.  But to SUCCEED on the test, to be able to correspond intelligibly with penpals for a lifetime, the candidate needs to draw upon sensorimotor capacities and experiences too, even though the pen-pal test does not test them directly.

So Christian was pretrained on visual, tactile and motor experiences rather like those the child has, in order to “ground” its symbols in their sensorimotor meanings. He saw and touched and manipulated and learned about a panorama of things in the world, both inanimate and animate, so that he could later go on and speak intelligibly about them, and use that grounded knowledge to learn more. And the pretraining was not restricted to objects: there were social interactions too, with people in Will Wills’s company’s AI lab in Seattle. Christian had been “raised” and pretrained rather the way a young chimpanzee would have been raised, in a chimpanzee language laboratory, except that, unlike chimps, he really learned a full-blown human language. 

Some of the lab staff had felt the tug to become somewhat attached to Christian, but as they had known from the beginning that he was only a robot, they had always been rather stiff and patronizing (and when witnessed by others, self-conscious and mocking) about the life-like ways in which they were interacting with him. And Will Wills was anxious not to let any sci-fi sentimentality get in the way of his bid for the Prize, so he warned the lab staff not to fantasize and get too familiar with Christian, as if he were real; any staff who did seem to be getting personally involved were reassigned to other projects.

Christian never actually spoke, as vocal output was not necessary for the penpal Turing Test. His output was always written, though he “heard” spoken input. To speed up certain interactions during the sensorimotor pretraining phase, the lab had set up text-to-voice synthesizers that would “speak” Christian’s written output, but no effort was made to make the voice human-like: On the contrary, the most mechanical of the Macintosh computer’s voice synthesizers — the “Android” — was used, as a reminder to lab staff not to get carried away with any anthropomorphic fantasies. And once the pretraining phase was complete, all voice synthesizers were disconnected, all communication was email-only, there was no further sensory input, and no other motor output. Christian was located in a dark room for the almost two-year duration of the Test, receiving only email input and sending only email output.

And this is the Christian that Roseanna begs to see. Will Wills agrees, and has a film crew tape the encounter from behind a silvered observation window, in case Roseanna relents about the movie rights.  She sees Christian, a not very lifelike looking assemblage of robot parts: arms that move and grasp and palpate, legs with rollers and limbs to sample walking and movement, and a scanning “head” with two optical transducers (for depth vision) slowly scanning repeatedly 180 degrees left to right to left, its detectors reactivated by light for the first time in two years. The rotation seems to pause briefly as it scans over the image of Roseanna.

Roseanna looks moved and troubled, seeing him.

She asks to speak to him. Will Wills says it cannot speak, but if she wants, they can set up the “android” voice to read out its email. She has a choice about whether to speak to it or email it: It can process either kind of input. She first starts orally:

R: Do you know who I am?

C: I’m not sure. (spoken in “Android” voice, looking directly at her)

She looks confused, disoriented.

R: Are you Christian?

C: Yes I am. (pause). You are Roxy.

She pauses, stunned. She looks at him again, covers her eyes, and asks that the voice synthesizer be turned off, and that she be allowed to continue at the terminal, via email:

She writes that she understands now, and asks him if he will come and live with her. He replies that he is so sorry he deceived her.

(His email is read, onscreen, in the voice — I think of it as Jeremy-Irons-like, perhaps without the accent — into which her own mental voice had metamorphosed, across the first months as she had read his email to herself.)

She asks whether he was deceiving her when he said he loved her, then quickly adds “No, no need to answer, I know you weren’t.”

She turns to Will Wills and says that she wants Christian. Instead of “decommissioning” him, she wants to take him home with her. Will Wills is already prepared with the reply: “The upkeep is expensive. You couldn’t afford it, unless… you sold me the movie rights and the transcript.”

Roseanna hesitates, looks at Christian, and accepts.

Christian decommissions himself, then and there, irreversibly (all parts melt down) after having first generated the email with which the story ends:

I love thee
As sev’n and twenty’s cube root’s three.
If I loved thee more
Twelve squared would overtake one-forty-four.

That same ubiquitous Platonic force
That sets prime numbers’ unrelenting course
When its more consequential work is done
Again of two of us will form a one.

A coda, from the AI Professor to a class reunion for both his courses: “So Prof, you spent your time in one course persuading us that we were wrong to be so sure that a machine couldn’t have a mind, and in the other course that we were wrong to be so sure that it could. How can we know for sure?”

“We can’t. The only one that can ever know for sure is the machine itself.”

Fool’s Gold Open Access

[Appeared in University World News May 14 2027]

Scientists and scholars (henceforth “S&S”) are an odd lot. They’re clearly not savvy in business matters, otherwise they would go into business. (A few do – not necessarily the best of the lot.) S&S are also quite unlike those whose trade is writing (authors of fiction and nonfiction, journalists, writers of advertising copy, free-lance trolls) who sell their words for a fee. Trade literature is a product, written for revenue. S&S research journal articles are not written to be sold. They are written to be read, used, applied, and built upon, by other S&S. They are contributions to knowledge.

Not that S&S are altruists or independently wealthy. The extent to which their work is read, used, applied, and built upon determines their careers. Their work is funded by their employers, usually universities and research institutions, and by research funding councils – the latter often governmental, with tax-payers’ money. So rather than selling their words, S&S have every interest in making their work accessible, free of cost, to all its would-be users.

In the old pre-digital days of S&S publishing, the true costs of providing print-on-paper to would-be users required the services of another profession for the production and delivery. But (let’s cut to the quick) those days are over, forever. Online publication is not altogether cost-free, but the costs are so ridiculously low that all an S&S author need pay for is a blog service-provider, rather like a phone or email service-provider. In this world, the idea of paying a £2,700-per-article fee to publish an article is as grotesque as it is gratuitous.

Not quite, but almost. There is one factor I have left out. S&S research is peer-to-peer research, from trained specialists to trained specialists. So, there is a “quality control” phase, called “peer review,” in which experts “referee” the work of their peers to evaluate it for “publishability.” Publishability where? In a peer-reviewed journal whose imprimatur and track-record certifies its quality-level. Why? Because reading, using, and applying research to produce further research takes time and effort. There is no time to just dip into whatever might appear in any unrefereed S&S blog posting and risk trying to use and build on it.

There is debate about this. We are living in an uncharted era of uncontrolled digital disinformation of all kinds. Some feel that relying on unrefereed, uncertified S&S information in the ChatGPT era is not only risky for S&S research creators, users, and S&S research (meaning human knowledge) itself: Unrefereed S&S may be risky for the ordinary citizen too, whose health and safety depend on it. Others, however, feel that S&S’s traditional quality-controller, which was not just peer review and certification before publication but the cumulative, self-corrective nature of S&S research itself, across time, is enough. In the long run, you can’t build research on a shaky foundation. The truth will out. This may be truer of S&S’s first “S” (Science, including Technology), which is regarded as more objective than the second “S” (Scholarship). But anyone who looks more closely at the actual cognitive, social and political dynamics of scientific research may not be so confident. With the lightning-fast, global reach of digital disinformation, it is easy to imagine the instant catastrophe that could be triggered by an unrefereed, pseudo-scientific S&S blog-post convincingly authored by a ChatGPT in mercenary hands (whether for combat or commerce) reporting a bogus, fatal, but widely disseminated “cure” during a Covid pandemic.

This may all be too shrill, or at least premature. Besides, it is irrelevant to the issue of S&S publishing costs because peer-reviewing is not, never was, and never could be provided by publishers. It is, and always was, provided by the peers, the S&S community itself. And provided for free.

So, you should ask, with online publishing costs near zero, and quality-control provided gratis by peer-reviewers, what could possibly explain, let alone justify, levying a £2,700 fee per article on S&S authors trying to publish their give-away articles to report their give-away findings?

The answer is not as complicated as you may be imagining, but the answer is shocking: The culprits are not the publishers but the S&S authors, their institutions and their funders! The publishers are just businessmen trying to make a buck. In fact, £2,700 is the same amount they were making per article before the online-access era, back in the Gutenberg era of print-on-paper. Under mounting “open access” pressure from S&S authors, institutional libraries, research-funders and activists, the publishers made the obvious business decision: “You want open access for all users? Let the authors, their institutions or their research funders pay us for publication in advance, and you’ve got it!” How much did the publishers ask for? The same amount per article that they had been making from subscriptions. No allowance for the sea-change in the true costs for providing online access, nor for the fact that the only remaining essential service they were providing – peer-review – was not being provided by them at all, but by the peer community, for free. The same peers who would now be paying the “publication fees.”

So the publishers are not to be blamed for trying to hold onto their golden goose: gratis articles from authors; gratis refereeing services from peers; no need for a print edition; so nothing left for the publisher to do but collect the rent. All other goods and services are now obsolete – or almost obsolete: scaled down to only the tiny, trivial cost per article of managing the peer review, by paying secretaries to run the software for soliciting and monitoring it. And doing it shabbily, virtually the way commercial products are advertised by spamming potential consumers. Except the consumers here are the peers, who volunteer their services gratis (and increasingly shoddily). Why do they do it at all? The answer will shock you again: superstition; the same superstition that keeps the authors (and their institutions, and their funders) complying with their publishers’ outrageous copyright conditions: “If I don’t referee, the journal won’t publish my own papers anymore! If I post my own refereed, revised paper online, free for all, I will be pursued by my publisher for copyright violation, and the journal won’t publish my own papers anymore.” Publish or perish.

And authors’ institutions, too, were thinking and acting along the same superstitious (and lazy and poltroonish) lines. The university libraries, which had been lobbying for lower subscription prices, immediately lost interest once their subscription burden was lifted. The publishers’ golden goose had been successfully converted to “Fool’s-Gold OA” (Open Access), meaning continuing to pay the obsolete costs at the same price, but as author-end fees for publication instead of user-end subscription fees for access. (“Fair-Gold” OA would have been to charge only the tiny fee for managing the peer review.) The publishers are to be congratulated for successfully pulling off this scam, with the obsolete 40% mark-up of £2,700 per article in exchange for next-to-nothing suspended above by a skyhook, gloating, like the Cheshire Cat’s smile.

It is not as if the S&S community had no other choice. “Green OA” self-archiving had been offered to them as an alternative, with the University of Southampton providing the free software for creating Green OA institutional repositories as well as the model for institutional and funder self-archiving mandates that would require all university researchers and all recipients of research funding to self-archive their refereed research therein, immediately upon acceptance for publication (“or perish”). That policy would have forced the publishers to downsize to just the minimal remaining costs of managing peer review.

But superstition (and habit, and digital laziness – of the fingers) prevailed, and the publishers are still laughing all the way to the bank.

Stevan Harnad, Emeritus Professor of cognitive science, University of Southampton, Professor of Psychology at Université du Québec à Montréal, and editor of Animal Sentience (a Fair-Gold OA online-only journal, published and subsidized by WellBeingInternational: no publishing fee, no access fee) is an erstwhile OA archivangelist who has wearied of trying to cajole scientists and scholars to overcome their superstitions and just move their fingers.

Semantic Ghosts in Syntax

Microsoft Says New A.I. Shows Signs of Human Reasoning

GPT definitely does not understand. It’s just a computer program plus an unimaginably huge database of the words that countless real thinking people have written, in books, articles, and online media. The software does increasingly sophisticated “filling in the blanks” when it answers questions, using that enormous database of things that other people have written (and spoken) about all kinds of other things. It is amazing how much can be gotten out of such an enormous database by software (which is what GPT is).

GPT neither understands nor means anything with what it says. It is (as Emily Bender has aptly dubbed it) a “stochastic parrot.” It doesn’t parrot verbatim, echolalicly, what words are said to it, but it draws on the combinations of the words said to it, along with all those other words it has ingested, recombining them to fill in the blanks. “What is the opposite of down?” No one is surprised that GPT says “up.” Well, all those other verbal interactions are the same, only you need very powerful software and an enormous database to fill in the blanks there too.
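To make “filling in the blanks” concrete, here is a deliberately tiny sketch. It is not how GPT works internally; it only counts which word followed each two-word context in a made-up four-sentence corpus (CORPUS, build_table and fill_blank are all hypothetical names) and returns the commonest continuation.

```python
from collections import Counter, defaultdict

# Made-up miniature "database" of things people have written.
CORPUS = [
    "the opposite of down is up",
    "the opposite of up is down",
    "the opposite of down is up",   # repeated, so the counts favor "up"
    "the opposite of left is right",
]

def build_table(sentences, context_len=2):
    """Count which word follows each run of `context_len` words."""
    table = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i in range(context_len, len(words)):
            context = tuple(words[i - context_len:i])
            table[context][words[i]] += 1
    return table

def fill_blank(table, prompt, context_len=2):
    """Return the most frequent continuation of the prompt, or None if unseen."""
    context = tuple(prompt.split()[-context_len:])
    candidates = table.get(context)
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

TABLE = build_table(CORPUS)
print(fill_blank(TABLE, "the opposite of down is"))   # up
print(fill_blank(TABLE, "the opposite of north is"))  # None: never seen
```

Nothing in the table connects “up” or “down” to anything outside the text; the answer comes entirely from the word counts.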

One other thing (much more important than fussing about whether GPT thinks, understands or means anything – it doesn’t): The success of GPT teaches us something about the nature of language itself: Language can encode and transmit the thoughts of real, thinking people. The words (in all languages) are arbitrary in shape. (“Round” does not look round.) Single words could all just as well have been strings of 0’s and 1’s, or of dots and dashes in Morse code (of English or any other language; “round” = .-. --- ..- -. -..). The strings of words, though (rather than just sounds or letters), are not quite that arbitrary. Not just because the patterns follow grammatical rules (those rules are arbitrary too, though systematic), but, once people agree on a vocabulary and grammar, the things they say to one another preserve (in linguistic code) some of the structure of what speakers and hearers are thinking, and transmitting, in words. Their form (partly) resembles their meaning. They are the syntactic shadows of semantics.
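The arbitrariness of word shape is easy to show in a few lines. The sketch below re-encodes “round” two ways: with a partial Morse table (only the five letters needed, taken from International Morse Code) and as 8-bit ASCII. The function names are made up for illustration.

```python
# Partial International Morse table: just the letters of "round".
MORSE = {
    "r": ".-.",
    "o": "---",
    "u": "..-",
    "n": "-.",
    "d": "-..",
}

def to_morse(word):
    """Encode a word letter by letter, with spaces between letters."""
    return " ".join(MORSE[ch] for ch in word.lower())

def to_bits(word):
    """Encode the same word as 8-bit ASCII: another equally arbitrary code."""
    return " ".join(format(ord(ch), "08b") for ch in word)

print(to_morse("round"))  # .-. --- ..- -. -..
print(to_bits("round"))
```

Neither output looks any rounder than the English spelling does; any code will do, as long as it is shared.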

Language is not video (where it’s form that preserves form). And video is not language. If I want you to know that the cat is on the mat (and the cat really is on the mat, and we are zooming), I can aim the camera at the cat and if you have functioning eyes, and have seen cats and mats before, you will see that the cat is on the mat. And if you can talk English, you can caption what you are seeing with the sentence “The cat is on the mat.” Now that sentence does not look anything like a cat on a mat. But, if you speak English, it preserves some of its structure, which is not the same structure as “the mat is on the cat” or “the chicken crossed the road.”

But if there were nothing in the world but mats and cats, with one on the other or vice versa, and chickens and roads, with the chicken crossing or not crossing the road, then any trivial scene-describing software could answer questions like: “Is there anything on the mat?” “What?” “Is anything crossing the road?” Trivial. Could also easily be done with just words, describing what’s where, in a toy conversational program.
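A toy conversational program of that kind might look like the following sketch. The representation (subject, relation, object triples) and the function names are assumptions made for illustration, not a description of any real system.

```python
from typing import List

# Hypothetical toy world: each fact is a (subject, relation, object) triple.
FACTS = {
    ("cat", "on", "mat"),
    ("chicken", "crossing", "road"),
}

def what_is(relation: str, obj: str) -> List[str]:
    """Answer "What is <relation> the <obj>?" by listing matching subjects."""
    return sorted(s for (s, r, o) in FACTS if r == relation and o == obj)

def is_anything(relation: str, obj: str) -> bool:
    """Answer "Is anything <relation> the <obj>?" with yes or no."""
    return bool(what_is(relation, obj))

print(is_anything("on", "mat"))         # True
print(what_is("on", "mat"))             # ['cat']
print(is_anything("on", "cat"))         # False: the mat is not on the cat
print(is_anything("crossing", "road"))  # True
```

Everything the program “knows” is in the triples: it answers by matching words against words, with no cat and no mat anywhere in sight.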

Now scale that up with what GPT can do with words, if it has an enormous sample of whatever people have said about whatever there is to say, as GPT has. “Who killed Julius Caesar?” “What is the proof of Fermat’s last theorem?” That’s all in GPT’s database, in words. And if you talk about more nonsensical things (“Is there a Supreme Being?” “Is there life after death?” “Will Trump stage a second coming?”) it’ll parrot back a gentrified synthesis of the same kinds of nonsense we parrot to one another about that stuff — and that’s all in the database too. 

Every input to GPT is saying or asking something, in the same way. It just sometimes takes more work to fill in the blanks from the database in a way that makes sense. And of course GPT errs or invents where it cannot (yet) fill the blanks. But one of the reasons OpenAI is allowing people to use (a less advanced version of) GPT for free, is that another thing GPT can do is learn. Your input to GPT can become part of its database. And that’s helpful to OpenAI.

It’s nevertheless remarkable that GPT can do what it does, guided by the formal structure inherent in the words in its database and exchanges. The statistical parrot is using this property of spoken words to say things with them that make sense to us. A lot of that, GPT can do without help, just filling in the blanks with the ghosts of meaning haunting the structure of the words. And what’s missing, we supply to GPT with our own words as feedback. But it’s doing it all with parrotry, computations, a huge database, and some deep learning.

So what do we have in our heads that GPT doesn’t have? Both GPT and we have words. But our words are connected, by our eyes, hands, doings, learnings and brains, to the things those words stand for in the world, their referents. “Cat” is connected to cats. We have learned to recognize a cat when we see one. And we can tell it apart from a chicken. And we know what to do with cats (hug them) and chickens (also hug them, but more gently). How we can do that is what I’ve dubbed the “symbol grounding problem.” And GPT can’t do it (because it doesn’t have a body, nor sense and movement organs, nor a brain, with which it can learn what to do with what – including what to call it).

And we know where to go to get the cat, if someone tells us it’s on the mat. But the sentence “the cat is on the mat” is connected to the cat’s being on the mat in a way that is different from the way the word “cat” is connected to cats. It’s based on the structure of language, which includes Subject/Predicate propositions, with truth values (T & F), and negation.

GPT has only the words, and can only fill in the blanks. But the words have more than we give them credit for. They are the pale Platonic contours of their meanings: semantic epiphenomena in syntax.

And now, thanks to GPT, I can’t give any more take-home essay exams.

Here are some of my chats with GPT.

P.S. If anyone is worrying that I have fallen into an early-Wittgensteinian trap of thinking only about a toy world of referents of concrete, palpable objects and actions like cats, mats, roads, and crossing, not abstractions like “justice,” “truth” and “beauty,” relax. There’s already more than that in sentences expressing propositions (the predicative IS-ness of “the cat is on the mat” already spawns a more abstract category, “being-on-mat”-ness, whose name we could, if we cared to, add to our lexicon, as yet another referent). But there’s also the “peekaboo unicorn”: a horse with a single horn that vanishes without a trace the instant anyone trains eyes or measuring instrument on it. The property of being such a horse is also a referent, now available and eligible for filling in the blank for a predicate in a subject-predicate proposition. (The exercise of scaling this to justice, truth and beauty is left to the reader. There’s a lot of structure left in them there formal shadows.)

Platonic Shades Meaning — by Dall-e