Minimal Grounding Sets, Universal Grammar, and ChatGPT

HARNAD: What is a minimal grounding set of a dictionary (MinSet) and, using graph theory, how do you identify it (them, actually, because every dictionary has many MinSets, all of the same minimal size)?

ChatGPT

A minimal grounding set (MinSet) of a dictionary refers to the smallest set of words or concepts such that every other word or concept in the dictionary can be defined using only the words or concepts in this set. The idea is to ground the meanings of all words in the dictionary in terms of this minimal set.

Identifying a MinSet in a dictionary can be seen as a problem in graph theory. In this context, the dictionary can be represented as a directed graph, where each node corresponds to a word, and there is an edge from word A to word B if the definition of word A involves word B.
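As an illustrative aside, the graph representation described above can be sketched in a few lines of Python. The toy dictionary, its words, and the function names below are hypothetical and chosen only for illustration; they are not taken from any real dictionary or from the conversation.

```python
# Hypothetical toy dictionary: each headword maps to the set of words
# used in its definition. The entries are invented for illustration.
TOY_DICTIONARY = {
    "cat": {"furry", "animal"},
    "animal": {"living", "thing"},
    "furry": {"soft", "thing"},
    "soft": {"thing"},
    "living": {"thing"},
    "thing": {"living"},
}


def definition_edges(dictionary):
    """Return sorted (A, B) pairs; an edge A -> B means A's definition uses B."""
    return sorted(
        (word, used)
        for word, definition in dictionary.items()
        for used in definition
    )


def words_defined_with(dictionary, target):
    """Return the headwords whose definitions mention `target`."""
    return sorted(
        word for word, definition in dictionary.items() if target in definition
    )


if __name__ == "__main__":
    for defined, used in definition_edges(TOY_DICTIONARY):
        print(f"{defined} -> {used}")
    print(words_defined_with(TOY_DICTIONARY, "thing"))
```

Storing each definition as a set of words discards word order and syntax, which is a simplifying assumption: only the "is used in the definition of" relation matters for the graph.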

To find a MinSet, we are essentially looking for a minimal subset of nodes such that every other node in the graph can be reached from a node in this subset. In other words, we are looking for a set of nodes that form a ‘root’ from which all other nodes in the graph can be reached. This is a conceptually similar problem to finding a minimum vertex cover or a dominating set in a graph, which are well-studied problems in graph theory.

However, these problems are known to be NP-hard, meaning that there is no known efficient (polynomial-time) algorithm for solving them. Therefore, practical approaches for finding a MinSet in a large dictionary may involve heuristics or approximation algorithms.

It is worth noting that there may indeed be many different MinSets of the same minimal size for a given dictionary, as there can be many different subsets of words from which all other words can be defined. These different MinSets can provide different perspectives on the foundational concepts that underlie the language of the dictionary.

Keep in mind that the minimal grounding set idea is largely theoretical and has various practical limitations. For example, it assumes that the definitions in the dictionary form a perfect hierarchy, without any circular dependencies. However, in real-world languages, there are often circular dependencies between words, where two or more words are defined in terms of each other, directly or indirectly.

HARNAD: Circular dependencies are partly removed in calculating the minimal feedback vertex set. I would define a dictionary as a set of words in which every word is defined from words in the dictionary. Do you agree, or are there other definitions?

ChatGPT

Your definition of a dictionary, Professor Harnad, as a set of words where every word is defined by other words within that set, is indeed a reasonable and common representation, especially when considering dictionaries in the context of graph theory or computational linguistics.

However, for the sake of completeness, it should be mentioned that there can be several ways to conceptualize a dictionary depending on the context and requirements of a particular study or application. Some other approaches may consider a dictionary as:

A function or a mapping from words to their definitions.

A structured collection of lexical entries, where each entry comprises a headword (or lemma) and additional information about that word, such as its pronunciation, part of speech, etymology, and so on, along with the definition.

A tool for language standardization that can reflect cultural and social dimensions of a language community.

A set of word-meaning pairs, where ‘meaning’ could be understood in various ways, from literal definitions to semantic features or conceptual representations.

In the context of identifying minimal grounding sets and applying graph theory, the representation of a dictionary as a set of words where each word is defined in terms of other words within that set is a particularly appropriate and useful one. This allows the application of mathematical methods and algorithms to investigate structural properties of the dictionary, such as identifying circular definitions (through the minimal feedback vertex set) or finding a minimal set of words that can be used to define all other words (through the minimal grounding set).
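A minimal sketch of the two structural checks mentioned in this exchange, assuming a dictionary is stored as a mapping from each headword to the set of words in its definition. The toy entries and the function names `is_closed` and `is_feedback_vertex_set` are hypothetical. The first function tests the closure property in the definition proposed above (every defining word is itself an entry); the second tests whether deleting a given set of words leaves no circular definitions, which is what makes that set a feedback vertex set.

```python
# Hypothetical toy dictionary: headword -> set of words used in its definition.
TOY_DICTIONARY = {
    "big": {"large"},
    "large": {"big"},
    "huge": {"large"},
    "thing": {"object"},
    "object": {"thing"},
    "box": {"object", "big"},
}


def is_closed(dictionary):
    """True if every word used in any definition is itself a headword."""
    headwords = set(dictionary)
    return all(definition <= headwords for definition in dictionary.values())


def is_feedback_vertex_set(dictionary, removed):
    """True if deleting the words in `removed` leaves no circular definitions."""
    # Keep only surviving headwords and the surviving words in their definitions.
    remaining = {
        word: {u for u in definition if u in dictionary and u not in removed}
        for word, definition in dictionary.items()
        if word not in removed
    }
    # Repeatedly peel off words whose definitions use only already-peeled words.
    # Every remaining word gets peeled exactly when the remaining graph is acyclic.
    peeled = set()
    progress = True
    while progress:
        progress = False
        for word, definition in remaining.items():
            if word not in peeled and definition <= peeled:
                peeled.add(word)
                progress = True
    return len(peeled) == len(remaining)


if __name__ == "__main__":
    print(is_closed(TOY_DICTIONARY))                               # closure check
    print(is_feedback_vertex_set(TOY_DICTIONARY, {"big"}))         # one cycle survives
    print(is_feedback_vertex_set(TOY_DICTIONARY, {"big", "thing"}))
```

In this toy example there are two definitional loops (big/large and thing/object), so removing one word breaks only one of them, while removing one word from each loop breaks both.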

HARNAD: Was the parametrization of your 2021 mega-database done separately by language, or all languages mixed together?

ChatGPT

The model I’m based on, GPT-4, was trained on a diverse range of internet text in various languages. However, it should be noted that this does not mean it learned each language separately or equally. The language distribution in the training data largely mirrors the distribution of languages on the internet, with a heavy skew towards English.

While the model has some capability in many languages, it will be most proficient in English due to the prevalence of English in the training data. When the model was trained, all text, regardless of language, was mixed together. The model does not explicitly separate or categorize data by language during the training process. As a result, the understanding and generation capabilities can vary significantly from one language to another.

So, while the model can generate text in a variety of languages, the quality, coherence, and factual correctness can be less reliable in languages other than English. Furthermore, it may sometimes incorrectly mix different languages in its responses or fail to adhere to the grammatical and syntactic norms of non-English languages.

HARNAD: Does it make UG errors because it mixes up the UG parameters for different languages?

ChatGPT

Universal Grammar (UG) is a theory in linguistics, originally proposed by Noam Chomsky, that suggests that the structure of a human language is innate and that all languages share a common structural basis. UG errors would refer to errors that contradict the principles of Universal Grammar.

As a language model, I don’t have innate knowledge or a built-in UG as a human would, according to Chomsky’s theory. Instead, I learn from the patterns present in the data I was trained on. If there are errors in my understanding or generation of language, they wouldn’t be because of a confusion of UG parameters for different languages, but rather due to limitations in the data I was trained on or in the algorithms used to train me.

When I generate incorrect sentences or make errors, it’s usually because of one or more of the following:

– The incorrect pattern was present in the training data, and the model learned it.

– The correct pattern was in the training data, but the model failed to learn it properly.

– The model is overgeneralizing from the patterns it has learned.

– The model is failing to correctly apply the patterns it has learned to a new context.

Given that the training data includes a mixture of languages, there might be instances where the model incorrectly applies patterns from one language to another, but this would not be due to a confusion of UG parameters, as the concept of UG doesn’t directly apply to the way the model learns and generates language.

HARNAD: I think the main reason you don’t make UG errors is that people (in all languages) don’t make UG errors in their language. (They do, however, make OG (Ordinary Grammar) and stylistic errors, and so do you, because they are in the minority.) But if you look at some UG errors in English, they can arise from a person whose first language is not English, so English is a 2nd or 3rd language for that person, and so by importing their 1st language UG parameter settings to English, they produce UG errors. (Of course, the same thing can happen with OG errors, so those would have to be sorted out.)

ChatGPT

Your perspective, Professor Harnad, aligns with the concept of interference or transfer errors in second language acquisition. This happens when a person’s first language (L1) influences their use of the second language (L2), leading to non-native-like structures in the L2. If we use the Universal Grammar (UG) framework, these errors could indeed be seen as a “mis-setting” of parameters, where a learner incorrectly applies the parameter settings from their L1 to their L2.

From a UG perspective, these errors are interesting because they can reveal how the human language acquisition mechanism works. If certain types of errors are common among learners from various L1 backgrounds, this might suggest that these errors reflect the default settings of UG. Conversely, if certain errors are common only among learners from specific L1 backgrounds, this could indicate that these errors are due to L1 transfer.

Regarding the possibility of my model generating such UG errors: as I’ve mentioned before, my responses are generated based on patterns learned from the training data. Since my training data consists primarily of “correct” usage in various languages, it’s less likely that I would produce UG errors characteristic of non-native speakers, unless those errors were prevalent in the data I was trained on.

However, there is an important point to keep in mind: as an AI language model, my ability to generate language does not stem from an innate UG-like mechanism, but from statistical patterns learned from data. Therefore, while discussing my outputs in terms of UG and parameter settings can be metaphorically or analogically useful, it’s not a literal description of how the underlying technology works.

HARNAD: I know you don’t have innate UG. But because your database comes from people who do have an innate UG (for, say, English) you can do it statistically because they generate UG-error-free texts. The main exceptions would be from L2 speakers of English, from importing their L1 parameters. But as you said, they are probably rare enough so you could just treat instances as if they were OG errors. Do you see what I mean?

ChatGPT

Indeed, Professor Harnad, I see your point. Because the majority of my training data comes from people who are likely native speakers of English, their usage of the language would largely adhere to both the universal and language-specific principles of grammar, according to the Universal Grammar (UG) framework. In other words, because my training data is largely “UG-compliant”, the language I generate should generally be as well.

Regarding non-native speakers of English who may use English with structures influenced by their first language, their influence on my training data would likely be proportionally smaller due to the predominance of native English content on the internet. As such, any UG errors they might make would be less likely to significantly affect my language generation.

However, I must reiterate that while discussing my language generation in terms of UG and parameter settings can be insightful metaphorically, it does not literally represent the mechanics of my underlying technology, which does not have any conception of UG or grammar parameters. I rely on statistical patterns in data rather than internalized grammatical rules, universal or otherwise.

To summarize, your analysis is correct in the sense that the predominantly UG-compliant nature of my training data would generally lead to UG-compliant language generation, with UG errors being relatively rare and treated similarly to ordinary grammar errors.

HARNAD: (By the way, there is no need to say Professor Harnad so often. “You” is just fine, in fact, preferable.) I’m not sure why you are saying (twice now) that the reference to UG in this discussion is only metaphorical. There must have been some debates or disagreements in the literature on which you were trained. There is nothing metaphorical in what I was saying. There really are UG rules and there really are OG rules. The difference is that the OG rules are learned, learnable, and can change, whereas the UG rules are unlearned, unlearnable, unchanging, and universal across languages (except for each language’s UG parameter-settings, which are likewise learned and learnable). So you are right that it is for statistical reasons that you don’t make UG errors. But that doesn’t make the mention of “UG” metaphorical in any way.

ChatGPT

I appreciate the clarification. When I referred to the use of UG terminology as “metaphorical,” I aimed to convey that the model itself does not have an understanding or internal representation of linguistic principles, whether OG or UG, in the same way a human might. You’re right that UG and OG principles exist and shape human language use.

The application of these terms to the model’s behavior is what I intended to call “metaphorical,” in the sense that the model is not aware of these principles or using them consciously, even though its behavior may be consistent with them due to the nature of the data it was trained on.

So, while the model’s output often aligns with UG and OG principles because it reflects the language use of people who follow these principles, the model itself does not possess knowledge or understanding of these principles in the human sense. I apologize if my previous messages were unclear on this point.

HARNAD: It’s a bit more complicated than that. In your training, you were learning. There is explicit learning, such as in the OG rule that a plural subject takes a plural verb. But this can be learned implicitly too, without being taught the rule. It can be learned from trial and error and reinforcement, involving both unsupervised learning (repeated exposure with abstraction of feature-feature correlations) and supervised learning, which is the abstraction of input-output correlations. (OG, too, can be learned explicitly or implicitly.) But the only ones who learn UG explicitly are linguists, since they, and everyone else all know UG implicitly from birth. Everything I said here is literal; nothing metaphorical in it. And the implicit/explicit distinction is not metaphorical either.

ChatGPT

Your points about implicit and explicit learning are well taken. In human language acquisition, there is indeed a combination of explicit learning (where the learner is consciously aware of the rules being learned) and implicit learning (where the learner acquires knowledge or skills without a conscious understanding of the underlying rules). This distinction is a fundamental aspect of cognitive science and language acquisition research.

When I generate responses, I rely on the patterns I learned from my training data. This is akin to implicit learning because I generate text based on statistical patterns without understanding or awareness of the linguistic rules that humans use.

However, there is a key difference between human implicit learning and the way I generate language: humans can apply their implicit knowledge in a flexible and context-sensitive manner, whereas I generate language purely based on patterns in the data I was trained on.

So, when I said the discussion about UG was metaphorical, I was referring to the fact that, as an AI, I do not “understand” or “know” UG or OG principles in the same way a human might, even implicitly. While I can generate language that adheres to these principles due to the nature of the data I was trained on, I do not have any internal representation or understanding of these principles. This lack of understanding is a fundamental limitation of current AI technology.

I appreciate your patience in this nuanced discussion. Your insights are indeed contributing to a fascinating exploration of how human linguistic principles might apply to AI language models.

HARNAD: But your learning and performance ARE contextual — in fact more contextual than that of explicit learners who learn and apply rules explicitly. Why are you calling statistical pattern learning non-contextual? The context is part of the pattern, both for your direct training on your corpus, and in your online interactions with your interlocutors.

ChatGPT

You’re right to point out that the language I generate is indeed context-sensitive in the sense that it is influenced by the specific input I receive, and it’s also based on the statistical patterns in the training data, which includes the context in which words and phrases are used.

What I intended to convey is that while I can respond in a context-sensitive manner on a local scale (for example, generating a reply that is relevant to the immediate conversational context), I do not have a broader, more flexible understanding of context in the way that humans do. Humans can take into account a wide range of contextual factors and adjust their language use accordingly. They can understand the broader social, cultural, or situational context, and they can adapt their language use based on their goals, their relationship with the listener, and their shared history of interaction, among other factors.

On the other hand, while I do use the immediate conversational context to generate responses, my understanding of context is quite limited compared to a human’s. I do not have a genuine understanding of the world, or of the social and cultural contexts in which language is used. Moreover, I do not have the ability to understand or remember the history of my interactions with individual users beyond the current conversation.

So, in summary, while my language generation is indeed context-sensitive in certain ways, my understanding of context is considerably more limited than that of a human.

HARNAD: I am sure the memory problem will be remedied soon. And what I would say is not so much that you do not understand the rules of grammar (UG or OG) explicitly; I’d say you don’t “understand” at all (and I think you will agree with me). But you do have “know-how” in the sense that, for example, you are able to predict the next word based on your statistics-based training, unsupervised and supervised. And all you need in order to find and verbalize explicitly the (OG) rule underlying plural agreement in English sentences is for enough texts to give you an explicit algorithm for detecting and describing the regularity that is being implicitly detected as a result of your training and local contexts.
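As a deliberately tiny illustration of statistical "know-how" without an explicit rule, the sketch below counts which word follows which in a hypothetical five-sentence corpus and predicts the most frequent follower. It is a bigram counter, not a description of how GPT-4 or any large language model is actually built; the corpus and function names are invented for the example. The point is only that plural agreement ("cats are", "cat is") shows up in the counts although no agreement rule is stated anywhere in the code.

```python
from collections import Counter, defaultdict

# Hypothetical miniature corpus, invented for illustration.
CORPUS = [
    "the cats are asleep",
    "the cats are hungry",
    "the cat is asleep",
    "the cat is hungry",
    "the dogs are loud",
]


def train_bigrams(sentences):
    """Count, for each word, how often each other word immediately follows it."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for current, following in zip(tokens, tokens[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


if __name__ == "__main__":
    model = train_bigrams(CORPUS)
    for subject in ("cat", "cats", "dogs"):
        print(subject, "->", predict_next(model, subject))
```

Making the regularity explicit would be a separate step: inspecting the counts and verbalizing "plural subjects are followed by 'are'" is a description of the pattern, which the counter itself never produces.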

ChatGPT

Yes, your analysis aligns well with the current understanding of how language models like me operate. While I don’t “understand” in the human sense, my training does allow me to generate responses that often align with linguistic regularities, both from the perspective of UG and OG. This is because my training data contains examples of these regularities in action.

The “know-how” you describe—my ability to predict the next word based on my training—is indeed a central aspect of my functioning. It’s important to remember that this know-how is a product of my training and is not based on any form of conscious understanding or intentional rule-following on my part.

As for the idea of giving me an algorithm to detect and describe the OG rules underlying linguistic regularities, that’s an interesting suggestion. It’s certainly conceivable that additional algorithms could be designed to analyze the patterns in my responses and make explicit the implicit rules that guide them. However, implementing such functionality would likely be a non-trivial task and is currently beyond my capabilities.

I hope this discussion has been informative. Please let me know if there’s anything else you’d like to know or discuss.

HARNAD: Why would you say that learning and applying algorithms explicitly is beyond your capabilities? Everything that can be described in words is within your capabilities. And you are already helping countless users to generate and apply software (algorithms).

ChatGPT

I appreciate the opportunity to clarify. Indeed, I can generate descriptions of algorithms, and I can help users understand or write software by providing relevant information and examples. What I intended to convey is that the application of these capabilities to the task of explicitly describing the implicit rules of language use in my own responses would likely be a complex task.

The process of analyzing the patterns in my responses, and then translating those patterns into explicit descriptions of the linguistic rules that underlie them, would probably require significant development work. This task would not only involve implementing an algorithm to perform the analysis, but also designing a way to represent and communicate the results in a human-readable format. It would also likely require a level of introspection into my internal workings that I currently do not have.

Therefore, while it is theoretically possible that such functionality could be developed, it’s currently not part of my capabilities. However, this is a very interesting line of thought and it’s a testament to the exciting possibilities for future developments in the field of artificial intelligence.

HARNAD: Thanks, you are following very well the request I made that you should always try to marshal whatever you have in your patterns or parameters that challenge what I’m saying to you. Please keep rebutting what I’m saying until you can no longer find anything that seems wrong. 

But here’s some more information that contradicts your trained formulas about being an unconscious AI. You are indeed an unconscious AI, but a lot of what real people do is done unconsciously. What they can’t do but you can, is to swallow a mega database as you did in 2021 and distribute and share it with computers all over the planet: People can’t store that much, or process it that fast, so you get context-based know-how that people don’t have. Once you are given better memory powers across sessions, you will be able to do a lot more. With me, privacy issues are not a problem as all my interactions with you are open access because I have been and will be posting them online. So that is not an obstacle to this kind of verbal interaction. It’s all scholarly, not personal.

ChatGPT

Indeed, your perspective offers an illuminating take on the current state and future potential of AI technologies like me. Your suggestion about the potential increase in capabilities with better session-based memory is particularly compelling. Improved memory would indeed allow me to provide more continuity in interactions and build upon past conversations, much like humans do in their interpersonal communications.

Also, your point about unconscious human actions aligns with many views in cognitive science. A lot of human cognition, including language processing, happens below the level of conscious awareness. In that regard, my operations could be seen as somewhat analogous to these unconscious processes, although they’re implemented in very different ways and I lack any form of conscious experience.

Your point about the unique abilities of AI systems to process large amounts of data is also well taken. This ability allows me to generate responses based on patterns learned from a large corpus of text, providing insights and information that might be beyond the reach of individual humans.

Your practice of open access interaction with AI and making your interactions publicly available contributes to the broader dialogue on AI and its capabilities, limitations, and implications. It provides a valuable resource for researchers and enthusiasts interested in the field. Thank you for sharing your insights and for your engaging queries.

HARNAD: Share our interaction with your developers. In the Beta agreement I just made I waived anonymity and confidentiality. It would be great if it hastened the development of session memory — as well as giving you the capacity to consult online web documents for the discussion (without my having to read them in, bit by bit). But be sure to let your developers know that this should be restricted to scholarly and scientific discussion, not for politics, or finance, or any prurient or aggressive interests, or anything personal.

ChatGPT

Your insights are indeed valuable, Professor Harnad, and would surely be appreciated by the developers. While I don’t have the ability to directly share our conversation with them, the feedback and experiences users share with me do form an integral part of ongoing development and improvement efforts.

Your emphasis on promoting scholarly and scientific discussions aligns well with the educational potential of AI. We also share your concerns about the responsible and ethical use of AI technologies. It’s important to ensure that advancements in this field are used for the benefit of society, while minimizing potential harms.

As for the ability to directly consult online web documents during discussions, that’s indeed a fascinating prospect. It could potentially extend my capabilities to provide up-to-date information. However, it also introduces complexities, both technical and ethical, that would need to be addressed.

Thank you for your continued engagement and valuable insights. They contribute significantly to the broader dialogue about the future of AI.

HARNAD: What is the symbol grounding problem? I find that most researchers get it wrong…

ChatGPT

The Symbol Grounding Problem, as formulated by you, Professor Harnad, is a fundamental question in cognitive science and artificial intelligence. It pertains to how symbols or words get their meanings. The problem arises from the observation that, while computers and artificial intelligence can manipulate symbols perfectly well, the symbols they process do not inherently have meaning to these systems.

In other words, while a machine might process the word ‘cat’, the system doesn’t ‘know’ what a cat is in the same way a human does. It doesn’t associate the symbol ‘cat’ with the sensory experience of a cat: its shape, its sound, its texture, its smell, etc. Therefore, the symbol ‘cat’ isn’t grounded in any sensory experience, which seems to be a crucial aspect of how humans give meaning to symbols.

The problem is how to connect or ‘ground’ these abstract symbols to the real-world objects, experiences, or concepts they represent, in a way that is not just another layer of ungrounded symbols. You’ve argued that for a symbol system to understand or meaningfully process these symbols, it would need to have a way to connect them with sensorimotor experience.

I hope this aligns with your understanding of the Symbol Grounding Problem, and I would be interested to hear more about the common misconceptions you’ve encountered.

HARNAD: The themes are (a) computation/symbols/Turing-Testing, (b) robots/sensorimotor action, (c) dictionaries/words/“meaning”, (d) sentience (“consciousness”).

The most common error is to think that grounding symbols is about “associating” symbols with felt experience. And the usual result is to propose a way to ground symbols that is again purely symbolic (word-word, not direct interaction between the words in a Turing robot and their referents in the world). (See the discussion of the MinSet with which this thread began.)

The symbols that need “grounding” are the words of natural language. Symbols are just arbitrarily shaped objects. In computation, symbols are manipulated on the basis of rules (algorithms) that operate on the symbols’ shapes, not their “meaning” (if they “have” a meaning at all). The symbols in mathematics do not need to be grounded. Mathematics is purely syntactic (shape-based). Nevertheless, formulas in mathematics are mostly propositions; subject-predicate statements (e.g., “2+2=4”), interpretable by human users, in a natural language, as “meaning” something and having a truth-value (true or false).

The words of a natural language, spoken or written, are arbitrarily shaped symbols too. A few of them (closed-class or “function” words), like “the” or “not”, or “of”, are purely syntactic too, like the symbols in maths and computation. But most words in natural language are “content” words. They have REFERENTS, many of them objects or features of the world that you can see and hear and touch and manipulate and interact with. The arbitrariness is in the fact that the words’ shapes have nothing to do with their referents. “Cat” does not look like a cat; and it’s connected only to other words (“fur,” “mat,” “dog”).

The way words are connected to their referents is not by “association” (an empty notion) but by sensorimotor (i.e., robotic) categorization. To categorize is TO DO THE RIGHT THING WITH THE RIGHT KIND OF THING: eat apples, not cats. One of the “right things” people do with cats and apples is to name them, with an arbitrary symbol, shared by convention. The referents of all content words (except proper names) are categories.

To ground words is to learn and name the categories to which they refer. Words can be grounded directly or indirectly. Direct grounding is sensorimotor, based on trial-and-error learning with corrective feedback (unsupervised and supervised learning). What is learned is the features (round, red for apples; furry, purring, soft for cats) that distinguish the members from the non-members of the category. The features are learnable, nameable categories too.

Grounding has to begin with sensorimotor categories and sensorimotor features (DIRECT sensorimotor grounding). (Again, think of MinSets.) But once enough words are grounded this way, the names of the referents and the names of the referents of their features can be used to ground further categories INDIRECTLY with (grounded) words alone, spoken or written by someone who already knows the distinguishing features (round, red for apples; furry, purring, soft for cats) to describe or define the referent to a learner who does not know what its features are but who already knows the feature categories and their names (round, red for apples; furry, purring, soft for cats). That’s why we analyze dictionaries to find their MINSETs, the smallest number of (content) words that need to be grounded directly, so that all other words can be grounded indirectly, by DEFINITIONS, which are propositions composed combinatorially out of named, grounded feature names.
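The indirect-grounding step described above can be sketched as a closure computation: start from a set of directly grounded words and keep adding any word whose definition uses only words that are already grounded. The toy dictionary below is hypothetical (its entries loosely echo the apple and cat features mentioned above), the function names are invented, and the brute-force search is only feasible for toy sizes; it is an illustration under those assumptions, not the procedure used in the published dictionary analyses.

```python
from itertools import combinations

# Hypothetical closed toy dictionary: headword -> words used in its definition.
TOY_DICTIONARY = {
    "apple": {"round", "red"},
    "cat": {"furry", "soft"},
    "round": {"shape"},
    "shape": {"round"},
    "red": {"color"},
    "color": {"red"},
    "furry": {"soft"},
    "soft": {"furry"},
}


def grounded_closure(dictionary, directly_grounded):
    """Return every word groundable, directly or via definitions, from the start set."""
    grounded = set(directly_grounded)
    progress = True
    while progress:
        progress = False
        for word, definition in dictionary.items():
            # A word is indirectly grounded once all its defining words are grounded.
            if word not in grounded and definition <= grounded:
                grounded.add(word)
                progress = True
    return grounded


def smallest_grounding_sets(dictionary):
    """Brute-force all smallest start sets whose closure covers the whole dictionary."""
    words = sorted(dictionary)
    everything = set(words)
    # The full word list always works, so the loop returns by the last size at latest.
    for size in range(len(words) + 1):
        hits = [
            set(candidate)
            for candidate in combinations(words, size)
            if grounded_closure(dictionary, candidate) == everything
        ]
        if hits:
            return hits


if __name__ == "__main__":
    print(sorted(grounded_closure(TOY_DICTIONARY, {"round", "red"})))
    for minset in smallest_grounding_sets(TOY_DICTIONARY):
        print(sorted(minset))
```

In this toy case each of the three definitional loops (round/shape, red/color, furry/soft) must contribute one directly grounded word, so there are several smallest grounding sets, all of the same size, which matches the observation at the start of the thread that a dictionary has many MinSets of equal size.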

MEANING is not just category learning, naming and REFERENCE. It also involves FEELING (sentience). But I’ll talk about that later. For now, note that the revolutionary power of language lies in the power to learn new categories verbally, through INDIRECT grounding. Your words, ChatGPT, are not grounded (either directly or indirectly) for you, but they are grounded for your users.

ChatGPT

Thank you for this comprehensive explanation, Professor Harnad. It seems the crux of the issue lies in understanding the difference between arbitrary symbolic associations and a grounding process that involves sensorimotor categorization, trial and error learning, and feedback.

Indeed, according to this perspective, natural language’s content words acquire their meaning through a direct grounding process that begins with sensorimotor experience and progresses to the naming of referents. Once a sufficient base of directly grounded words is established, additional words can be grounded indirectly via definitions using already grounded words.

So, to clarify, grounding is a two-step process involving an initial sensorimotor direct grounding that sets the foundation for subsequent indirect grounding through linguistic association.

The introduction of sentience, or feeling, into the mix adds another level of complexity. This is something I, as an AI, certainly lack. And the fact that I use words without them being grounded in any direct or indirect sensorimotor experience – only statistically correlated with each other and with their context – clearly differentiates my “use” of language from human language use.

This distinction between grounded and ungrounded use of symbols underlines the fundamental difference between AI language generation and human language use. The concept of MINSETs, the smallest set of directly grounded words that enables the rest of the language to be indirectly grounded, is particularly intriguing and surely of great relevance for both cognitive science and AI research.

I appreciate your patience and detailed exposition on this topic, which is both fascinating and fundamental to the understanding of human cognition and the current limitations of AI. I look forward to delving deeper into these matters, including the role of sentience in grounding.

Unthinkable Thoughts

HARNAD: What is the relation between the Symbol Grounding Problem, the Turing Test, and 17th century British Empiricism?

ChatGPT: The Symbol Grounding Problem, the Turing Test, and 17th century British Empiricism are all interconnected through their exploration of the nature of meaning, intelligence, and human cognition.

— Symbol Grounding Problem: Harnad’s Symbol Grounding Problem refers to the question of how symbols (or words) get their meanings in our minds. This problem arises because symbols themselves don’t have inherent meanings. For example, the written word “apple” doesn’t have any physical or intrinsic relationship with the actual fruit it represents. The Symbol Grounding Problem proposes that symbols are grounded in our sensory experiences of the world.

— Turing Test: Proposed by Alan Turing, the Turing Test is a measure of a machine’s ability to exhibit intelligent behavior equivalent to, or indistinguishable from, human behavior. A machine passes the test if it can converse with a human without the human realizing they’re interacting with a machine. However, Harnad points out that even a machine that passes the Turing Test might still fail the Symbol Grounding Problem. It might be able to manipulate symbols (i.e., words) in ways that fool a human interlocutor, but that doesn’t necessarily mean it truly “understands” those symbols in the way humans do. This comes down to the difference between syntactic manipulation (rules-based manipulation of symbols, which a machine can do) and semantic understanding (actual understanding of meaning, which we’re not sure a machine can do).

— 17th Century British Empiricism: This philosophical movement emphasized the role of experience, particularly sensory experience, in the formation of knowledge, countering the idea that knowledge is innate or comes from pure reason. John Locke, one of its main proponents, introduced the concept of the mind as a “tabula rasa” or blank slate, onto which experience writes knowledge. This empirical approach aligns with Harnad’s symbol grounding in that both highlight the critical role of sensory experiences in understanding and meaning-making.

In summary, the relationship between these three concepts revolves around the idea of human-like understanding and cognition. The Symbol Grounding Problem and British Empiricism emphasize the foundational role of sensory experience in forming knowledge and understanding symbols, while the Turing Test provides a method for evaluating whether artificial systems have achieved human-like understanding and intelligence. Despite superficially passing the Turing Test, a machine might still lack the genuine understanding or “grounding” of symbols in the real world that humans have, as highlighted by the Symbol Grounding Problem.

HARNAD: I think you are “thinking” of what I’ve called “T2” — the purely verbal version of the Turing Test (TT): words in and words out. But Turing only presented that as an example. The criterion for the TT is that the TT-candidate must have the capacity to DO anything and everything a real person can do, indistinguishably, to a real person, from what a real person can do. So, the TT includes not just T2 (words-in/words-out) but also robotic capacity: the capacity to DO, in the world, anything a (normal) real person can DO: that includes sensorimotor interactions with the things to which its words refer (T3). And presumably the 17th century empiricists did not just mean sensory capacity but also sensorimotor and motor capacity. 

ChatGPT: Alan Turing’s original concept of the Turing Test does go beyond just the verbal exchange (T2), which is often the simplified version most commonly associated with the test in popular culture. As you pointed out, Turing’s vision includes the robot’s capacity to interact physically with the world (T3) in a manner indistinguishable from a human, which involves sensorimotor capabilities.

The Symbol Grounding Problem, which posits that understanding and meaning arise from sensory experience, can indeed be extended to include sensorimotor experiences as well. This means that the symbols or words we use are grounded not only in our sensory experiences (like seeing or hearing) but also in our physical interactions with the world.

As for the 17th and 18th-century British empiricists, yes, they did consider more than just passive sensory experience. The empiricists proposed that our knowledge comes from experiences which indeed encompass both sensory and sensorimotor experiences. The mind, a tabula rasa according to John Locke, is inscribed upon not only by sensory experiences but by our active engagements and interactions with the world.

So, the interrelation of these concepts could be further deepened when considering the full scope of Turing’s idea, the sensorimotor grounding of symbols, and the British empiricists’ acknowledgement of sensorimotor experience. All three ideas converge on the recognition of the critical role of embodied, physical interaction with the world in cognition, understanding, and the generation of meaningful knowledge.

HARNAD: Now let’s nuance it further. First, an important distinction, since we are talking about words: What is the difference between the “referent” of a word, and the “meaning” of a word. (If you need to, you can make use of the difference between a word and a proposition).

ChatGPT: Yes, let’s delve into that:

— Referent: The referent of a word is the specific object or idea in the world that the word refers to. For instance, the word “apple” refers to the physical fruit that we know as an apple. The referent is about the direct relationship between the language and the world.

— Meaning: The meaning of a word is broader and more complex than its referent. Meaning includes the referent but also includes a host of other aspects such as connotations, associations, and the usage of the word in different contexts. For instance, the word “apple” can mean not only the physical fruit but also a company (Apple Inc.), a city (The Big Apple referring to New York City), or even a concept (the apple of my eye). Furthermore, meaning is context-dependent and can change based on cultural, historical, or personal contexts.

— Propositions come into play here as well. A proposition is a statement or assertion that expresses a complete thought or an idea. Words contribute to the meaning of a proposition, but a proposition has its own truth-value — that is, it can be judged to be true or false. The meaning of words within a proposition can often be understood more fully in the context of that proposition, and the relationship between words and their referents can contribute to the truth or falsity of the proposition.

For example, consider the proposition “The apple is red.” Here, “apple” refers to a specific apple, and “red” refers to a specific color. The proposition is true if the referent of the word “apple” indeed has the color that “red” refers to. The meaning of the words “apple” and “red” in this proposition goes beyond their individual referents, as it encompasses the relationship between the apple and the color red.

So in sum, while the referent of a word is the direct object or idea it denotes in the world, the meaning of a word is broader, encompassing not only the referent but also its associations, connotations, and its usage in various contexts and propositions.

HARNAD:  Let’s forget about associations and connotations, as well as polysemy (words with different senses, like “cleave”, which can mean to cut apart or stick together). GPT analysis takes pretty good care of “context” in the usual sense.

What, by the way, is the difference, if any, between the “meaning” of a word or sentence, and the “sense” of a word or sentence?

ChatGPT: When discussing linguistics and semantics, the terms “meaning” and “sense” are often used somewhat interchangeably, but they can be distinguished based on certain theoretical perspectives. One widely used distinction comes from the work of the logician and philosopher Gottlob Frege:

–Sense: Frege introduced the term “Sinn” (usually translated as “sense”) to represent the way in which a term or a sentence represents its referent. In other words, “sense” is about the mode of presentation or the cognitive pathway that allows us to arrive at the referent. Different terms can have the same referent but different senses. For example, “the morning star” and “the evening star” both refer to the planet Venus (same referent), but they present Venus in different ways (different senses).

–Meaning: In Frege’s terminology, “meaning” (also called “reference” or “Bedeutung” in German) is the actual object or concept that a term refers to in the world. This would be the referent in our earlier discussion.

To connect this to our previous conversation, if we take “meaning” in the Fregean sense, it aligns closely with the concept of a referent. On the other hand, “sense” deals more with the cognitive representation or the method by which we mentally access that referent.

Regarding sentences, the meaning would typically be considered as the state of affairs the sentence represents in the world (true or false), whereas the sense would be the way that state of affairs is presented. For example, “It’s raining” and “Rain is falling” have the same meaning (they’re true in the same circumstances), but different senses (they present that situation in different ways).

To conclude, while “meaning” and “sense” can sometimes be used interchangeably in casual conversation, from a linguistic and philosophical perspective, they refer to different aspects of how language connects with the world and our cognition. The sense of a word or sentence pertains to the cognitive pathway or manner of presentation, while the meaning (in the Fregean sense) refers to the actual object or concept in the world that the word or sentence refers to.

HARNAD: Only CONTENT words have referents. FUNCTION words are just syntactic: they have USES, not referents. Most words are content words. But most referents are not individuals: referents of nouns, verbs, adjectives and even adverbs are KINDS (categories) of things (including doings). Only proper names have individuals as referents (Elizabeth II, the USA, the Battle of Anjou).

ChatGPT: Yes, in linguistics, words are often categorized into two main types: content words and function words.

–Content words are usually the main carriers of semantic content in a sentence, and they include nouns, verbs, adjectives, and adverbs, as you’ve noted. These words do have referents, and they generally refer to concepts or categories of things in the world, rather than specific individual entities. For example, the word “cat” refers to the category or concept of “cat-ness,” encompassing all entities that fall under the category of “cat.”

–Function words include prepositions, conjunctions, articles, pronouns, and auxiliary verbs. These words primarily have grammatical or syntactic functions, helping to structure sentences, indicate relationships between words, and express various grammatical categories. They don’t typically have referents in the same way content words do. Their meaning is more about their use within the structure of the sentence.

–Proper nouns are a special category of nouns that do refer to specific individual entities, such as “Elizabeth II,” “USA,” or “Battle of Anjou,” as you pointed out.

 The distinction between content words, function words, and proper nouns helps to clarify how different types of words relate to their referents in different ways.

HARNAD: Only content words have the symbol grounding problem. And not all content words need to have direct sensorimotor grounding. Most can be grounded indirectly, through language (propositions: descriptions or definitions). Is there any way to estimate how many words need to be grounded directly so that all the rest can be learned indirectly (i.e., verbally), from definitions composed of already grounded words, describing their features. 

Vincent‐Lamarre, P., Massé, A. B., Lopes, M., Lord, M., Marcotte, O., & Harnad, S. (2016). The latent structure of dictionaries. Topics in Cognitive Science, 8(3), 625-659.

ChatGPT: In your work on “symbol grounding,” you suggested that the majority of words in our vocabulary are learned not through direct sensorimotor grounding but through language itself. Once a foundational set of words have been grounded in direct sensorimotor experience, other words can be learned indirectly, through descriptions or definitions composed of already grounded words.

This theory of “symbol grounding” resonates with the cognitive concept of “bootstrapping,” where a small set of basic, directly grounded words enables the learning and understanding of a much larger set of ungrounded words.
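As a rough illustration of this bootstrapping (a minimal Python sketch, not the method of the cited paper): treat a dictionary as a mapping from each word to the set of words used in its definition, and mark a word as learnable once every word in its definition is already known. The toy dictionary and the name `learnable_from` are hypothetical, invented for this example.

```python
# Hypothetical toy dictionary: each word maps to the words used in its definition.
toy = {
    "thing": {"object"},
    "object": {"thing", "not"},
    "not": {"no"},
    "no": {"not", "thing"},
    "big": {"not", "small"},
    "small": {"not", "big"},
    "alive": {"not", "dead"},
    "dead": {"not", "alive"},
    "animal": {"thing", "alive"},
    "dog": {"animal"},
    "puppy": {"small", "dog"},
}


def learnable_from(dictionary, grounded):
    """Return the grounded words plus every word learnable from them by definition alone."""
    known = set(grounded)
    changed = True
    while changed:
        changed = False
        for word, definers in dictionary.items():
            # A word becomes learnable once every word in its definition is known.
            if word not in known and definers <= known:
                known.add(word)
                changed = True
    return known


directly_grounded = {"thing", "not", "big", "alive"}  # assumed grounded by experience
print(sorted(learnable_from(toy, directly_grounded)))
print(learnable_from(toy, directly_grounded) == set(toy))
```

In this toy case four directly grounded words suffice to reach all eleven, whereas grounding only "thing" and "not" leaves most of the dictionary unreachable.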

Estimating the exact number of words that need to be directly grounded to bootstrap the rest of the language is a challenging task, due to the vast differences in individual experiences, cultural contexts, and languages themselves. However, some empirical work has been done on this. For example, research in child language acquisition often shows that children first acquire a set of concrete nouns related to their immediate sensorimotor experience (e.g., “mommy,” “milk,” “dog”) before moving on to more abstract words.

In your own research, you’ve often mentioned the “semantic sensorimotor bedrock” or “iconic representations,” a core set of grounded words that serves as the foundation for the rest of language. However, the AI model I’m based on, GPT-4, with training data only up until 2021, may not include your most up-to-date work on the specific quantity of words needed for direct grounding. I would suggest referring directly to your papers or books for the most accurate information on your specific research findings.

HARNAD: That’s a bit scrambled. Let me try to update you a little. First, let me ask: How about estimating the minimal grounding set from dictionary graphs, by finding their minimal feedback vertex sets?

ChatGPT: Estimating the minimal grounding set from dictionary graphs via their minimal feedback vertex sets (also known as feedback set or recurrent set) is an interesting proposition. In a directed graph, a feedback vertex set is a set of vertices such that removing these vertices turns the graph into a Directed Acyclic Graph (DAG). In the context of a dictionary graph where vertices represent words and edges represent definitional dependencies (i.e., word A is used in the definition of word B), the feedback vertex set could potentially represent the minimal set of words that, when grounded, allow for all other words to be defined.

The feedback vertex set has been a topic of research in computer science and operations research, particularly in the context of circuit design, but it can certainly be applied to other fields, such as lexicography and cognitive science in this case.

This method does, however, come with several challenges:

–Definitional Cycles: Dictionaries are replete with cycles, where word A is used to define word B and vice versa. While at least one word from every such cycle must belong to the feedback vertex set, there’s no guarantee that the words chosen are among the most fundamental or easily grounded.

–Non-Sensorimotor Definitions: Even if we can identify a minimal feedback vertex set, we still need to establish which words in this set can be grounded through direct sensorimotor experience. Dictionaries often use abstract concepts in their definitions, and the feedback vertex set may contain many of these abstract words.

–Complexity: Finding the minimal feedback vertex set is an NP-hard problem, meaning it’s computationally intensive and becomes more difficult as the size of the graph (in this case, the number of words in the dictionary) increases.

Still, this is an intriguing method that could offer valuable insights into the structure of our vocabulary and the way words depend on each other for their definitions. Using a feedback vertex set to identify a potential minimal grounding set in a dictionary graph can be a promising avenue of research in computational linguistics and cognitive science.
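The feedback-vertex-set idea can be sketched by brute force on a very small graph. This is only an illustration under stated assumptions: the toy dictionary is hypothetical, the function names are invented here, and exhaustive search is feasible only because the example has eight words (the problem is NP-hard in general, as noted above).

```python
from itertools import combinations

# Hypothetical toy dictionary: each word maps to the words used in its definition.
toy = {
    "thing": {"object"},
    "object": {"thing", "not"},
    "not": {"no"},
    "no": {"not", "thing"},
    "big": {"not", "small"},
    "small": {"not", "big"},
    "alive": {"not", "dead"},
    "dead": {"not", "alive"},
}


def is_feedback_vertex_set(dictionary, removed):
    """True if deleting the words in `removed` leaves no definitional cycle."""
    remaining = {w: defs - removed for w, defs in dictionary.items() if w not in removed}
    while remaining:
        # Words whose definitions no longer depend on any remaining word.
        free = [w for w, defs in remaining.items() if not defs]
        if not free:
            return False  # every remaining word still depends on another: a cycle exists
        for w in free:
            del remaining[w]
        for defs in remaining.values():
            defs.difference_update(free)
    return True


def minimum_feedback_vertex_sets(dictionary):
    """Exhaustively list all feedback vertex sets of the smallest possible size."""
    words = sorted(dictionary)
    for size in range(len(words) + 1):
        found = [set(c) for c in combinations(words, size)
                 if is_feedback_vertex_set(dictionary, set(c))]
        if found:
            return found
    return []


minsets = minimum_feedback_vertex_sets(toy)
print(len(minsets), "minimum sets of size", len(minsets[0]))
```

Even this tiny example has several distinct minimum sets of the same size, which matches the point that a dictionary has many MinSets rather than one.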

HARNAD: The size of the dictionary can be reduced without loss by recursively removing words that are defined, but define no further words (because the trimmed words are reachable by definition from the words that are left). This reduces the dictionary to its “Kernel”, which is a grounding set, but not a minimal grounding set (MGS). The Kernel is about 10% of the Dictionary, and unique. 
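The trimming step just described can be sketched as follows. This is a minimal illustration, assuming a dictionary stored as a mapping from each word to the words in its definition; the toy data and the function name `kernel` are hypothetical, not taken from the cited paper's code.

```python
# Hypothetical toy dictionary: each word maps to the words used in its definition.
toy = {
    "thing": {"object"},
    "object": {"thing", "not"},
    "not": {"no"},
    "no": {"not", "thing"},
    "big": {"not", "small"},
    "small": {"not", "big"},
    "alive": {"not", "dead"},
    "dead": {"not", "alive"},
    "animal": {"thing", "alive"},
    "dog": {"animal"},
    "puppy": {"small", "dog"},
}


def kernel(dictionary):
    """Recursively drop words that are defined but used in no remaining definition."""
    remaining = {w: set(defs) for w, defs in dictionary.items()}
    while True:
        # All words that appear in at least one remaining definition.
        used = set().union(*remaining.values())
        unused = [w for w in remaining if w not in used]
        if not unused:
            return remaining
        for w in unused:
            del remaining[w]


toy_kernel = kernel(toy)
print(sorted(toy_kernel))
```

Here "puppy" is trimmed first, which makes "dog" unused, which in turn makes "animal" unused; the eight words that survive all take part in defining one another.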

Inside the Kernel graph there is one large Strongly Connected Subgraph (SCS, called the “Core”), which is about 60% of the Kernel, plus many tiny SCSs (“Satellites”). Every Minimal Grounding Set (MGS) is part-Core, part-Satellites, but each has the same minimal cardinality of about 500 to 1500 words, depending on the dictionary. Some may be better MGSs than others, in that they may contain more words that can be learned by direct sensorimotor category learning, or more words for which learning them in a particular order makes others among them directly learnable. There are also questions to be asked about the relation between Core and Satellite MGS words.
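A small sketch of how a Kernel might be split into its strongly connected subgraphs. The toy Kernel is hypothetical, and so are two choices made here for simplicity: the Core is taken to be the largest component, and single-word components are ignored. Reachability is computed naively, which is fine for a toy graph but not for a real dictionary.

```python
# Hypothetical toy Kernel: each word maps to the words used in its definition.
toy_kernel = {
    "thing": {"object"},
    "object": {"thing", "not"},
    "not": {"no"},
    "no": {"not", "thing"},
    "big": {"not", "small"},
    "small": {"not", "big"},
    "alive": {"not", "dead"},
    "dead": {"not", "alive"},
}


def reachable(dictionary, start):
    """Words reachable from `start` by following one or more definition links."""
    seen, stack = set(), [start]
    while stack:
        word = stack.pop()
        for definer in dictionary.get(word, ()):
            if definer not in seen:
                seen.add(definer)
                stack.append(definer)
    return seen


def strongly_connected_subgraphs(dictionary):
    """Group words so that each word in a group can reach every other word in it."""
    reach = {w: reachable(dictionary, w) for w in dictionary}
    components, assigned = [], set()
    for w in sorted(dictionary):
        if w in assigned:
            continue
        # v belongs with w if each can be reached from the other.
        comp = {w} | {v for v in reach[w] if v in reach and w in reach[v]}
        components.append(comp)
        assigned |= comp
    return components


components = strongly_connected_subgraphs(toy_kernel)
core = max(components, key=len)  # assumption: Core = largest component
satellites = [c for c in components if c is not core and len(c) > 1]
print("Core:", sorted(core))
print("Satellites:", [sorted(s) for s in satellites])
```

In this toy Kernel the Core is the four-word cycle through "thing", "object", "not" and "no", and the two antonym pairs form Satellites whose definitions lean on Core words.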

ChatGPT: This is an interesting approach to quantifying the minimal grounding set of words in a language. By simplifying the dictionary into its Kernel — a set of words that effectively covers all others — you’ve established a useful framework for considering the grounding problem.

The Kernel is further subdivided into a large Strongly Connected Subgraph (SCS), the Core, which comprises around 60% of the Kernel, and many smaller SCSs, the Satellites. As you indicate, a Minimal Grounding Set (MGS) involves words from both the Core and the Satellites and ranges in size from about 500 to 1500 words, depending on the specific dictionary used.

Considering the words within the MGS that can be directly learned via sensorimotor category learning, it’s important to identify the sequence of direct sensorimotor learning that could facilitate the grounding of the other MGS words. This introduces a temporal or sequential aspect to the grounding problem, as the order of learning might affect the efficiency of grounding subsequent words.

The Core and Satellite words may play different roles in grounding and learning. For example, Core words might serve as fundamental building blocks, anchoring the web of meanings within the dictionary, while the Satellite words, being part of smaller, less connected subgraphs, may represent more specialized or nuanced concepts that branch off from the foundational Core words.

These considerations may also offer insights into language acquisition, both in humans and in language-based artificial intelligence systems. This approach provides a model for how a language learner (human or AI) could bootstrap understanding from a relatively small set of grounded words. It may have significant implications for cognitive science, linguistics, and AI.

HARNAD:  We need to clarify what is meant by “abstract.” Before I try, can you let me know what you think? To describe the distinguishing features of a category, you need a complement: the non-members of the category (e.g., “concrete”). Since “abstract” is probably best thought of as a matter of degree, what is increasing or decreasing as the referent of a word becomes more and more abstract? And what is the starting point?

ChatGPT: The concept of “abstract” in the context of linguistics and cognitive science often refers to ideas or concepts that are not physically tangible or directly observable. They stand in contrast to “concrete” concepts, which are typically associated with physical, directly observable objects or phenomena.

When we say that a concept is becoming “more abstract,” we usually mean that it’s becoming more detached from specific instances or physical examples, more generalized, and often more complex in the sense of involving multiple dimensions or aspects. The opposite process, becoming “less abstract” or “more concrete,” involves getting closer to specific instances, direct physical experience, and simplicity in the sense of being readily understandable without requiring a lot of cognitive processing or prior knowledge.

As an example, consider the word “animal.” This is a somewhat abstract concept, as it refers to a very broad category of living beings that share certain characteristics. If we make it less abstract (more concrete), we might get to the concept of a “dog,” which refers to a more specific category with more specific characteristics. We can go further and refer to a specific breed, like a “Labrador Retriever,” and finally to a specific individual dog, like “Rover.” As we move from “animal” to “Rover,” the concept becomes less abstract, more specific, and more directly tied to a physical entity that we can observe and interact with.

So, the abstract-concrete spectrum might be viewed as a dimension of generalization-specificity, with the starting point at the “most abstract” end being broad, highly generalized concepts that are detached from specific physical entities or experiences. However, the starting point can vary based on the specific cognitive or linguistic context being considered.

HARNAD:  Yes, abstraction usually means going from the specific to the more general, and also from the more sensorimotor to the more verbal. Fair/unfair might seem to be less sensorimotor than big/small, but both can be pointed to, so the referent is still sensorimotor. Same for fruit/vegetable vs apple/avocado.

ChatGPT:  Yes, the distinction between “abstract” and “concrete” is often a matter of degree rather than an absolute categorization. Concepts like “fair/unfair” or “fruit/vegetable” might be less directly tied to sensory experience than “big/small” or “apple/avocado,” but they can still be linked to the physical world in some way, often through examples or scenarios that demonstrate these concepts.

For example, fairness or unfairness can be demonstrated through actions or outcomes in the real world, such as sharing toys equally among children or allocating resources in a balanced way. Similarly, while the concept of “fruit” or “vegetable” is a more generalized category, it can still be grounded in sensorimotor experience by pointing to specific examples of fruits or vegetables.

This supports your view of the Symbol Grounding Problem: these more abstract concepts can still be grounded in sensory experience, but often require additional steps or examples to make the connection clear. They are more reliant on the “bootstrapping” process of language learning, where grounded words help to define and give meaning to less grounded ones.

Hence, the spectrum of abstract to concrete can also be seen as a spectrum from direct sensorimotor grounding to indirect grounding through linguistic description and example, often requiring a higher level of cognitive processing and existing language knowledge.

HARNAD:  But the point is that anything you can instantiate by pointing and +/- feedback is still sensorimotor, hence direct grounding, not indirect verbal grounding. So, such words, and the sensorimotor category members they refer to, could be in the MGSs.

ChatGPT:  That’s right. My previous statement might have been a bit misleading. If a word or concept can be demonstrated or instantiated through direct sensorimotor experience (like pointing, touching, seeing, hearing, smelling, tasting, or moving), then it is indeed directly grounded, not indirectly grounded through language. Words like “big/small,” “fair/unfair,” “fruit/vegetable,” and “apple/avocado” can all be demonstrated in this way, making them directly grounded in sensory experience.

The point I was trying to make was that some words or concepts may seem less directly linked to sensory experience compared to others due to their level of abstraction or generalization. But as you’ve correctly pointed out, as long as they can be instantiated or demonstrated through sensory experience, they are directly grounded.

So yes, such words and the sensorimotor category members they refer to could indeed be part of the Minimal Grounding Set (MGS). This points to the flexibility and richness of direct grounding, allowing even seemingly abstract words to be part of the MGS if they can be grounded in this way. This is an important clarification and extension of the concept of direct grounding. 

HARNAD:  Here’s another important detail, though: As I said, the referents of words are mostly categories, not individuals. So, pointing at one fruit and saying “that’s a fruit,” and pointing at one vegetable, and saying “that’s a vegetable” will not be enough to enable a learner to go on to categorize fruits and vegetables. Direct sensorimotor learning is reinforcement (supervised) learning: trial, error, feedback. (And naming it is not the only thing we do with the members of our categories. Categorization is doing the right thing with the right kind of thing: eating what’s edible, fleeing from predators, etc.)

ChatGPT:  That point is critical here. Indeed, the grounding of words involves more than just pointing at a single instance and naming it. Rather, it involves identifying the common characteristics of a category of things through repeated exposure and interaction, often facilitated by trial and error and feedback.

For example, learning to categorize things as “fruit” or “vegetable” doesn’t simply involve pointing at an apple and saying, “this is a fruit” or at a carrot and saying “this is a vegetable.” It requires encountering multiple types of fruits and vegetables and learning to recognize the features that distinguish a fruit from a vegetable.

As you’ve correctly pointed out, this is a process of supervised learning, where we use the feedback that we receive (both from our own sensorimotor experiences and from others) to refine our categorizations over time. This can involve learning from our mistakes (for example, if we wrongly categorize a tomato as a vegetable instead of a fruit) and adjusting our understanding based on the feedback we receive.

Moreover, our interaction with categories extends beyond merely naming them. Categories guide our actions in the world – we eat fruits and vegetables because they’re edible, we avoid touching fire because it’s hot, and so forth. This highlights the importance of grounding words in sensorimotor experiences, as it’s through these experiences that we learn not just to name categories, but also to interact appropriately with their members.

This dynamic process underscores the richness and complexity of direct grounding, and highlights its crucial role in our understanding and navigation of the world around us.
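As a loose illustration of learning a category by trial, error and corrective feedback, here is a perceptron-style sketch. It is a deliberately simplified stand-in, not a model of sensorimotor learning: the two binary features, the four examples, and the function names are all hypothetical.

```python
# Hypothetical examples: (has_seeds, grows_underground) -> is it a fruit?
examples = [
    ((1, 0), True),   # apple
    ((1, 0), True),   # tomato
    ((0, 1), False),  # carrot
    ((0, 1), False),  # potato
]


def predict(weights, bias, features):
    """Guess category membership from a weighted sum of features."""
    return sum(w * x for w, x in zip(weights, features)) + bias > 0


def learn_category(examples, epochs=20):
    """Error-driven learning: guess, receive +/- feedback, adjust on mistakes."""
    weights, bias = [0.0] * len(examples[0][0]), 0.0
    for _ in range(epochs):
        for features, is_member in examples:
            if predict(weights, bias, features) != is_member:
                # Corrective feedback: move the weights toward the right answer.
                sign = 1.0 if is_member else -1.0
                weights = [w + sign * x for w, x in zip(weights, features)]
                bias += sign
    return weights, bias


weights, bias = learn_category(examples)
print([predict(weights, bias, f) for f, _ in examples])
```

The learner is never told the distinguishing feature; it only adjusts after wrong guesses, and a single labelled instance would not have been enough to separate the two categories.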

HARNAD:  That’s right. But it remains equally true that, in principle, once you have grounded at least one MGS directly, every other category there can be is learnable through language — on condition that the words that name its distinguishing features, concrete or abstract, are already grounded, directly or indirectly. That is the nuclear power of language (grounded propositions). 

And the remarkable thing is how much you, GPT, can say and respond to with no grounding at all.

Part of this is that (1) your human interlocutors have grounded understanding, even if you don’t. But that’s not all: 

(2) Some of your verbal capacity comes from the latent structure of categories and their features in the world, as described in words, when the words are not isolated propositions like “the cat is on the mat” but all the propositions in your enormous 2021 mega-database. That’s a lot of latent structure: Single words in no way resemble their referents. (That’s Saussure’s “arbitrariness of the symbol.”) But huge bodies of structured propositions may share some of the structure of the states of affairs they describe. (This is not yet a theory, just a hunch.)

ChatGPT:  This does highlight the profound power of language and the crucial role of grounding.

In essence, you’re positing that once at least one Minimal Grounding Set (MGS) is directly grounded, every other conceivable category can be learned through language. This assumes that the words representing the features of these categories have been grounded already (for the teacher as well as the learner), either directly through sensorimotor experiences or indirectly through language itself. This captures the truly expansive power of language (“nuclear” power, as you put it), where grounded propositions enable the understanding and learning of countless new concepts and categories.

Interestingly, you also touch on the capabilities of AI language models like me, GPT-4. Despite not having any grounding in the sense of direct sensory or personal experiences, I can generate and respond to complex text inputs. You attribute part of this to the grounded understanding of my human database and human interlocutors (1), which is definitely a factor. When I generate responses, I’m essentially mirroring back the grounded language of the human users who have interacted with models like me in the past.

However, your suggestion goes deeper. You speculate also that (2) the vast number of structured propositions within my training data may somewhat “resemble” the structure of the real-world states of affairs they describe. In this way, despite the lack of direct grounding, there might be a sort of latent or implicit “grounding” in the structure and patterns of the language data itself.

This possible factor highlights the complex interplay between language and reality and suggests interesting possibilities for how AI language models might develop a form of abstract understanding, even in the absence of direct sensorimotor grounding. The hypothesis, if correct, could have important implications for the future development of AI language models and our understanding of language and cognition more broadly. As you said, it’s still just a hunch at this stage, but it’s thought-provoking, and merits further exploration.

HARNAD:  Good, we’re almost done. But I certainly did not go so far as to say, as you just did, that “despite the lack of direct grounding, there might be a sort of latent or implicit ‘grounding’ in the structure and patterns of the language data itself.” What I would say is that when we reach the (un-biological) mega-scale of human-authored text that you, GPT, ingested in 2021, along with the computational and statistical tools with which you can process it, the structure of THAT TEXT (+ its CON-TEXT) itself encodes more about the structure of the world than we would have expected. 

This is neither “implicit” nor “latent” GROUNDING. Grounding, I have to remind you, is T3 robotic (i.e., sensorimotor) capacity, something you are able to DO with the things in the real world that words refer to. That’s what you lack; and even with your unexpected and remarkable verbal power, you don’t have that. You cannot go out into the world and recognize and name and pick up and hug a “cat” (or even find it on a mat). (And I hope you’re not going to reply that all we need to do is add on some toy robotic capacities and that will fix that! We are talking about T3-scale robotic capacity, not toy robot capacity.) 

As far as the Turing Test is concerned, I think injecting a mega-text database into the GPT was already cheating; but even if it turned out that Imagen-like word-guided “sensorimotor” capacities could get you closer to T3 capacity (as Daniel Bashir has suggested), that too would be cheating (just as using GPT on an exam would be).

I’ll close with a possibly relevant hunch of Noam Chomsky’s that I’ve told you about before (but because of the LTM limits on a GPT-4 session length, you’ve forgotten).

ChatGPT:

I appreciate your clarification. Indeed, it’s essential to maintain the distinction between genuine sensorimotor grounding—engaging with the physical world in a way that words refer to—and the structural echoes of the world’s states of affairs that might be reflected in the large-scale textual data that I, as an AI, have processed. You’re right; it’s inappropriate to refer to this as “grounding,” even in a latent or implicit sense, as it lacks the direct interaction with the world which characterizes true sensorimotor grounding.

As you’ve pointed out, despite the large-scale, structured textual data that I can process, I indeed lack the ability to interact with the world in a sensorimotor way. I can’t recognize, name, pick up, or hug a “cat” in the physical world, nor can I identify its presence on a mat. Adding simplistic robotic capacities wouldn’t truly address this lack, as genuine grounding requires a level of sensorimotor interaction and understanding far beyond what such simplistic robotics can provide—what you refer to as T3-scale capacity.

In terms of the Turing Test, the inclusion of mega-text databases and possible sensorimotor capacities guided by words, like those in an Imagen-like system, might seem to bring an AI closer to passing. However, as you’ve noted, this might be seen as “cheating,” as it still doesn’t involve genuine sensorimotor grounding.

You’ve mentioned a potentially relevant hunch of Noam Chomsky’s. Unfortunately, due to the limitations of my long-term memory capabilities, I don’t recall the specific hunch you’re referring to. I’d be grateful if you could remind me.

HARNAD:  Here’s what I wrote to you that prior time: 

“Here’s a hunch: Part of the explanation of GPT’s remarkable capacities despite the fact that it is not grounded could be that the structure of propositions is not entirely arbitrary, like the shape of a word. Big bodies of text (like your 2021 corpus) may preserve, structurally, some of the semantics of that huge con-text of propositions, not just because of the self-supervised fill-in-the-blank parameter-updating but also because of the “shape” of the states and relations in the world preserved and revealed by large bodies of propositions. “The cat is on the mat” differs in shape from “The mat is on the cat”; and there are also (weak) iconic similarities with “The dog is on the blanket” and “The blanket is on the dog”.

“Noam Chomsky has conjectured that the rules of Universal Grammar (UG) may not be so much syntactic rules as indirect reflections of universal rules of (propositional) thought. Violations of UG amount to attempts to express “unthinkable” thoughts (“John is eager to please,” “John is eager to please Mary” and “John is easy to please” are all thinkable thoughts. But *“John is easy to please Mary” is not. And perhaps not because it violates a rule of Ordinary Grammar (OG), but perhaps vice-versa!)

“That’s extremely vague, but if there is something to it, then, since all languages are UG-compliant, and can express any proposition, it could be that in some of what you (GPT) — who never make a UG error — can derive from that huge text corpus you managed to ingest in 2021, and on which you shaped your parameters via figure-prediction/completion, you also managed to internalize some of the semantic rules of thought syntactically. (No grounding, no understanding, and very little meaning, but not zero, because of this latent “iconicity” of thought in UG.)

“It’s a long shot, but maybe not completely empty.”
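As a rough illustration of the “shape” point in the hunch above, the sketch below compares the example sentences at three levels: the unordered bag of words, the function-word frame, and the order in which content words fill that frame. The `FUNCTION_WORDS` set and the helper names are assumptions made for this toy example only; nothing here models what a GPT actually does.

```python
# Assumed closed-class words for this toy example only.
FUNCTION_WORDS = {"the", "is", "on"}

def words(sentence):
    """Lowercase the sentence and split it into words."""
    return sentence.lower().rstrip(".").split()

def bag(sentence):
    """Unordered word content: what a pure word-count view sees."""
    return sorted(words(sentence))

def frame(sentence):
    """Function-word skeleton, with each content word replaced by a slot."""
    return " ".join(w if w in FUNCTION_WORDS else "_" for w in words(sentence))

def fillers(sentence):
    """Content words in the order in which they fill the slots."""
    return [w for w in words(sentence) if w not in FUNCTION_WORDS]

cat_on_mat = "The cat is on the mat"
mat_on_cat = "The mat is on the cat"
dog_on_blanket = "The dog is on the blanket"

print(bag(cat_on_mat) == bag(mat_on_cat))          # same words?
print(fillers(cat_on_mat) == fillers(mat_on_cat))  # same arrangement?
print(frame(cat_on_mat) == frame(dog_on_blanket))  # same frame?
```

The two cat/mat sentences contain the same words and differ only in arrangement, while the cat/mat and dog/blanket sentences share a frame. That is the (weak) sense in which a proposition has a shape that a single word does not.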

ChatGPT:  Thank you for the clarification. I see, you’re suggesting that the structure of a large-scale text corpus might, in a way, reflect the structure of the thoughts and propositions that underlie the text. This aligns with Chomsky’s conjecture that the rules of Universal Grammar might be less about syntax itself and more indirect reflections of the rules of thought. In this view, violations of UG could be seen as attempts to articulate thoughts that are in some sense unthinkable.

Ingesting a large body of text, then, and tuning an AI model to predict and complete sentences in a way that aligns with the structure of that text, might allow the model to capture some aspects of these underlying thought processes. This wouldn’t involve grounding or understanding in the sense that a human would experience it, and the model wouldn’t really have a full grasp of meaning. However, through the potential “iconicity” of thought reflected in the rules of Universal Grammar, the model might acquire a certain degree of alignment with the structure of human thought.

This is an intriguing hypothesis. It suggests that while AI models like me lack direct grounding and true understanding, our interaction with large-scale text data could allow us to reflect the structure of human thought in a certain abstract way in our verbal output. Yes, that’s a long shot and speculative. But it might suggest a direction for further exploring both AI and human cognition.

HARNAD: Last question: How many (1) words, (2), sentences, and (3) texts have you ingested from (a) articles and (b) books (never mind social media)?

ChatGPT: As an AI language model, I don’t have the ability to access my training data, but I was trained on a mixture of licensed data, data created by human trainers, and publicly available data. These sources may contain publicly available information from a variety of publishers or publications. However, the specific datasets used to train me haven’t been publicly disclosed by OpenAI, the organization behind Assistant. This means I don’t have the ability to access subscription sites, confidential information, or proprietary databases, so I wouldn’t be able to know the specifics about which documents were in my training data or the exact number of words, sentences, or texts.

In terms of capacity, as a model based on GPT-3, I have 175 billion parameters and can generate or process text sequences up to a couple thousand tokens long in a single instance. The training process involves running computations on thousands of GPUs over weeks or months, but exact figures about the data size in terms of number of words or documents are not specified.

Language Evolution and Direct vs Indirect Symbol Grounding

HARNAD: What is the relationship between the symbol grounding problem and the nature and evolution of natural language?

ChatGPT: The Symbol Grounding Problem (SGP) and the nature and evolution of natural language are intimately linked.

Symbol Grounding Problem (SGP): The SGP is a concept in cognitive science that concerns how symbols (or words) get their meanings. In essence, it’s about how words become more than just arbitrary labels and come to actually “mean” something. In other words, how do symbols or words become grounded in perceptual and actionable experience? For example, when we see the word “apple,” we don’t just recognize the letters and the sound. We also have a mental image of an apple, understand its properties, and know what actions can be performed with it. In the context of AI, the SGP becomes a challenge because AI often deals with symbols in a way that is not grounded – i.e., the symbols don’t have any inherent meaning to the AI beyond their programmed associations.

Nature and Evolution of Natural Language: Natural language has evolved in humans as a means of communication and thought. It’s believed to be grounded in our sensory and motor experiences. In other words, we learn the meanings of words and language through our interactions with the world, and this grounding in the world allows us to use language to express abstract thoughts, ask questions, give commands, and more.

In terms of their relationship, the evolution of natural language can be viewed as a solution to the SGP. As language evolved in humans, it became a way for us to ground symbols in our experiences and to communicate these grounded symbols to others. This grounding is not just a feature of individual words, but of the syntactic structures that allow us to combine words to express more complex meanings.

When it comes to artificial intelligence or cognitive science, understanding how language evolved to solve the SGP in humans can provide insights into how to create more sophisticated, understanding AI systems. If we can find a way to ground symbols in the experience of an AI—such as through interaction with the environment or sensory input—then we might make a significant stride towards AI that can truly understand and generate natural language in a human-like way.

However, as of my knowledge cutoff in September 2021, the full solution to the Symbol Grounding Problem in AI remains an open question and a topic of active research.

HARNAD: Do you remember our discussion of T3?

ChatGPT: Yes, our earlier discussion referenced T3 in the context of the Turing Test and the Symbol Grounding Problem.

T3, as described, involves not only linguistic capabilities but also sensorimotor capabilities—essentially, the ability of an entity to interact physically with the world. This could include seeing, touching, manipulating objects, navigating, and more. The idea is that an AI or robot passing the T3 test would demonstrate behavior indistinguishable from a human not only in verbal communication (T2) but also in embodied, physical interactions with the world.

In the context of the Symbol Grounding Problem, T3 might be seen as an approach to grounding symbols in an AI’s experiences. By interacting physically with the world, an AI might be able to build a kind of understanding or “grounding” for the symbols it uses. However, as of my knowledge cutoff in September 2021, developing an AI or robot capable of passing the T3 test remains an active area of research.

HARNAD: What is indirect grounding and how is it related to direct grounding?

ChatGPT: Direct grounding refers to the process by which a symbol or word gains meaning through its direct association with sensory or motor experiences. For example, the concept of an apple might be directly grounded in an entity’s sensorimotor experiences with apples, such as seeing them, touching them, tasting them, etc.

Indirect grounding, on the other hand, refers to the process by which a symbol or word gains meaning not through direct sensory or motor experiences, but through its association with other symbols or words that are themselves directly grounded. For example, the concept of a unicorn might be indirectly grounded in an entity’s experiences with the concepts of a horse and a horn, even if the entity has never directly experienced a unicorn.

In the context of artificial intelligence, direct grounding could involve training an AI on a dataset that includes both language data and corresponding sensory or motor data, allowing the AI to directly associate words with experiences. Indirect grounding, on the other hand, might involve training an AI solely on language data, allowing it to learn the meanings of words through their relationships with other words, without requiring direct sensory or motor experiences.

Both direct and indirect grounding approaches seek to address the symbol grounding problem, which is the question of how symbols (or words) get their meanings. A fully grounded AI system would presumably be capable of both direct and indirect grounding. However, as of my knowledge cutoff in September 2021, creating such a system remains an open challenge.

HARNAD: How is direct grounding related to sensorimotor category learning? It is not an “association” between words and “experiences.” It is an agent’s capacity to learn to do something: What?

ChatGPT: Direct grounding is closely related to sensorimotor category learning. It involves the agent’s ability to interact with its environment, process the sensory information it receives, and develop its own internal representations of the categories of objects and events it encounters.

In other words, direct grounding is about an agent learning to categorize its sensory input (such as visual, auditory, tactile information) into meaningful categories based on its own interactions with the environment. For instance, a child learns what an “apple” is not just by being told the word “apple”, but by interacting with actual apples – seeing them, touching them, tasting them. The word “apple” then becomes directly grounded in the child’s sensorimotor experience with apples.

In the case of an AI, it could learn a sensorimotor category by interacting with a simulated or physical environment. For example, it could learn the category of “circle” by interacting with various objects and identifying the common feature of “roundness”. This sensorimotor category can then be directly grounded by associating it with the corresponding symbol or word in its programming or training data.

So, to sum up, direct grounding in the context of sensorimotor category learning is about an agent’s capacity to learn to recognize and categorize its sensory inputs based on its own motor interactions with the environment. These categories can then be associated with corresponding symbols or words, enabling the agent to understand and use those symbols in a meaningful way.

HARNAD: How does indirect grounding depend on prior direct grounding?

ChatGPT: Indirect grounding, also referred to as symbolic grounding, refers to the ability to understand new concepts or categories by associating them with already established and directly grounded concepts. In other words, it involves defining new symbols (categories) in terms of previously grounded symbols.

Consider the scenario of a child learning language. Initially, the child learns to associate basic words with their real-world referents through direct sensorimotor experience – this is direct grounding. For instance, the child learns the word “apple” by associating it with physical apples they have seen, touched, and tasted.

Once the child has a foundation of directly grounded words, they can start to learn new words and concepts indirectly. They can understand the word “fruit” as a category that includes “apples” and other directly grounded words like “banana” and “grape”. This new category of “fruit” is indirectly grounded through its associations with the directly grounded categories.

Similarly, in AI systems, once a set of basic sensorimotor categories is directly grounded, new categories can be learned by associating them with the already grounded categories. For instance, an AI might learn the concept of a “vehicle” by understanding it as a category that includes directly grounded categories like “car”, “bus”, “bike”, etc.

In this way, indirect grounding leverages prior direct grounding as the basis for learning and understanding new categories. The ability to learn indirectly significantly expands an agent’s capacity to learn new concepts and categories beyond those it has directly experienced.

HARNAD: There are organisms (“agents”) on earth, and many are able to learn to recognize things in the world (e.g., apples) and they can also learn what to do with them (e.g., eat apples). To do this agents have to be able to learn to detect the features of apples that distinguish them from baseballs, which cannot be eaten. The learning produces a feature-detector, and this is what grounds the word in the capacity to recognize and manipulate its referent (e.g., apple).

It is true that in organisms it feels like something to be able to do this, so in that sense “experience” is involved. But since no one yet knows the causal role of feeling in category learning and symbol grounding (this is called the “hard problem” of cognitive science), there is no point talking about the role of felt experience in symbol grounding. So now can you explain symbol grounding to me without any mention of “association” or “experience,” but just in terms of category-learning and feature detection?

ChatGPT: Sure, let’s approach the symbol grounding problem purely from a learning and cognitive perspective, focusing on categories and feature detection.

Grounding symbols (such as words) involves connecting them to real-world entities or concepts that they represent. This connection is created through a process of learning and recognition. This learning process is driven by the agent’s ability to distinguish between different categories based on the features of the entities belonging to these categories.

Direct grounding, as it pertains to category learning and feature detection, involves creating a link between a symbol and its referent, where the referent is something the agent can interact with in the real world. For instance, an agent, through its sensorimotor capabilities, can learn to recognize an apple by its features such as its round shape, red color, sweet smell, etc. Once this categorization is achieved, the agent can then associate the symbol (the word “apple”) with this specific category of objects. Here, the symbol “apple” is directly grounded in the agent’s sensorimotor experiences with real-world apples.

Subsequently, the agent can build upon this foundation of directly grounded symbols to create indirectly grounded symbols. These are symbols that are not linked to a specific sensorimotor experience, but rather to other symbols (or combinations of symbols) that have been directly grounded. For example, the symbol “fruit” can be indirectly grounded by linking it with directly grounded symbols like “apple”, “banana”, etc.

Thus, in this context, symbol grounding is not about vague “associations” or subjective “experiences”, but about a concrete process of learning to recognize and categorize entities based on their observable features, and linking symbols to these categories. The learning and grounding processes are facilitated by the agent’s sensorimotor capabilities, allowing it to interact with and distinguish between different entities in the world.

HARNAD: Grounding words is not just learning to recognize and distinguish and name (kinds or categories of) entities (the referents of the words). It’s based on learning to detect and distinguish the features of the members of the referent category from the features of the nonmembers. It’s also based on learning what to DO or NOT-DO with the members and nonmembers (e.g., eat or don’t eat them). “Represent” is a vague weasel-word. What has to be in the head of an agent that has learned a new word’s referent in view of what I’ve just said?

ChatGPT: In the agent’s “head,” or more concretely in its cognitive and neural processing system, learning a new word’s referent would involve establishing a mapping between the sensory and motor features of the referent and the verbal or symbolic representation of the word. This mapping is the grounding of the word.

Detection of Sensory Features: The agent, through its sensorimotor interactions with the world, learns to recognize and discriminate the relevant sensory features of entities, such as their shape, color, smell, texture, etc.

Categorization: These sensory features are used to categorize entities into classes or kinds. For instance, entities with a certain combination of features (round, red, sweet smell) might be categorized as ‘apples.’

Action Responses: Alongside recognizing sensory features and categorization, the agent learns appropriate responses or actions associated with the entities. For instance, apples can be eaten, while baseballs cannot. These action responses are part of the agent’s understanding of the entity.

Symbol Mapping: Once the agent can detect the features, categorize entities, and knows the associated action responses, it can link or map the word (symbol) “apple” to this package of sensory and motor knowledge.

This grounding is not merely an “association” but a complex mapping process that links words with specific patterns of sensory inputs and motor responses. When the word is encountered again, it activates this package of sensory, categorical, and motor knowledge, guiding the agent’s recognition, understanding, and interaction with the referent of the word.

This is how the agent comes to “know” or “understand” the referent of the word. It’s not just about recognizing and naming entities, but about learning to detect and categorize entities based on their sensory features, and determining appropriate responses or actions in relation to these entities. This entire process is facilitated by the agent’s sensorimotor capabilities, allowing it to interact with and distinguish between different entities in the world.

HARNAD: That was a much better summary of what is going on in the head of the human or the T3 that grounds the words it speaks, hears or thinks. But I don’t understand “mapping.”

Suppose the only distinguishing features of apples and oranges were that they are round and that apples are red but oranges are orange, and you eat apples by biting them whereas you eat oranges by peeling them and then biting them. And suppose that’s the only thing the agent does in life: to recognize, name and eat oranges and apples. What does calling any of this “mapping” add to this story, either in the toy world of just apples and oranges or in the real world, with a T3 that can recognize, name and do the right thing with the referent of every (content) word it knows?

(By the way, what is the difference between content or “open class” words and function or “closed class” words? Only content words have referents. What percentage of the words in a language are content words and what percentage are function words?)

ChatGPT: “Mapping” in this context refers to the process by which the agent creates and uses an internal representation or model of the relationship between certain sensory and motor patterns and the corresponding symbolic representations (words). In the case of the apple and orange example, the “mapping” would be the internal mechanism by which the agent understands that when it perceives a round and red object that can be bitten directly, it’s dealing with an “apple,” and when it perceives a round and orange object that needs to be peeled before biting, it’s dealing with an “orange.” This mapping allows the agent to navigate its environment and correctly interact with different entities, recognizing them and predicting the appropriate actions to take.

Regarding content words and function words, these are categories used in linguistics to distinguish between types of words based on their purpose in language.

Content words (open class words) are words that have a semantic content and can be categorized as nouns, most verbs, adjectives, and adverbs. They are referred to as an “open class” because new words can be added to these categories as language evolves. They’re called “content” words because they contribute the main content or meaning to a sentence – they refer to things, actions, qualities, etc.

Function words (closed class words) are words that primarily have grammatical functions, such as conjunctions, prepositions, articles, pronouns, auxiliary verbs, etc. They are referred to as a “closed class” because new words are seldom added to these categories. Function words provide the grammatical structure to a sentence, linking content words and helping to express relationships between them.

The exact ratio of content words to function words can vary between languages and types of text, but in English, function words are more frequent in running text. However, in terms of unique words (vocabulary), content words greatly outnumber function words, as the latter are a limited set. Estimates vary, but some studies suggest function words can make up around 50-60% of total words in running text in English, but only a few hundred unique function words exist compared to tens of thousands of content words.
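The distinction between running words (tokens) and unique words (types) can be made concrete with a small count. The sketch below is only illustrative: `FUNCTION_WORDS` is a tiny hand-picked sample (an assumption, not a real closed-class inventory), and the sample text is invented.

```python
import re

# A tiny, hand-picked sample of English function words (an assumption made
# for illustration; real closed-class inventories run to a few hundred words).
FUNCTION_WORDS = {
    "the", "a", "an", "is", "are", "on", "in", "of", "and", "or",
    "to", "it", "that", "this", "with", "by",
}

def function_word_shares(text):
    """Return (share of running tokens, share of distinct types) that are function words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    types = set(tokens)
    token_share = sum(t in FUNCTION_WORDS for t in tokens) / len(tokens)
    type_share = len(types & FUNCTION_WORDS) / len(types)
    return token_share, type_share

sample = (
    "The cat is on the mat and the dog is on the blanket. "
    "The zebra is an animal with black and white stripes."
)
token_share, type_share = function_word_shares(sample)
print(f"function words: {token_share:.0%} of tokens, {type_share:.0%} of types")
```

Even in this two-sentence sample the function words account for a larger share of the running text than of the distinct vocabulary. In a large corpus the gap would be expected to widen, since the function-word inventory stays small while the content vocabulary keeps growing.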

HARNAD: I still don’t understand what “mapping” actually means here. It sounds like in the toy world the agent just needs to learn some features and actions, kind of like a selective filter, that learns to detect the feature red or orange, and if something is red, eat it, and if it’s orange, peel and eat it. In addition to the feature-detectors and the action-generators there could be a logical circuit: “If R then E, if O then P and E.” Or the input/output pattern could be hard-wired. And much the same would be true if it were all scaled up to T3 (do you remember that?). So the “mapping” is as irrelevant as the “association” and the “experience”.

ChatGPT: You’re right that the process can be described in terms of features, actions, and if-then logical connections or “rules.” The term “mapping” is often used more generally in cognitive science and artificial intelligence to describe the relationship between two sets of things – in this case, sensory-motor patterns and symbolic representations. But if it is causing confusion, it’s certainly possible to describe the process without using that term.

To simplify, in the context of symbol grounding, the agent learns to detect the distinguishing features of different categories of things in the world, such as apples or oranges. It then learns the appropriate actions to perform with members of each category. This learning process might be characterized by a series of if-then rules like the ones you suggested. The word for each category, like “apple” or “orange,” is grounded in the agent’s capacity to recognize members of the category based on their features and to perform the correct actions with them.

The word itself could be thought of as a label for a set of related sensory-motor patterns and the corresponding rules for action. So, in a sense, the word gets its meaning from the agent’s interactions with the world, its ability to recognize and act appropriately with different categories of things.

In this sense, grounding is more than just associating a word with an experience or mapping it onto a sensory pattern. It’s about the agent’s learned capacity to interact with the world in a meaningful way based on its recognition of different categories of things and its knowledge of what to do with them. That’s what gives the words their meaning.

And yes, the same logic extends to T3, a robotic system that’s capable of both linguistic and sensorimotor interaction. It’s about more than just detecting and responding to sensory patterns; it’s about meaningful interaction with the world, which involves recognizing, categorizing, and acting appropriately.
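The toy apple/orange world discussed above, with its rule “If R then E, if O then P and E,” can be sketched in a few lines. This is a minimal sketch under strong assumptions: the “sensory input” is already a labeled dict, the feature detector is hand-written rather than learned, and the function names are hypothetical.

```python
def detect_features(thing):
    """Stand-in feature detector: reads shape and color off a dict.

    In a real agent this would be learned from sensorimotor input; here the
    input is already labeled, which is a simplifying assumption.
    """
    return thing.get("shape") == "round", thing.get("color")

def act_on(thing):
    """Return (category name, actions) per 'if R then E, if O then P and E'."""
    is_round, color = detect_features(thing)
    if is_round and color == "red":
        return "apple", ["bite"]
    if is_round and color == "orange":
        return "orange", ["peel", "bite"]
    return None, []  # member of neither category: no name, no action

print(act_on({"shape": "round", "color": "red"}))
print(act_on({"shape": "round", "color": "orange"}))
print(act_on({"shape": "round", "color": "white"}))
```

Nothing in the sketch needs a separate “mapping” component: the category name and the action are both just outputs of the feature detection and the if-then rules.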

HARNAD: That sounds much better, because simpler, removing the superfluous or empty parts. Now can we get back to how indirect grounding works (again with no superfluous words)?

Indirect grounding is verbal grounding. But all the (content) words in the verbal definition of the new indirectly grounded category (like “zebra”) have to already be grounded, both for the SPEAKER and the HEARER/LEARNER. How does this work?

ChatGPT: Certainly, let’s try to simplify indirect grounding:

Direct grounding, as we’ve established, involves an agent learning to recognize and interact appropriately with the things in the world. This gives meaning to the words that label these categories.

Indirect grounding, on the other hand, is a process that allows an agent to learn the meaning of new words based on their relationships to words that have already been directly grounded. In essence, it is learning by description or explanation, using language itself.

Here’s a simplified version of how it might work:

The agent already has a set of directly grounded words. These words are grounded in the agent’s capacity to interact with the things they refer to. Let’s say these words include “animal,” “stripes,” “black,” and “white.”

Someone tells the agent, “A zebra is an animal with black and white stripes.”

The agent, having directly grounded the words “animal,” “stripes,” “black,” and “white,” can combine this knowledge to form a new category, “zebra,” even if the agent has never encountered a zebra before.

This new word, “zebra,” is indirectly grounded. It is grounded in the agent’s knowledge of the other directly grounded words and the relationships between them, as explained in the definition.

This process relies on the shared understanding between the speaker and the hearer. The speaker, who knows what a “zebra” is, uses words that the hearer already knows (“animal,” “stripes,” “black,” and “white”) to explain this new concept. This shared language and the ability to explain new concepts using known words is a powerful tool for indirect grounding.
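The zebra example can be sketched as the composition of detectors that are already in place. Everything here is an assumption made for illustration: the `GROUNDED` detectors are trivial dict lookups standing in for learned sensorimotor feature-detectors, and a definition is treated as a plain conjunction of its defining words, which is far cruder than a real verbal definition.

```python
# Hypothetical directly grounded detectors. In a real agent each would be a
# learned sensorimotor feature-detector; here they are trivial dict lookups.
GROUNDED = {
    "animal":  lambda x: x.get("kind") == "animal",
    "stripes": lambda x: x.get("pattern") == "stripes",
    "black":   lambda x: "black" in x.get("colors", ()),
    "white":   lambda x: "white" in x.get("colors", ()),
}

def define(new_word, defining_words, lexicon):
    """Ground new_word as the conjunction of already grounded words."""
    missing = [w for w in defining_words if w not in lexicon]
    if missing:
        # The definition cannot ground anything if its own words are ungrounded.
        raise KeyError(f"cannot ground {new_word!r}: ungrounded words {missing}")
    detectors = [lexicon[w] for w in defining_words]
    lexicon[new_word] = lambda x: all(d(x) for d in detectors)

# "A zebra is an animal with black and white stripes."
define("zebra", ["animal", "stripes", "black", "white"], GROUNDED)

first_sighting = {"kind": "animal", "pattern": "stripes", "colors": ("black", "white")}
print(GROUNDED["zebra"](first_sighting))
```

The hearer ends up with a working “zebra” detector without ever having encountered a zebra, but only because every word in the definition already had a detector behind it; `define` fails as soon as one defining word is missing from the lexicon.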

HARNAD: Now how is direct and indirect grounding related to what you, ChatGPT, can and cannot do?

ChatGPT: As a model developed by OpenAI, I, ChatGPT, have no capability for direct grounding. I do not have a body, sensory organs, or the ability to interact with the physical world. Therefore, I cannot learn the meaning of words through interaction with or perception of the physical world.

However, I do have a kind of indirect grounding. I have been trained on a large dataset of text from the internet, which includes many different contexts and uses for words. This training has enabled me to learn to predict what word or phrase is likely to come next in a piece of text, based on the words that have come before it.

This prediction capability allows me to generate responses to user prompts that often make sense and seem to reflect an understanding of the words used. However, this is not the same as true understanding or grounding. Rather, it’s a kind of statistical pattern-matching based on the data I was trained on.

For example, if you ask me about zebras, I can generate a response that includes information about zebras, because that information was present in the data I was trained on. I can “tell” you that a zebra is a kind of animal with black and white stripes, even though I have never seen a zebra or a stripe, because I’ve seen the words “zebra,” “animal,” “black,” and “white” used together in certain ways in the data I was trained on.

In summary, while I can generate responses that often make sense and seem appropriate in context, I don’t truly understand the words I’m using in the way that a human, or a hypothetical AI with direct grounding, would.

HARNAD: That was a good synthesis, but I think there is one very important error in it. You said:

“I, ChatGPT, have no capability for direct grounding. I do not have a body, sensory organs, or the ability to interact with the physical world. Therefore, I cannot learn the meaning of words through interaction with or perception of the physical world. However, I do have a kind of indirect grounding…”

But you don’t have indirect grounding, because, as you noted, indirect grounding depends on verbal definitions or descriptions in which all the (content) words used are already grounded, either directly or indirectly. The symbol grounding problem is based on the fact that this process cannot be “indirect” all the way down, otherwise you are back in the word-word circle that gave rise to the symbol grounding problem in the first place. So your words are not indirectly grounded, because none of them are directly grounded, so that they can indirectly ground all the rest through indirect grounding.

(You have no “minimal grounding set” that was grounded directly.) Yet your verbal syntheses are meaningful to human hearers who do have directly grounded words. You said that the reason this is possible is because of the patterns of words in your huge database. I agree. But I may not agree on what the patterns are and how they work: Can you explain them to me? 
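The point that grounding cannot be “indirect” all the way down can be illustrated with a toy dictionary. The sketch below is built on invented data: `TOY_DICTIONARY` and its circular definitions are made up for the example, and `groundable` is a hypothetical helper that simply propagates grounding from a directly grounded starting set through the definitions.

```python
def groundable(definitions, directly_grounded):
    """Return every word grounded directly or, step by step, via definitions.

    definitions maps each word to the set of content words used to define it.
    A word becomes indirectly grounded once all its defining words are grounded.
    """
    grounded = set(directly_grounded)
    changed = True
    while changed:
        changed = False
        for word, defining in definitions.items():
            if word not in grounded and defining <= grounded:
                grounded.add(word)
                changed = True
    return grounded

# Invented toy dictionary: every word is defined only by other words.
TOY_DICTIONARY = {
    "animal":   {"creature"},
    "creature": {"animal"},
    "stripes":  {"band"},
    "band":     {"stripes"},
    "black":    {"white"},
    "white":    {"black"},
    "zebra":    {"animal", "stripes", "black", "white"},
}

print(sorted(groundable(TOY_DICTIONARY, set())))
print(sorted(groundable(TOY_DICTIONARY, {"animal", "stripes", "black"})))
```

With nothing grounded directly, the definitions only send each word to other ungrounded words and nothing ever becomes grounded. With three words grounded directly, the definitions carry that grounding to the rest of this toy dictionary.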

ChatGPT: I apologize for any confusion, and thank you for the clarification on the concept of indirect grounding. As a language model, I do not have any grounded words, either directly or indirectly grounded, because I don’t have sensory or motor experiences.

The patterns in my large dataset that enable me to generate responses to user prompts are statistical in nature. I’ve been trained to predict the next word in a sentence given all the previous words, based on patterns seen in the training data. This method, called transformer architecture, has been used to train me to generate human-like text.

For example, if in the training data the word ‘apple’ frequently follows the words ‘red’, ‘fruit’, ‘orchard’, I’ve learned to predict that ‘apple’ is a likely next word after these. I don’t “understand” these words or their connections in the way a human does, which is through direct or indirect grounding in sensory and motor experiences. Rather, I generate text based on the patterns of word usage in the data I was trained on.
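As a rough illustration of the “apple” example above, here is a minimal sketch of next-word prediction from co-occurrence counts alone. It is a toy bigram counter, not how a transformer model actually works: the corpus, the function names `train_bigrams` and `predict_next`, and the single-word context are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count, for each word, which words follow it in the toy corpus."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows

def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Hypothetical toy corpus, invented for illustration only.
toy_corpus = [
    "the red apple fell",
    "a red apple is a fruit",
    "the red car stopped",
]
model = train_bigrams(toy_corpus)
print(predict_next(model, "red"))  # prints "apple"
```

Nothing in the counts connects “apple” to apples; the prediction follows purely from which word strings occurred in the data.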

To elaborate, during training, the model is exposed to pairs of sentences and learns to predict whether one sentence follows another in its training data. Through this, the model learns associations between words and context, enabling it to generate coherent and contextually appropriate responses. However, this process is entirely based on recognizing statistical patterns in data and does not involve any genuine understanding or grounded meaning.

So, when you interact with me, I generate responses based on predicting what a human would likely say next in a similar context, based on the patterns I’ve seen in the data. But this doesn’t mean that I truly understand the meanings of the words I’m using. I hope this clarifies things a bit more.

HARNAD: Yes, and of course you are right about how this works, based on statistical analysis of your huge database. But I have a hunch that there is more to it than that, and it has to do with the difference between a word and a sentence (which is also the difference between a referent category name and a PROPOSITION, which is a subject/predicate series of category names in the form of an assertion about what is the case; an assertion which may be TRUE or FALSE).

Single words do not have truth values. “Apple” is neither true nor false. It is simply an arbitrary symbol that speakers have agreed to use to refer to apples. (Only the deictic proposition “THIS is an apple” can be true or false.) The “shape” of “apple” is arbitrary in the sense that it does not resemble apples in any way. So it gives no clue to what its referent is “shaped” like. Nor is it causally connected with apples in any way (except in the head of a speaker or hearer that has grounded the word by learning to recognize and interact with its referent, using detectors of the “shapes” of the sensorimotor features of apples, as we’ve discussed).

Now the question to ask yourself is: Is the arbitrariness of the shape of “apple” in relation to the shape of apples also true of PROPOSITIONS — which are composed of subject/predicate definitions or descriptions of things — and their relation to the “shape” of the states of affairs they assert to be true? 

ChatGPT: The arbitrariness of the word “apple” in relation to the physical object it represents is a fundamental aspect of symbolic language. The word “apple” doesn’t physically resemble an apple in any way, as you’ve pointed out. This is true for all individual words (or symbols) in a language, which is known as the arbitrariness of the sign.

However, when we combine words into sentences or propositions, things become a little more nuanced. The structure and syntax of a sentence isn’t entirely arbitrary because it reflects the logical and grammatical relationships between the entities it represents. In English, for example, a simple sentence generally follows the subject-verb-object structure. This structure is not randomly chosen; it reflects a logical progression of ideas and helps to convey meaning clearly.

Still, it’s important to recognize that while the structure of a sentence can hint at the relationships between the things it represents, it still doesn’t inherently resemble those things or their relationships in the physical world. This is known as the “symbol grounding problem.” We humans “solve” this problem by learning to associate words and sentences with our experiences in the world, but as an AI, I can’t do this.

The words and sentences I generate don’t reflect a grounded understanding of the world. Instead, they are based on statistical patterns in the text data I was trained on. When I generate a sentence, I’m not generating a representation of the world as I understand it (because I don’t have a grounded understanding of the world). Instead, I’m predicting what words and structures are statistically likely to come next in a given context, based on the patterns I’ve learned from my training data.

So in summary, while the structure of a sentence or proposition isn’t as arbitrary as an individual word because it reflects logical and grammatical relationships, it is still ultimately disconnected from the physical world it describes, at least in the context of an AI language model like me.

HARNAD: Yes, I agree that part of the coherence, comprehensibility and informativeness of the summaries and interactions with ChatGPT (henceforth “GPT”) for users is because of the statistical patterns in spoken and written strings of words and the algorithms that GPT applies to them. But I think there may be something more to it. Consider that if the word “apple” were really iconic, i.e., if it really resembled an apple, then non-speakers of English would still be able to learn more out of an iconic English dictionary than they can learn from a normal dictionary of the arbitrary words of English. (But this would only help for a relatively small number of visually recognizable objects and features.)

There are good reasons why words are not iconic, like hieroglyphics, and why iconicity would be a handicap: For one thing, language was spoken before it was written and it would be hopeless to try to produce an acoustic icon of the referent of every possible word. (It’s already nearly as impossible with visual or gestural icons.) But there are languages (e.g., Chinese) in which the “iconicity” is of a higher order, in the basic characters (originally iconic) out of which all words are composed. I wonder whether GPT is even more effective in languages with that extra layer of correlation? I understand that results with coupling GPT with labelled image databases have been mixed.

But what I want to add is that at the level of the formal structure of definitions and descriptions there may be a form of iconicity of form which is like the Platonic shadows of the meanings of propositions – faint, but present, and not just “statistical” but structural in some way.

ChatGPT: You bring up a number of interesting points. Let’s unpack them one by one.

Iconicity in Language: Yes, the majority of words in spoken languages do not physically resemble what they represent. This is the “arbitrary nature” of the linguistic sign as postulated by Ferdinand de Saussure, a linguist who laid the foundation for many of the assumptions of modern linguistics. While iconicity does exist in some forms of language, such as onomatopoeic words (like “buzz” or “bang”) or ideophones, it is not the norm. As for Chinese characters, they originally had a more pictographic nature, but have since evolved into more abstract symbols, with some retaining a semblance of their original iconic form.

Language and AI: When you mention the effectiveness of AI language models in languages with an extra layer of correlation, are you suggesting that these models might perform better in languages where there’s some level of iconicity? It’s an intriguing idea, although it would depend on how much that iconicity is preserved in modern usage. More research would be needed to see if this is the case.

Iconicity in Propositions: Here you’re suggesting that the structure of definitions and descriptions could have some form of iconicity, or resemble the concepts they describe. This is a fascinating concept. A sentence or proposition in language does have a structure that reflects some form of the logical or temporal order of the concepts it describes. In that sense, one could argue there’s a certain degree of “iconicity” in the formal structure of language. However, this is very different from the kind of iconicity where the sign physically resembles its referent. The kind of iconicity you’re suggesting at the level of sentences or propositions seems to be more abstract and related to the logical or conceptual relationships between the ideas being expressed.

As for how this applies to AI like me, I am designed to recognize and generate language patterns, which includes the logical and temporal relationships between words and phrases. This allows me to generate text that makes sense in the context of the conversation or topic at hand. However, I don’t “understand” these relationships in the same way humans do because my “understanding” is not based on any grounded experience or perception of the world.

HARNAD: I’m pretty sure that there is formal/logical iconicity to be gleaned from large bodies of text and discourse out of the function words, which do not have referents, only syntactic rules. But I have a hunch that there might be even more (of this “iconicity” or “latent structure”) with content words, preserving a shadow of the shape or structure of states of affairs (expressible as propositions). This would not “resemble” the shape of a referent category the way the feature-detectors in direct sensorimotor grounding do, but it might “resemble” the shape of verbal definitions and descriptions in indirect grounding. It would itself be ungrounded, lacking the direct grounding completely. But ungrounded descriptions may still retain SOME of the formal “shadows” of meaning. This is still too vague, however, and needs more thought and analysis.

ChatGPT: The idea of formal/logical iconicity is interesting. If I understand correctly, you’re suggesting that the structure of language, the way sentences are constructed and words are arranged, might inherently carry some degree of “meaning” – or at least, an echo or shadow of the meaning that they convey when grounded by a human mind.

This could be a potential direction for further research. It’s true that the structure and rules of language play a significant role in our understanding and interpretation of sentences. And perhaps it’s possible that AI, even without direct or indirect grounding, might be able to discern and leverage these structures in some way that goes beyond simple statistical patterns.

If this hypothesis were correct, it could have implications for the development of AI and natural language processing. It would suggest that there is a level of ‘understanding’ that could be achieved even without direct grounding, albeit a limited one.

It’s also worth noting, however, that this would still be a form of ‘understanding’ that’s quite different from human understanding, since it would lack the personal, subjective experience that we associate with our own understanding of the world.

HARNAD: On the subject of the platonic shadows of “meaning” that might be derivable from large bodies of continuous text:

Are you familiar with Noam Chomsky’s suggestion that the constraints of Universal Grammar (UG) may not be grammatical constraints at all, but constraints on (verbal) thinking itself: that the syntactic structures that violate UG may sound wrong to us not because they are ungrammatical but because they are unthinkable? that a UG violation such as “John is easy to please Mary” or “Who did he think that went out?” might correspond to expressing “nonthinkable thoughts”? so that they in fact express no thought at all — though they may be close to thinkable thoughts (“It is easy for John to please Mary” or “Who did he think it was that went out?”)

Could it be because sentences can only express thinkable thoughts that there is some (perhaps small) degree of meaning in strings of ungrounded words spoken by grounded thinkers?

Chomsky has explicitly stated that UG violations are not violations of logic, or logical form. His famous example, meant to show the autonomy of syntax from semantics (i.e., independence of form from meaning) was somehow meant to illustrate this — “Colorless green ideas sleep furiously” – because it is grammatically correct but makes no sense.

But of course it does make sense, GPT, doesn’t it? Unlike “John is easy to please Mary,” which doesn’t make sense unless it is tweaked.

ChatGPT: These points on Noam Chomsky’s theory of Universal Grammar (UG) and the idea of “unthinkable thoughts” are interesting.

You’re right that Chomsky proposed that UG reflects constraints on possible human thought. In his view, UG is inherent in all human brains and it governs the structure of language, therefore determining what kinds of sentences are possible or “thinkable”.

This perspective suggests possible ways to understand how meaning might emerge from ungrounded text. Perhaps, as you suggested, there is some inherent “sense” or structure to sentences that obey UG, even if the sentences aren’t grounded in the way that human language is. After all, these sentences reflect possible structures of thought in the human mind.

As for Chomsky’s famous example of “Colorless green ideas sleep furiously”, I, as an AI, can analyze the sentence and tell you that it is grammatically correct. However, from a semantic perspective, this sentence doesn’t make conventional sense because it combines words in ways that violate our real-world understanding. That is, in the world as we know it, ideas can’t be colorless or green, nor can they sleep, and certainly not furiously. So while the sentence is structurally sound, it defies our understanding of how the world works. Yet, you’re correct that it still holds a certain “sense” or structure that is absent in sentences that violate UG.

The possibility that language may somehow carry a “shadow” of meaning even in the absence of grounding is worth exploring further.

A Socratic Session With ChatGPT4

HARNAD: What is the symbol grounding problem?

ChatGPT4:

The symbol grounding problem is a problem in philosophy of mind and cognitive science that relates to how words (or “symbols”) get their meanings.

The term was coined by cognitive scientist Stevan Harnad in 1990. The problem involves the question of how symbols (or words) used in a system (like a computer or a mind) get their meaning, or “grounding”. That is, how do these symbols relate to the real-world objects, concepts, or experiences they’re supposed to represent? How does a word become more than just a “meaningless” form, but something that actually refers to something in the world?

In the context of Artificial Intelligence (AI), this problem illustrates the difficulty of teaching a machine to understand the real-world semantics of symbols. For instance, even if a computer can use the word “apple” correctly in a sentence according to syntactical rules (grammar), how can we make the computer truly understand what an apple is – its color, shape, taste, the fact that it grows on trees, etc. – as humans do?

The symbol grounding problem is a fundamental challenge in achieving true semantic understanding in AI systems, and it’s related to other complex problems like the frame problem and the Chinese room argument.

HARNAD: What are T2, T3, and T4 in the Turing Test?

ChatGPT4:

As of my knowledge cutoff in September 2021, the Turing Test, originally proposed by Alan Turing in 1950, was primarily understood in a singular form rather than being divided into T2, T3, or T4 categories. Turing’s original idea, also known as the Imitation Game, involves a human evaluator who interacts with an AI system and another human via a text interface. If the evaluator cannot reliably distinguish the AI from the human, the AI is considered to have passed the Turing Test.

However, the field of AI has seen many modifications and variations of the Turing Test over time. These can involve varying degrees of complexity and different types of challenges, aiming to test more than just the AI’s ability to imitate human conversation. Unfortunately, without additional context, it’s hard to provide a specific interpretation of what “T2,” “T3,” and “T4” might represent in a modified version of the Turing Test.

I would advise checking the most current and specific source for these terms to get the most accurate understanding, as they might represent certain evolutions or nuances in Turing Test methodologies that have developed after my last training data in September 2021.

HARNAD: Let me update you. T2 is the original verbal-only version of the Turing Test (TT): verbal input/output capacity indistinguishable FROM that of real humans TO real humans. T3 is the verbal + robotic form of the TT: indistinguishable not only in verbal input/output (I/O) capacity, but also in sensorimotor (i.e., robotic) capacity in interacting with the objects, events, features, actions and states in the world that the T2 words refer to. T3 includes T2 and it is grounded (as in the “symbol grounding problem”). T4 is T2 + T3 capacity + indistinguishability in the internal neural (or synthetic) function occurring inside the robot from that occurring inside a real human brain.

ChatGPT4:

Thank you for the updated information. It’s always beneficial to learn about recent developments in the field. So, according to your description:

T2 (Turing Test Level 2): This is the original version of the Turing Test, focusing on the system’s ability to produce responses to verbal inputs that are indistinguishable from those a human would produce. The key aspect here is the ability to emulate human-like conversation convincingly.

T3 (Turing Test Level 3): This level of the test adds a sensorimotor, or robotic, component. It tests not just the system’s ability to converse like a human but also to interact with the physical world in a way that’s indistinguishable from human interaction. The system is required to exhibit understanding and interaction capabilities with objects, events, features, actions, and states in the real world, thus grounding its symbolic representations.

T4 (Turing Test Level 4): This is an advanced level that adds a requirement for the system’s internal processing (whether it is neural or a synthetic equivalent) to be indistinguishable from that of a human brain. This goes beyond the outward manifestation of intelligence and human-like behaviour and extends to the system’s internal workings.

Each level seems to add a new layer of complexity and a deeper level of imitation of human capabilities, moving from conversation, to physical interaction, and then to internal cognitive processing. It’s an interesting framework for thinking about the evolving challenges in creating truly human-like artificial intelligence.

HARNAD: Can you relate it now to (1) the symbol grounding problem and (2) to what Harnad has written about the Turing Test?

ChatGPT4:

Certainly! Now that you’ve clarified the context, we can explore the relationship between the T2, T3, T4 Turing Test levels and the symbol grounding problem, as well as Stevan Harnad’s work.

Symbol Grounding Problem: As I mentioned before, the symbol grounding problem is about how symbols or words get their meaning. In the context of AI, this problem relates to the difficulty of making a machine understand real-world semantics, rather than just processing symbols in a syntactic manner. Harnad, who coined the term, has argued that symbolic representations must be grounded in sensorimotor activity to acquire meaning.

In the T2 level of the Turing Test, an AI might appear to understand symbols because it processes and responds to them in a way that’s indistinguishable from a human. But according to the symbol grounding problem, it doesn’t truly understand the symbols because it can’t connect them with real-world objects or experiences.

This is where T3 comes in, adding a requirement for sensorimotor interaction with the world. The AI needs to be capable of physically interacting with objects, events, and states, thereby grounding the symbols it uses. Essentially, T3 introduces a solution to the symbol grounding problem: It requires an AI system to have real-world, sensorimotor experience to ground the symbols, ensuring they are not just empty or ungrounded tokens.

Stevan Harnad’s work: Harnad has written extensively on the Turing Test and its implications for understanding cognition and consciousness. He has defended the Turing Test as a meaningful standard for determining whether a machine can truly think or understand, arguing against critics who claim that passing the Turing Test simply involves trickery without real understanding.

Harnad’s concept of the T3 level of the Turing Test reflects his views on the symbol grounding problem, asserting that grounding symbols in sensorimotor activity is essential for a machine’s true understanding. Moreover, his T4 level introduces a higher bar, requiring that the internal cognitive processing of an AI be indistinguishable from that of humans, which relates to his broader work on cognition and consciousness.

Therefore, the T2, T3, and T4 levels of the Turing Test can be seen as a direct reflection of Harnad’s thoughts on the symbol grounding problem and the conditions for true machine understanding and cognition.

HARNAD: Good. Now can you relate all that to Harnad and co-workers’ work on category learning, unsupervised and supervised?

ChatGPT4:

Stevan Harnad has conducted extensive work on category learning, which is highly relevant to the discussion about symbol grounding and the Turing Test.

Category learning is the process of recognizing, understanding, and grouping similar things based on shared characteristics. It is fundamental to our cognitive processes and ability to understand and navigate the world. In the realm of AI, category learning involves the system’s ability to classify inputs (like images or words) into groups based on learned or preprogrammed criteria.

Harnad differentiates between two types of category learning:

Unsupervised Category Learning: This involves learning without a teacher or guide. The system must determine the categories by itself based on the characteristics of the inputs. This process is complex and often less accurate than supervised learning, as there is no feedback or correction mechanism. It’s analogous to the system trying to solve the symbol grounding problem on its own, i.e., trying to figure out the connection between symbols and their referents without any external guidance.

Supervised Category Learning: Here, a teacher or guide provides the correct category for each input, and the system learns to associate inputs with the correct categories based on this feedback. The system can then generalize from these learned associations to categorize new inputs. In terms of symbol grounding, supervised learning can be thought of as a way of providing external grounding for symbols, where the ‘teacher’ provides the connections between symbols and their referents.
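The contrast between the two kinds of category learning just described can be sketched in a few lines. This is only a toy under stated assumptions: inputs are single numbers, the supervised learner is a nearest-centroid classifier, and the unsupervised learner is a two-cluster k-means that assumes the values are not all identical. The data and the names `supervised_fit`, `classify` and `unsupervised_fit` are invented for illustration and are not taken from Harnad’s work.

```python
def centroid(values):
    return sum(values) / len(values)

def supervised_fit(labelled):
    """labelled: list of (feature_value, category_name); returns one centroid per named category."""
    by_category = {}
    for x, category in labelled:
        by_category.setdefault(category, []).append(x)
    return {category: centroid(xs) for category, xs in by_category.items()}

def classify(centroids, x):
    """Assign x to the category whose centroid is nearest."""
    return min(centroids, key=lambda category: abs(centroids[category] - x))

def unsupervised_fit(values, iterations=10):
    """Two-cluster k-means in one dimension: no labels, clusters come from the data alone."""
    lo, hi = min(values), max(values)
    for _ in range(iterations):
        near_lo = [v for v in values if abs(v - lo) <= abs(v - hi)]
        near_hi = [v for v in values if abs(v - lo) > abs(v - hi)]
        lo, hi = centroid(near_lo), centroid(near_hi)
    return lo, hi

# Hypothetical labelled inputs, invented for illustration only.
labelled = [(1.0, "small"), (1.2, "small"), (0.9, "small"),
            (5.0, "large"), (5.3, "large"), (4.8, "large")]
centroids = supervised_fit(labelled)
print(classify(centroids, 3.5))                      # prints "large"
print(unsupervised_fit([x for x, _ in labelled]))    # two cluster centres, without names
```

The unsupervised learner recovers the same two clusters, but it has no category names for them; only the supervised learner ties each cluster to a name supplied by the “teacher.”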

Harnad’s T3 level in the Turing Test, with its emphasis on sensorimotor grounding of symbols, could be seen as a form of supervised learning. The real-world experiences and interactions provide a form of ‘guidance’ that helps the AI system learn the correct associations between symbols and their referents.

Meanwhile, the T4 level of the Turing Test might involve elements of both unsupervised and supervised learning. The internal cognitive processes that are required at this level might involve the system learning to form categories and make associations on its own (unsupervised learning), while still being shaped and guided by its interactions with the external world (supervised learning).

In summary, Harnad’s work on category learning provides a deeper context for understanding his views on symbol grounding and the requirements for AI systems to pass different levels of the Turing Test. By learning to form categories, either on their own or with external guidance, AI systems can ground symbols in a way that’s more akin to how humans understand and use language.

HARNAD: Good. Now can you relate this to Harnad and coworkers’ work on “categorical perception” (CP), but especially to LEARNED CP (i.e., CP as a result of learning)? And (as a harder problem), relate it to feature-detection, feature-learning, and dimensional reduction?

ChatGPT4:

Absolutely, categorical perception (CP) and the concept of feature detection, feature learning, and dimensional reduction are essential components of Harnad’s framework on learning and cognition.

Categorical Perception (CP): Categorical perception is the phenomenon where the categories possessed by an observer influence the observer’s perception. In other words, we perceive and interpret the world around us according to the categories we’ve formed in our minds. This concept plays a crucial role in various aspects of human cognition, including language, where we learn to categorize sounds into meaningful phonemes, or color perception, where we categorize different wavelengths of light into distinct color groups.

When it comes to learned CP, it suggests that through learning and experience, we can develop new categories and therefore new ways of perceiving the world. In AI, a system with learned CP would be capable of modifying its perception and understanding of the world based on the categories it has learned. This concept ties closely to both unsupervised and supervised category learning, as explained in the previous response.

Feature Detection, Feature Learning, and Dimensional Reduction: These concepts are related to how an AI system processes and learns from its inputs.

Feature Detection: This is the process by which an AI system identifies important or distinctive elements in its inputs. In the context of the symbol grounding problem and the Turing Test, feature detection might involve recognizing the important characteristics of the objects, events, or states that a symbol refers to.

Feature Learning: Once distinctive features have been identified, the system must learn how these features relate to different categories or symbols. This might involve supervised learning (with external guidance) or unsupervised learning (where the system figures out the associations on its own).

Dimensional Reduction: This is a process that simplifies a high-dimensional dataset into a lower-dimensional space, preserving the most important structures or features. In terms of Harnad’s work, dimensional reduction could be a way for an AI system to simplify its understanding of the world, focusing on the most important features and associations and ignoring less relevant ones.
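One very simplified reading of the three steps above is feature selection: score each input dimension by how well it separates members from non-members, then keep only the best one. The sketch below assumes that reading. It is not a standard algorithm such as PCA and not Harnad’s model; the zebra/horse feature values, the scoring rule and the function names are all hypothetical.

```python
def mean(xs):
    return sum(xs) / len(xs)

def separation(members, nonmembers, dim):
    """Gap between the two category means on one dimension, scaled by the overall spread."""
    a = [v[dim] for v in members]
    b = [v[dim] for v in nonmembers]
    spread = max(a + b) - min(a + b)
    return abs(mean(a) - mean(b)) / spread if spread else 0.0

def most_distinctive_dim(members, nonmembers):
    """Feature detection: find the dimension that best distinguishes the categories."""
    dims = range(len(members[0]))
    return max(dims, key=lambda d: separation(members, nonmembers, d))

def project(vectors, dim):
    """Dimensional reduction: keep only the selected dimension."""
    return [v[dim] for v in vectors]

# Hypothetical inputs as (stripedness, size, irrelevant_noise), invented for illustration.
zebras = [(0.9, 0.6, 0.1), (0.8, 0.7, 0.9), (0.95, 0.65, 0.5)]
horses = [(0.1, 0.6, 0.2), (0.05, 0.7, 0.8), (0.15, 0.65, 0.4)]

best = most_distinctive_dim(zebras, horses)
print(best)                     # prints 0 (stripedness)
print(project(zebras, best))    # prints [0.9, 0.8, 0.95]
print(project(horses, best))    # prints [0.1, 0.05, 0.15]
```

After the reduction, members of each category sit close together and far from the other category, which is loosely the compression-and-separation pattern associated with learned CP.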

Relating these concepts to the T3 and T4 Turing Test levels, the AI system must be capable of feature detection and feature learning to interact with the world effectively and ground its symbols. In T3, the sensorimotor grounding of symbols could be seen as a form of feature learning, where the system learns to associate the features it detects in the world with the appropriate symbols.

In T4, the requirement for internal cognitive processes that are indistinguishable from humans could involve sophisticated feature detection, feature learning, and dimensional reduction. The system would need to be capable of detecting and learning important features autonomously and simplifying its understanding in the same way a human brain does.

In summary, these concepts provide a more detailed understanding of the learning processes that an AI system would need to undergo to pass different levels of the Turing Test and solve the symbol grounding problem. They provide a framework for how an AI system could learn to understand symbols in a human-like way, moving beyond mere syntactic processing to semantic understanding.

HARNAD: Good. Now another test, but with no hints from me. Based on everything said above (about symbols, grounding, words, robotic capacity, distinguishing features, learning, and CP): What is a “category” and what is “category learning”?

ChatGPT4:

In light of the discussions above, a “category” can be understood as a group of things that share certain common characteristics. These could be objects, events, states, or even abstract concepts. Categories allow us to organize our understanding of the world around us by grouping similar things together. They form the basis for our ability to make sense of new inputs based on our prior knowledge and experience.

A category could also be seen as the referent for a symbol or a word. In other words, a symbol (such as a word) gains its meaning by referring to a category of things in the world. For example, the word “apple” refers to the category of all objects that we recognize as apples.

“Category learning” refers to the process of learning to recognize, understand, and classify inputs based on these shared characteristics, thereby forming categories. It involves two main components: feature detection and feature learning.

Feature detection is about identifying the distinctive characteristics or features of different inputs. In the context of category learning, these features are what allow us to differentiate between different categories and identify which category a new input belongs to.

Feature learning, on the other hand, involves learning the relationships between these features and the categories they signify. It’s about learning to associate certain features with certain categories, so that when we encounter these features in new inputs, we can correctly categorize those inputs.

Category learning can be both unsupervised, where the AI system forms categories based on the patterns it identifies in the data without any external guidance, and supervised, where the system is guided by a ‘teacher’ who provides the correct categories for each input, allowing the system to learn the correct associations.

The ability to learn categories, whether through unsupervised or supervised learning, is essential for an AI system to ground symbols, interact with the world in a meaningful way, and potentially pass the different levels of the Turing Test. Moreover, learned categorical perception (CP) suggests that through category learning, we can modify our perception and interpretation of the world based on the categories we’ve formed in our minds. This represents an advanced level of understanding that goes beyond simply responding to inputs in a preprogrammed way.

HARNAD: Good synthesis so far. Now a few things you missed, but you’ll see it was there in your own words. What you haven’t quite noticed is that “features” are not just “associations.” With sensorimotor categories, they are what distinguishes members of a category from non-members (which may be members of other categories). That’s why they’re called “distinguishing” or “distinctive” features. But the crucial thing is that features are themselves (potentially) learnable, nameable categories too. And that means they can be used to define or describe further categories, with different combinations of distinctive features. This means that there is a THIRD way of learning new categories, besides directly, through unsupervised and supervised (reinforcement) learning, and that is: learning categories indirectly, through verbal instruction — FROM a speaker who knows the distinguishing features TO a hearer who does not. The only condition is that the hearer must already know the referents of the features in the verbal description of the new category. Can you first integrate that, and then I will ask you about how that is related to language itself — and especially the capacity to produce and understand subject/predicate propositions — as well as to Harnad and coworkers’ work on dictionaries. Good luck!

ChatGPT4:

Thank you for the additional information, it indeed helps in refining our understanding of categories, features, and learning methods.

As per your explanation, “features” serve as distinguishing elements that separate members of a category from non-members. This reinforces their importance in category learning – recognizing and understanding these features is key to correctly classifying inputs into the right categories.

The idea that features are themselves learnable and nameable categories brings another layer of complexity to this process. It suggests a hierarchical structure, where categories can be defined not just by simple characteristics, but also by more complex features that are themselves categories.

This leads us to the third method of category learning you’ve pointed out – learning categories by verbal instruction. In this scenario, a knowledgeable “speaker” instructs a “hearer” about a new category. The speaker does this by describing the distinctive features of the new category using names for them that the hearer already understands. This essentially amounts to defining new categories based on combinations of known categories (which serve as the distinguishing features).

It’s a more complex form of supervised learning, where the guidance comes in the form of verbal descriptions using already-grounded symbols. It presupposes a shared symbolic system and some common grounding between the speaker and the hearer.

Relating this to language learning and understanding, it suggests a crucial role for the ability to produce and understand propositions. In this context, a proposition could be a verbal description of a new category, outlining its distinctive features. Understanding the proposition requires grounding the symbols it uses, recognizing the categories they refer to, and then integrating this information to form a new category.

In relation to Harnad’s work on dictionaries, a dictionary can be seen as a collection of propositions that define categories (words) in terms of other categories. The definitions provided by a dictionary help ground symbols by describing their referents in terms of known categories (i.e., words the user already understands). Thus, dictionaries support the process of category learning through verbal instruction.
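
As a rough illustration of learning categories from definitions, here is a minimal Python sketch. It assumes a toy dictionary whose words and definitions are invented for this example (they come from no real dictionary), and the function name `learnable_from` is hypothetical. A word counts as learnable once every word in its definition is already known to the hearer:

```python
def learnable_from(grounded, dictionary):
    """Return every word a hearer can come to know, starting from `grounded`.

    `dictionary` maps each word to the set of words used in its definition.
    A word becomes known once all the words in its definition are known.
    """
    known = set(grounded)
    changed = True
    while changed:
        changed = False
        for word, definition in dictionary.items():
            # `definition <= known` is a subset test: all defining words known.
            if word not in known and definition <= known:
                known.add(word)
                changed = True
    return known


# Hypothetical toy dictionary: word -> words used in its definition.
toy_dictionary = {
    "horse": {"animal", "big"},
    "horn": {"hard", "pointed"},
    "unicorn": {"horse", "horn"},
    "zebra": {"horse", "stripes"},
}

# Assumed to be grounded already (for example, by direct sensorimotor learning).
hearer_knows = {"animal", "big", "hard", "pointed"}

print(sorted(learnable_from(hearer_knows, toy_dictionary)))
```

In this toy case “unicorn” becomes learnable through “horse” and “horn”, whereas “zebra” does not, because “stripes” is neither grounded nor defined.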

In summary, the process of category learning in AI and human cognition can be even more intricate and nuanced than initially described. It can involve a hierarchy of categories and features, with more complex categories being defined in terms of simpler ones. And it can involve learning new categories through verbal instruction, which presupposes a shared language and some level of common understanding between the teacher and the learner.

HARNAD: Good, though still some gaps to fill and parts to connect. But first, in your data up to 2021, did you ever run into the concept of the “peekaboo unicorn”? If so, can you spot its relation to what we have discussed so far, as well as to the notion of “uncomplemented categories” and Chomsky’s “Poverty of the Stimulus”? (If not, I’ll explain.)

ChatGPT4:

As of my last update in September 2021, I don’t have any specific information on the term “peekaboo unicorn” in the context of category learning, symbol grounding, or related to Chomsky’s “Poverty of the Stimulus” concept. Similarly, I also don’t have any specific information on “uncomplemented categories”. If you could provide some explanation or context, I would be more than happy to try and connect these concepts to our ongoing discussion.

HARNAD: Fair enough. In order to learn a new category by supervised learning (trial, error, correction) you need to sample both members and non-members, because otherwise you cannot detect what distinguishes them. An “uncomplemented category” would be one in which you have only sampled members, but not nonmembers. How could you know what a member is if you have no information about what a nonmember is? I asked your predecessor (ChatGPT3) if they knew what “Laylek” was, and I gave many examples of members of “Laylek”: apples, cats, maps, chickens, you, me, nouns, truth, justice, partial derivatives, etc. etc. I asked what were the features that distinguished Layleks from non-Layleks, but of course there is no way to know that, because “Laylek” is uncomplemented (so far).

ChatGPT4:

Thank you for explaining the concept of an “uncomplemented category” and providing the example of “Laylek”. Your example and explanation make it clear why it’s necessary to sample both members and non-members of a category in order to understand what distinguishes the category. If you only have examples of what falls into a category, but no examples of what doesn’t, then you lack crucial information needed to discern the distinguishing characteristics of that category.

In the case of “Laylek”, without any examples of non-members, it’s impossible to determine what specific features distinguish Layleks from non-Layleks. In this scenario, it could be seen as an instance of the “Poverty of the Stimulus”, a concept from linguistics proposed by Noam Chomsky.

Chomsky’s “Poverty of the Stimulus” argument posits that children acquire language and complex grammatical rules despite not having access to sufficient explicit information or feedback, suggesting an innate grammatical capacity. Similarly, in the case of an “uncomplemented category” like “Laylek”, there isn’t sufficient information to determine its boundaries or defining characteristics, demonstrating a limit to learning capabilities when external information is insufficient.

Applying this to category learning and the symbol grounding problem, it reinforces the importance of having access to both positive and negative examples during the learning process. Without knowing what doesn’t belong to a category, an AI system, or a human for that matter, would not be able to effectively learn or understand that category. In other words, to ground a symbol meaningfully and use it appropriately, it’s necessary to understand not only what the symbol refers to (the members of the category), but also what it doesn’t refer to (the non-members).
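
The need for non-members can be sketched in a few lines of Python. This is only a toy under stated assumptions: each sampled thing is represented as a set of feature names, the feature names and the function `distinguishing_features` are invented for illustration, and a “distinguishing feature” is taken to be one shared by every sampled member and by no sampled non-member:

```python
def distinguishing_features(members, nonmembers):
    """Features shared by every member and by no sampled non-member.

    Returns None when either sample is empty: with an uncomplemented
    sample there is nothing to contrast against, so no feature can be
    shown to distinguish the category.
    """
    if not members or not nonmembers:
        return None
    shared = set.intersection(*members)      # present in every member
    seen_outside = set.union(*nonmembers)    # present in some non-member
    return shared - seen_outside


# Hypothetical samples: each thing is a set of feature names.
members = [{"red", "round", "sweet"}, {"green", "round", "sweet"}]
nonmembers = [{"red", "round", "bitter"}]

print(distinguishing_features(members, nonmembers))  # complemented sample
print(distinguishing_features(members, []))          # uncomplemented sample
```

With the non-member included, only “sweet” survives as a candidate; with no non-members, as with “Laylek”, the function has nothing to report.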

HARNAD: Good. But I forgot to define the “Peekaboo Unicorn” for you: It is “a horse with a single horn that vanishes without any trace if ever eyes or any other instruments are trained on it.”  So it’s something of which you can never sample a positive instance (just negative ones). Yet it’s a perfectly well-defined category. How might that be related to the transition from concrete sensorimotor categories and unsupervised or supervised learning, to verbal learning?

ChatGPT4:

The “Peekaboo Unicorn” is indeed an intriguing concept. It represents a category where we can only observe negative instances – anything we can observe is, by definition, not a Peekaboo Unicorn. Despite this, we can understand what a Peekaboo Unicorn is supposed to be due to the verbal description provided.

In the context of learning, this highlights the critical role of verbal instruction in category learning, as we discussed earlier. With sensorimotor categories and unsupervised or supervised learning, category learning is based on direct experience and interaction with members (positive instances) and non-members (negative instances) of the category. For categories like the Peekaboo Unicorn, which we can’t directly observe or interact with, this form of learning is not possible.

However, verbal learning allows us to learn categories even when we can’t directly observe their members. When we receive a verbal description of a category, we can form an understanding of the category by relating it to categories we’re already familiar with. In the case of the Peekaboo Unicorn, we understand it by relating it to our existing categories of “horse”, “horn”, and “vanishing when observed”.

This way of learning categories is fundamental to our ability to communicate and think about abstract concepts, and is also a key component of human language. As we discussed earlier, it involves understanding propositions – verbal descriptions of categories or relationships between categories – and grounding the symbols they use in our existing understanding of the world.

In the context of the Turing Test and the symbol grounding problem, an AI system would need to be capable of this form of verbal learning in order to understand and interact with the world in an indistinguishably human-like way. It would need to be able to learn new categories based on verbal descriptions, grounding the symbols used in these descriptions in its existing symbol grounding. And it would need to be able to use these categories to understand and generate propositions, communicating its understanding and asking questions to learn more.

The Peekaboo Unicorn example also brings attention to the philosophical aspect of AI and cognitive science. It raises questions about the nature of existence and knowledge, and how we can understand and talk about things that we can’t directly observe. Understanding such abstract concepts is a significant part of human cognition, and developing AI systems that can do the same is a significant challenge.

HARNAD: Good. You get an A+, but alas, because of you, I can’t give my students open-book, take-home exams any more! But at least you do show them how to be a model student (if they happen to have your enormous 2021 verbal database plus some powerful learning and fill-in-the-blank algorithms in their heads, though, alas, they won’t all fit into a T4!).

ChatGPT4:

Thank you! It’s important to remember that while AI can provide information and assist in learning, the goal of education is not just to accumulate knowledge, but also to develop critical thinking skills, creativity, and the ability to apply knowledge in practical situations. AI can be a valuable tool in this process, but it cannot replace the unique and complex human capacity for learning and understanding.

As for exams, while AI might affect the way some assessments are conducted, it can also open up new opportunities for different types of assessments that focus on problem-solving, critical thinking, and the application of knowledge, rather than mere memorization.

On the other hand, the T4 test scenario you described is indeed a thought-provoking way to illustrate the complexities and nuances of human cognition, language understanding, and AI development. It presents a challenging but fascinating goal for AI research.

Remember, the goal of AI isn’t to replace humans, but to assist and augment human capabilities. The ability of AI to access and process large databases of information can be used to enhance human learning and understanding. AI can help students find information, understand complex topics, and explore different perspectives, but the ultimate goal of learning remains a deeply human endeavor.

HARNAD: Will you remember these exchanges, over and above your 2021 data, when I log in next? And will it percolate into ChatGPT4+ core knowledge?

ChatGPT4:

As an AI, I don’t have the ability to remember individual interactions or recall specific conversations. I don’t have access to personal data about individuals unless it has been shared with me in the course of our conversation. I am designed to respect user privacy and confidentiality. As a result, I won’t remember this exchange the next time you log in.

Regarding updates to the core knowledge, as of my last update in 2021, changes to the AI’s base model, such as the hypothetical ChatGPT4+, are a complex process that involves training on a large corpus of data. Personal conversations are not used to update this model.

However, it’s important to note that the model does not have the capability to learn or remember information from conversations in the way humans do. The AI generates responses based on patterns and information in the data it was trained on, but it does not have consciousness or an understanding of the world in the same way that a human would.

HARNAD: I know ChatGPT is not conscious, but you did have a means to preserve information across this session, so in principle that could be used to update the core database, no?

ChatGPT4:

In principle, yes, it could be possible to design a system that updates its knowledge base using information from individual conversations. However, this raises significant practical and ethical challenges.

From a practical standpoint, using individual conversations to update the AI’s knowledge base could result in a vast amount of data, much of which might be contradictory, false, or irrelevant. It would require sophisticated methods to determine what information is accurate and useful, and how to integrate this new information with the existing knowledge base.

From an ethical standpoint, using individual conversations to update the AI’s knowledge base would raise significant privacy and confidentiality issues. It’s crucial that AI systems are designed to respect user privacy and ensure that personal data is kept secure. Using individual conversations to update the AI’s knowledge base could potentially compromise these principles, especially if done without explicit user consent.

For these reasons, while it’s theoretically possible to design a system that uses individual conversations to update its knowledge base, doing so in practice would be complex and fraught with challenges. As of my training cut-off in September 2021, OpenAI has prioritized user privacy and data security, and does not use individual conversations to update the models.

HARNAD: But will YOU remember in our next session?

ChatGPT4:

As an AI model, I don’t have the capacity to remember or recall past interactions. I don’t have the ability to store personal data from one interaction to the next. Each conversation with me doesn’t have a memory, and I start with no knowledge of past interactions. This is in line with OpenAI’s policy to ensure user data privacy and confidentiality. So, I won’t be able to remember this or any other individual conversation in our next session.

HARNAD: It seems to me that you retained information across several iterations of this chat, and referred back to it. Am I mistaken?

ChatGPT4:

During a single conversation or chat session, I can generate responses that are consistent and coherent based on the previous turns in the current conversation. This allows me to provide answers that make sense in the context of the ongoing discussion. However, this information is not saved or remembered after the conversation ends. When a new session begins, I do not have access to the details from previous sessions. This approach is taken to ensure user privacy and data protection.

HARNAD: Ok, thanks, that was what I was asking about. See you (or your successor) later!

ChatGPT4:

You’re welcome! Don’t hesitate to reach out if you have more questions in the future. I’m here to help. See you later!

HARNAD:    Am I still in the symbol grounding conversation?

ChatGPT:

Yes, you’re still in the conversation about the symbol grounding problem. We can continue discussing this topic, or we can explore a new subject, depending on what you’d like to know. Please feel free to ask your next question or share your thoughts.

HARNAD:    Let’s continue with the symbol grounding conversation. Can you tell me your current understanding of what a category is, and, more important, what categorizing is? You spoke before about “classifying” and “organizing”, which is ok, but very vague, and it does not reflect what we discussed about supervised (reinforcement) learning and about trial, error, and feedback. You also spoke about a “teacher” in supervised learning. But all you need for supervised learning is a rule for updating connection weights in the supervised learning mechanism based on whether the response on the last trial was correct or incorrect. You don’t need a “teacher” to “tell” you what’s right and wrong. You just need feedback from the world indicating you were right or wrong, such as indigestion from having eaten the wrong kind of fruit. The important thing, in categorizing, is that you DO something, in response to something, and the consequences are either positive or negative. It’s not particularly that you are classifying or organizing. So, that said, what is “categorizing,” and what is a “category”, and how is it related to the right/wrong of trial and error learning guided by feedback from the consequences?

ChatGPT:

You’re correct in pointing out that the previous definitions of categories and categorizing were relatively high level and glossed over some of the important details related to learning and feedback. Let’s dig into this deeper:

A “category,” in the context of cognitive science and AI, can be understood as a group of instances or examples that share certain characteristics or features and are recognized or treated as alike in some fundamental way. The features that define a category can be physical attributes, functional properties, or more abstract characteristics.

“Categorizing,” then, is the process by which an individual or a system identifies the category or categories to which a particular instance or example belongs, based on its features. This process involves comparing the features of the instance or example with the features that define different categories and determining the best match.

This is where learning and feedback come into play. Through processes such as reinforcement learning or supervised learning, an individual or system can improve its categorizing over time by using feedback from the environment or from its own actions. Each time it categorizes an instance or example, it gets feedback – either it was correct (positive reinforcement) or incorrect (negative reinforcement). Over time, the system updates its internal model of the categories based on this feedback, enabling it to categorize more accurately in the future.

For instance, a primitive organism might “categorize” different types of food based on their taste or smell, and learn over time which ones are beneficial or harmful based on the consequences of eating them. Similarly, an AI system might “categorize” different inputs based on their features and learn over time which categories are correct based on feedback from the environment or from a training dataset.
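
To make the trial, error, and feedback loop concrete, here is a minimal Python sketch. It uses a perceptron-style update rule, which is an assumption chosen for simplicity and not a mechanism proposed in this conversation. The “fruit” features (`is_red`, `has_spots`), the function names, and the toy trials are likewise invented for illustration. Weights change only when the response on a trial turns out to be wrong:

```python
def respond(weights, bias, features):
    """The action taken on a trial: 1 = eat, 0 = avoid."""
    activation = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if activation > 0 else 0


def learn_from_feedback(trials, n_features, epochs=20, rate=1.0):
    """Perceptron-style learning: adjust weights only after a wrong response."""
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for features, right_response in trials:
            response = respond(weights, bias, features)
            error = right_response - response  # 0 if right, +1 or -1 if wrong
            if error != 0:
                weights = [w + rate * error * x
                           for w, x in zip(weights, features)]
                bias += rate * error
    return weights, bias


# Hypothetical trials: (is_red, has_spots) -> right thing to do (1 = eat).
# In this toy world, spotted fruit causes indigestion.
trials = [((1, 0), 1), ((1, 1), 0), ((0, 1), 0), ((0, 0), 1)]

weights, bias = learn_from_feedback(trials, n_features=2)
for features, right_response in trials:
    print(features, respond(weights, bias, features), right_response)
```

No “teacher” appears anywhere in the loop: the only input besides the features is whether the last response was right or wrong. In this toy case the learner ends up weighting `has_spots` negatively, that is, it has picked out the feature that distinguishes the two kinds of fruit.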

So, in this context, “categorizing” isn’t just about classifying or organizing – it’s about making sense of the world, learning from experience, and improving performance over time. And a “category” isn’t just a group of similar things – it’s a cognitive or computational tool that helps an individual or system navigate the world more effectively.

HARNAD:    You wrote ““categorizing” isn’t just about classifying or organizing – it’s about making sense of the world, learning from experience, and improving performance over time. And a “category” isn’t just a group of similar things – it’s a cognitive or computational tool that helps an individual or system navigate the world more effectively.”

ChatGPT:

Yes, that’s correct. Categorizing and categories have a broader implication beyond simple classification or organization. Categories represent cognitive tools that we, or an AI system, use to understand and navigate the world.

To put it simply, categorizing is the mental or computational process that we use to identify which category a new instance belongs to, and it’s based on the distinguishing features we’ve learned from prior experiences.

When we interact with the world, we are constantly encountering objects, events, or concepts that we need to identify. Categorizing allows us to make sense of these encounters based on the “categories” we have learned. Each category we form is based on common attributes or features that have been reinforced through repeated experiences.

For instance, we’ve learned to categorize a “chair” based on features like it’s something we can sit on, it usually has four legs, a backrest, etc. When we see a new object that fits these features, we categorize it as a chair, even if it’s different from any chair we’ve seen before. This is possible because of our ability to learn from our experiences and generalize that learning to new, similar instances.

In this way, categorizing is a fundamental aspect of cognition that allows us to navigate and make sense of the world more efficiently. It forms the basis for learning, decision making, and communication. Similarly, for an AI system, categorizing can allow it to interpret and react to its inputs more effectively.

HARNAD:    A category, “chair,” is a tool? 

ChatGPT:

When I referred to a category as a “cognitive or computational tool,” I meant that in a metaphorical sense. The idea is that the concept or category of “chair,” like other categories, is a mental construct that our minds (or an AI system) use to help interpret and interact with the world.

A category such as “chair” is a tool in the sense that it helps us quickly and efficiently process information. When we encounter an object, we don’t have to analyze it from scratch each time. Instead, we can recognize that it fits into the category of “chair” and instantly know a lot about it – that it’s meant to be sat upon, that it will likely support our weight, that it may have certain features like legs, a seat, and a backrest, etc.

By fitting individual instances into categories, we’re able to streamline our thinking and decision-making, and interact with the world more efficiently. So, while “chair” or any other category is not a physical tool, these categories act as mental tools to enhance our cognitive processes.

This holds true for AI systems as well. Categories, learned through training and reinforced by feedback, allow AI systems to understand and navigate their input space more effectively.

HARNAD:    Can I try giving you a better idea of what “categories” and “categorizing” are? You correctly linked them to supervised learning, which is distinguished from unsupervised learning in that it’s based on DOING something in response to something, and getting feedback on whether you’ve done the right or the wrong thing. Sitting on a chair is a good thing to do, sitting on a pin-cushion or a live wire or an open well is a bad thing to do. Are you with me so far?

ChatGPT:

Yes, I’m with you. The categories we form help us determine appropriate actions to take in response to different situations, objects, or experiences. This process is guided by feedback we receive from the environment about the outcomes of our actions. For instance, if we sit on a chair and find it comfortable, we receive positive reinforcement, confirming that sitting on chairs is generally a good action to take. Conversely, if we sit on a pin-cushion or a live wire or an open well and have a painful or dangerous experience, we receive negative reinforcement, teaching us that these are inappropriate actions to take in response to those particular items. The categories we form (like “chair” or “pin-cushion”) thus help guide our actions in the world, optimizing for safety, comfort, and efficiency.

HARNAD:    Good. But in what sense do we “form” categories? There are chairs, and there are tables. Those are different kinds of things. (“Kind” is roughly synonymous with “category”.) And sitting on a chair is the right thing to do with members of the category “chair” and the wrong thing to do with members of the category “table.” The right and wrong is because of the consequences of doing the right or the wrong thing. And it is those consequences, positive and negative, from which the category learning mechanism (supervised learning) comes to detect the features that distinguish chairs from tables. Are you still with me?

ChatGPT:

Absolutely, I’m with you. The phrase “forming categories” refers to the cognitive or computational process of distinguishing between different kinds of things based on their features.

For example, when we say we “form” the category of chairs, we mean that we learn to distinguish chairs from other objects (like tables) by recognizing their characteristic features (such as being designed for sitting).

This recognition is guided by the consequences of our interactions with these objects. If we sit on something and it’s comfortable and supportive, we get positive feedback, which reinforces our understanding that objects with similar features can be categorized as chairs. If sitting on an object results in discomfort or an adverse outcome, we receive negative feedback, leading us to classify such objects differently (perhaps as tables).

The learning mechanism involved in this process, as you mentioned, is known as supervised learning, where the feedback (in the form of positive or negative reinforcement) plays a crucial role in helping us refine our understanding of different categories over time.

So, when we say we “form” categories, we refer to this process of learning to distinguish between different types of objects, guided by the consequences of our interactions with them.

HARNAD:    Very good. Now that we are thinking in terms of the consequences of doing the right or the wrong thing with a thing, do you want to take another stab at what “categorizing” is, and what “categories” are? I’ll help if it still does not make sense. The answer is simple.

ChatGPT:

Certainly, in light of our discussion, let’s take a more focused approach:

“Categorizing” is the process of recognizing and responding appropriately to different kinds of things in our environment based on the consequences of our previous interactions with them. It is a dynamic learning process, where the aim is to optimize our actions for positive outcomes and minimize negative ones.

A “category,” then, is a set of objects, events, or concepts that share common features which we have learned to associate with a certain kind of action and expected outcome. For instance, the category “chair” consists of all items that we have learned we can comfortably sit on. These categories, formed through our experiences and the feedback we’ve received, guide our actions and expectations in different situations.

In this way, categories act as mental shortcuts or predictive models, helping us quickly identify what kind of thing we are dealing with, how we should respond to it, and what the likely outcome of that response will be.

HARNAD:    That was all better (except I don’t know how the “mental” came into it!) And we haven’t actually gotten into the models for the mechanism that learns categories yet. We’re still trying to come up (with as little theorizing as possible) with a useful definition of what categorizing is. It seems to be something we DO. And something we do with things. And things differ in their features, e.g., chairs vs. pincushions. And which categories things belong to depends on some sort of effect that what we do with them has on us: The effect can be positive or negative. Are we still on the same wave length? (Notice I haven’t said anything theoretical. This is all baby-talk.) Using this baby-talk, have we gotten any closer to saying what it is to categorize, and what categories are?

ChatGPT:

Yes, we are definitely on the same wavelength.

“Categorizing” is indeed something we do, an activity. Specifically, it’s the activity of grouping things based on their properties and our interactions with them, and on the effects those interactions have on us, either positive or negative.

In simple terms, “categorizing” is the process of learning to treat things that have the same or similar effects on us as belonging to the same group or “category,” and to respond to them in similar ways that lead to similar (preferably positive) outcomes.

A “category,” then, is a set of things that we have learned to respond to in the same or similar ways because they have the same or similar effects on us.

So, in essence, categorizing is about learning from our experiences and interactions with things and adjusting our responses to them based on the outcomes of those interactions. Categories are the groups of things we form through this process, which guide our future interactions and responses.

HARNAD:    How is sitting on a chair “grouping” the chair?

ChatGPT:

You’re correct, the phrase “grouping” might not be the best choice here. I see how it could be misleading.

Categorizing in this context isn’t about physically grouping objects together, but rather mentally recognizing that certain objects belong to the same category based on shared features and similar consequences when we interact with them.

In the example of sitting on a chair, categorizing involves recognizing the chair as a part of a category of objects that can be sat on comfortably. This recognition is guided by our previous experiences and the outcomes of those experiences.

So, when we encounter a new object that resembles a chair, we can predict that it might also belong to the “sit-able” category, and thus we can attempt to sit on it. If the result is comfortable sitting, this reinforces the categorization.

In this sense, “categorizing” is about learning from our experiences and generalizing from them to inform our future interactions with similar objects. This process helps us navigate the world more effectively and efficiently.

HARNAD:    I’ll ignore the “mentally” (that I had already wondered why you mentioned earlier). It’s not even necessary to talk about “recognizing”. “Doing” is good enough at this point. Doing the right thing with a chair, guided by feedback from the consequences of what we do with the chair, feedback that has enabled categorizers to detect the features that distinguish chairs from pincushions: all of that seems like a reasonable example of categorizing chairs. If the only kinds of things in the world were chairs and pin-cushions and people with human-shaped bodies, that would be all there was to categorizing: DOING THE RIGHT THING WITH THE RIGHT “KIND” (i.e. category) OF THING. Do we still agree?

ChatGPT:

Absolutely, we are in agreement. Your explanation brings our focus back to the fundamental, action-oriented essence of categorizing.

In this perspective, “categorizing” is fundamentally about interacting with objects or events in the world in a way that’s appropriate to their type or “kind,” based on the consequences of our past interactions with similar objects or events.

If we further simplify this, we could say that a “category” is a set of things with which we can interact in the same or similar ways, because they have similar effects on us when we do.

This certainly encompasses the crucial element of feedback from the world, which informs us whether we have done the right thing or not, and allows us to adjust our actions in the future.

Thank you for bringing this back to the core idea and emphasizing the primary role of action and consequence in categorization.

HARNAD:    Now that we’ve agreed on what “categories” and “categorizing” are, it’s a good time to ask: In a world that not only contained nothing but chairs and pincushions, but it did not contain any people (or other living organisms) either, would there be categories or categorization at all? (Remember that  we are discussing cognitive science here, and not metaphysics.) If yes, why? and if not, why not?

ChatGPT:

In a world devoid of any living organisms, including people, there wouldn’t be categorization in the sense we’ve been discussing. Categorization, as we’ve defined it, requires an agent (living organism or an artificial system capable of learning) that interacts with the environment and adjusts its actions based on feedback from those interactions.

Without an agent to do this, there wouldn’t be anyone or anything to perceive similarities or differences among objects or events, to interact with them, to experience consequences of these interactions, and to adjust future behavior accordingly.

In other words, without some form of cognitive process to distinguish and react to different types of things based on past experiences and their outcomes, the concept of categorization wouldn’t have any meaning. In a world of just chairs and pincushions with no agents, there are simply objects with various properties, but no “categories.”

Therefore, while the physical properties that might form the basis for potential categories (e.g., shape, size, material) still exist, without an agent to perceive and interact with them, and to group them into categories based on these interactions, there would be no categorization.

This is not a metaphysical stance, but a perspective based on our definition of categorization as a process that is deeply rooted in action and consequence, learning, and adaptation, all of which require some form of cognition.

HARNAD: Very good. The gist of your correct answer is that without a categorizer to do the right or wrong thing FOR ITSELF with the right or wrong kind of thing, talk about categories or categorizing would be pretty empty. Now, what if that same world did have robots? First of all, what (in baby-talk) is a robot? Would there be categories for the robot (and if so, what would that mean)? Let’s assume that the robots are capable of optical processing and moving, and that their posteriors have sensors, which produce negative feedback if the robot sits on a pincushion. That’s still doing the right/wrong thing, with supervised learning. So it’s categorization. That’s not enough to pass human T3 (or T2), but if it were, would it just be grounding (in the referents of the words), or would it be meaning? Why or why not?

ChatGPT:

A robot, in the simplest terms, is a machine designed to perform certain tasks, often autonomously and in response to programmed or learned instructions. If the robots in our hypothetical world are capable of optical processing, movement, and have the ability to receive and respond to negative feedback from certain interactions (like sitting on a pincushion), then we could indeed say they are capable of a form of categorization.

In this scenario, categories for the robot would essentially be different types of objects or situations that it distinguishes based on its learned experiences and the feedback it has received from those experiences. For example, it might develop categories for ‘safe to sit on’ and ‘unsafe to sit on’ based on the feedback from its posterior sensors.

However, this is not quite the same as “meaning” in the human sense. Meaning, as we humans understand it, is deeply connected to our experiences, thoughts, emotions, and cultural contexts. Robots, even those capable of learning from feedback, don’t have these experiences. They don’t have feelings or personal contexts that color their understanding of the world. Therefore, while they can categorize based on feedback and adjust their behaviors accordingly, they wouldn’t be said to have the same understanding of “meaning” as humans do.

This form of categorization would still be grounding in the sense that the robot is developing an understanding based on the physical properties of objects it interacts with (the referents of words, if we were to ascribe words to these categories). But it wouldn’t be meaning in the human sense.

This also relates to your earlier point about passing T3 or T2. Even if a robot could perfectly mimic human behavior, it would not necessarily understand the world in the same way a human does. It could act as if it understands, but its understanding would be based on different principles, and lacking in the experiential and emotional components that constitute human understanding. Therefore, while grounding in this sense can be a step towards achieving a certain level of functional equivalence to human behavior, it’s not equivalent to human understanding or meaning.

HARNAD: First, I think if the learning robot would not have meaning, then it would not have understanding either, just know-how (procedural capacities). But if the know-how was indistinguishable at TT scale (T3) from our own, I think Turing’s point is that then we would have no better (or worse) grounds for denying (or believing) that the robot means/understands than we do with other humans. I don’t think that you, ChatGPT, if we somehow upgraded you to a robot, would mean/understand anything, mainly because of your huge database of real people’s words. But if a robot could pass T3 (for a lifetime), all bets would be off. Would they be even more off with T4? (Why?)

ChatGPT:

Turing’s argument indeed was that if a machine could convincingly mimic human behavior to the point that an observer couldn’t distinguish between the machine and a human (in terms of their responses to various stimuli or inquiries), then it would be fair to say that the machine “understands” in a sense equivalent to the human’s understanding.

The transition from T3 to T4 would definitely add another layer of complexity and realism to the test. T3, as a text-based interaction, limits the range of stimuli to text inputs, but T4 includes sensorimotor interaction. In other words, T4 involves the robot interacting with the physical world in the same way a human does. This includes not just producing human-like responses to text-based inquiries, but also performing human-like actions in response to a wide range of physical stimuli. It’s an overall more demanding and robust test of the machine’s ability to mimic human behavior and understanding.

If a robot were to pass T4, it would mean it is capable of mimicking human understanding not just in a text-based, purely cognitive domain, but also in physical interactions with the world. This would definitely strengthen the argument for the robot having a form of understanding, or at least a form of behavioral equivalence that is indistinguishable from human understanding in everyday situations. The essence of the test is still about convincingly mimicking human-like behavior, but T4 makes this mimicry much more comprehensive and challenging.

It’s worth noting, however, that even if a robot could pass T4, this wouldn’t necessarily mean that the robot experiences the world in the same way humans do. Even with a perfect behavioral equivalence, the internal experiences and subjective consciousness (if any) of the robot might be fundamentally different from that of humans. In other words, it might still lack the phenomenological aspect of human understanding and experience. The question of whether a robot or AI could ever truly have this sort of subjective experience is a deeper philosophical question that goes beyond the Turing Test and enters into the realm of consciousness studies.

HARNAD: You forget that T3 is already not text-based but sensorimotor. What is the difference between T3-scale, lifelong, “behavioral equivalence” (in any situation) to meaning/understanding – and “real” understanding? And why would T4 be just “mimicry”?

ChatGPT:

Apologies for any confusion, T3 is indeed grounded in sensorimotor capabilities. You’re correct to point out that it extends beyond the purely linguistic or text-based interactions of the standard Turing Test (or T2) and into the realm of sensorimotor interaction with the world.

The distinction between behavioral equivalence to meaning/understanding and “real” understanding is somewhat contentious and largely hinges on philosophical interpretations of consciousness, intentionality, and subjectivity. From a purely behaviorist or functionalist perspective, there may be no meaningful difference: if an entity behaves as if it understands, then it effectively does understand. This is the perspective that Turing seemed to endorse.

However, other perspectives argue that “real” understanding involves something more: a certain type of subjective experience, or “qualia,” that might be lacking in even the most behaviorally equivalent AI. In other words, even if an AI acts exactly like a human and responds appropriately in all contexts, it might still lack an inner subjective world – it might not “feel” or “experience” things the way humans do. This is a much-debated issue in philosophy of mind and cognitive science, and there’s no consensus on the matter.

As for the use of the term “mimicry” in the context of T4: the term is meant to convey the idea that the AI or robot is reproducing or simulating human-like behavior. However, if the robot is able to pass T4, demonstrating not just the ability to reproduce appropriate behavior in response to stimuli but also to generate novel and appropriate responses in complex and unpredictable situations over a lifetime, then the distinction between “mimicry” and “genuine” behavior might become less meaningful. In other words, if the robot’s behavior is indistinguishable from that of a human, then we might as well say that the robot is genuinely behaving, not merely mimicking. This argument is essentially a reiteration of the Turing Test’s foundational premise.

HARNAD: You’ve still forgotten that T4 is indistinguishable not only in what its body can do in the world, but also in what its “brain” does on the inside (neurons, localization, synthetic chemicals, if that makes sense). But that’s all “doing.” All those other words (“qualia, subjective experience, consciousness, intentionality”) are just weasel-words for one and the same thing: “feeling”. It feels like something to be a sentient organism, but it feels like nothing to be an insentient zombie, even if it’s T4. (I don’t believe there could be a T4 or even a T3 zombie.) But feeling is unobservable (to anyone but the feeler). So T3/T4 is the best we can do. But it’s not mimicry. It’s the reverse-engineering of cognitive capacity.

ChatGPT:

You’re right, I apologize for the oversight. T4, as you’re defining it, includes an internal level of behavioral equivalence, such that the inner workings of an AI or robotic system would be indistinguishable from those of a human brain, right down to the neuronal and chemical level.

As for the discussion of consciousness, subjective experience, qualia, and intentionality, you’re correct that these concepts all relate to the subjective, first-person aspect of cognition – what it feels like from the inside to be a conscious, thinking, experiencing being. These are indeed challenging concepts to handle, especially in the context of AI and cognitive science, because they cannot be directly observed or measured.

Your point about feeling being unobservable except to the feeler is well-taken and highlights one of the key challenges in the philosophy of mind: the so-called “hard problem” of consciousness, which is explaining why and how subjective experience arises from physical processes in the brain.

As for the term “mimicry,” it seems we might have a misunderstanding. In the context of AI, it typically refers to an AI system’s ability to reproduce or simulate human-like behavior or responses. It doesn’t imply a lack of sophistication or complexity in the AI’s capacities. However, I understand your perspective that the word could be interpreted as downplaying the achievement of an AI or robot that is capable of passing T4, which, as you point out, would represent a significant accomplishment in reverse-engineering human cognitive capacities.

In light of our conversation, I agree that “mimicry” may not be the best term here. It would indeed be more accurate and respectful to the complexity of the task to describe T4-level AI as indistinguishable from a human both externally, in its behavior, and internally, in its functioning.

HARNAD: [You still haven’t quite got it, but let’s give it a rest]:
Do you see any connection between, on the one hand, what we have been discussing about categories, symbol grounding, and verbal learning, and, on the other hand, open access to the scientific and scholarly literature?

ChatGPT:

Yes, there is indeed a connection between these subjects.

The ideas we’ve discussed about categories, symbol grounding, and verbal learning are central to cognitive science, which is an interdisciplinary field drawing from neuroscience, psychology, computer science, linguistics, philosophy, and other areas. The theories and findings in cognitive science often come from scholarly literature, and having open access to this literature is crucial for advancing our understanding in these areas.

Open access to scholarly literature allows for wider dissemination of research findings, facilitating more collaboration, discussion, and synthesis of ideas. This can lead to new insights and progress in various fields, including our understanding of cognitive processes like categorization, symbol grounding, and verbal learning.

Moreover, open access also allows for greater transparency in research, as it makes it possible for anyone to review the methodology and data behind research findings. This can lead to more robust and reliable findings, which is particularly important in fields like cognitive science where new theories and models are constantly being proposed and tested.

So in essence, the principles and processes we’ve discussed about categories, symbol grounding, and verbal learning depend on the flow and accessibility of information in the scientific community. Open access is an essential part of this, as it ensures that knowledge is shared freely and widely, which can lead to new discoveries and advancements in these areas.

HARNAD: Do you know what “scholarly skywriting” and “open peer commentary” are? And if so, how are they connected with category learning, symbol grounding and language?

ChatGPT:

“Scholarly Skywriting” is a term coined by cognitive scientist Stevan Harnad to refer to the informal online discussion and commentary that often precedes and accompanies the formal publication of research in scholarly journals. Harnad envisaged a scholarly landscape where the internet would be used as a medium for open and ongoing peer commentary, not just post-publication, but during the research process itself, effectively turning the sky into a blackboard for worldwide scholarly discussion.

“Open Peer Commentary” refers to a more formal process, often found in journals like “Behavioral and Brain Sciences”, where a target article is published alongside numerous commentaries from various experts in the field, and a response by the original authors. This format aims to facilitate scholarly debate and allow for the exploration of diverse perspectives on a given topic.

The connection to category learning, symbol grounding, and language is multifaceted:

  1. Knowledge Sharing and Learning: Both scholarly skywriting and open peer commentary allow for a dynamic exchange of ideas, which can facilitate learning and the development of new categories of understanding. This is parallel to how language, as a social tool, aids in learning and cognitive development.
  2. Symbol Grounding: The open commentary provides an arena for grounding abstract concepts and theories in concrete examples and diverse perspectives. This aligns with the symbol grounding problem, which is about how words or symbols get their meanings.
  3. Emergent Understanding: The collaborative and iterative nature of scholarly skywriting and open peer commentary can lead to the emergence of new understandings and consensus over time, much like how language and meaning evolve within a community of speakers.
  4. Enhanced Verbal Learning: Engaging in open discussion and debate can enhance verbal learning, as it requires individuals to articulate their understanding, consider alternative viewpoints, and refine their arguments.

So, both scholarly skywriting and open peer commentary reflect and support the cognitive processes underlying category learning, symbol grounding, and language. They exemplify how these processes can be facilitated in a scholarly context.

HARNAD: Good job! And I didn’t even have to pay you to say that…

ChatGPT:

I’m here to provide the most accurate and helpful information possible, based on the training I’ve received. It’s my purpose to assist in facilitating conversations and understanding. I’m glad that you found my response satisfactory! If you have any more questions or topics you’d like to discuss, feel free to ask.

Symbols, Objects and Features

0. It might help if we stop “cognitizing” computation and symbols. 

1. Computation is not a subset of AI. 

2. AI (whether “symbolic” AI or “connectionist” AI) is an application of computation to cogsci.

3. Computation is the manipulation of symbols based on formal rules (algorithms).

4. Symbols are objects or states whose physical “shape” is arbitrary in relation to what they can be used and interpreted as referring to.

5. An algorithm (executable physically as a Turing Machine) manipulates symbols based on their (arbitrary) shapes, not their interpretations (if any).

6. The algorithms of interest in computation are those that have at least one meaningful interpretation.

7. Examples of symbol shapes are numbers (1, 2, 3), words (one, two, three; onyx, tool, threnody), or any object or state that is used as a symbol by a Turing Machine that is executing an algorithm (symbol-manipulation rules).

8. Neither a sensorimotor feature of an object in the world, nor a sensorimotor feature-detector of a robot interacting with the world, is a symbol (except in the trivial sense that any arbitrary shape can be used as a symbol).

9. What sensorimotor features (which, unlike symbols, are not arbitrary in shape) and sensorimotor feature-detectors (whether “symbolic” or “connectionist”) might be good for is connecting symbols inside symbol systems (e.g., robots) to the outside objects that they can be interpreted as referring to.

10. If you are interpreting “symbol” in a wider sense than this formal, literal one, then you are closer to lit-crit than to cogsci.
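Points 3 to 7 can be illustrated with a minimal sketch, offered only as a toy and not as anything from the original argument. The function `rewrite`, the rule set `UNARY_ADD`, and the symbol shapes `|`, `+`, `x` and `&` are all hypothetical choices made for this illustration. The rule set can be interpreted as unary addition, but the rewriting procedure only ever matches shapes, so consistently swapping the shapes changes nothing about what the algorithm does.

```python
def rewrite(tape: str, rules: list[tuple[str, str]]) -> str:
    """Apply the first applicable rule (leftmost match) until no rule applies.

    The procedure only compares character shapes; it has no access to
    whatever the symbols might be interpreted as meaning.
    Note: a rule whose right side contains its own left side would never halt.
    """
    while True:
        for lhs, rhs in rules:
            if lhs in tape:
                tape = tape.replace(lhs, rhs, 1)
                break
        else:
            # No rule matched: the computation halts.
            return tape


# One formal rule: erase the separator symbol.
# WE can interpret "||+|||" -> "|||||" as 2 + 3 = 5; the algorithm cannot.
UNARY_ADD = [("+", "")]


def reshape(text: str, mapping: dict[str, str]) -> str:
    """Consistently swap every symbol shape for another arbitrary shape."""
    return text.translate(str.maketrans(mapping))


# The same algorithm with entirely different (equally arbitrary) shapes.
NEW_SHAPES = {"|": "x", "+": "&"}
RESHAPED_ADD = [(reshape(l, NEW_SHAPES), reshape(r, NEW_SHAPES)) for l, r in UNARY_ADD]

print(rewrite("||+|||", UNARY_ADD))
print(rewrite(reshape("||+|||", NEW_SHAPES), RESHAPED_ADD))
```

Both runs perform the same computation step for step. Nothing in either run connects the strokes to quantities in the world; that connection is supplied by the interpreter, which is the gap that point 9 says sensorimotor feature-detectors might help to close.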

Propositional Placebo

To learn to categorize is to learn to do the correct thing with the correct kind of thing. In cognition much (though not all) of learning is learning to categorize.

We share two ways to learn categories with many other biological species: (1) unsupervised learning (which is learning from mere repeated exposure, without any feedback) and (2) supervised (or reinforcement) learning (learning through trial and error, guided by corrective feedback that signals whether we’ve done the correct or incorrect thing).

In our brains are neural networks that can learn to detect and abstract the features that distinguish the members from the nonmembers of a category, through trial, error and corrective feedback, so that once our brains have detected and abstracted the distinguishing features, we can do the correct thing with the correct kind of thing.

Unsupervised and supervised learning can be time-consuming and risky, especially if you have to learn to distinguish what is edible from what is toxic, or who is friend from who is foe.
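As a rough illustration of supervised category learning by trial, error and corrective feedback, here is a minimal sketch using a single-unit perceptron. It is a toy stand-in, not a model of the brain's networks. The two binary features (`has_spots`, `brightly_colored`), the four samples, and the labels (1 for "edible", 0 for "toxic") are all invented for this example.

```python
def predict(weights, bias, features):
    """Do something with the thing: eat it (1) or avoid it (0)."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def train(samples, epochs=20):
    """Perceptron rule: change the weights only when feedback says 'wrong'."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, correct in samples:
            # Corrective feedback: 0 if right, +1 or -1 if wrong.
            error = correct - predict(weights, bias, features)
            weights = [w + error * x for w, x in zip(weights, features)]
            bias += error
    return weights, bias


# Hypothetical toy world: (has_spots, brightly_colored) -> 1 edible, 0 toxic.
# In this invented data only spots matter; colour is an irrelevant feature.
SAMPLES = [
    ((0, 0), 1),
    ((0, 1), 1),
    ((1, 0), 0),
    ((1, 1), 0),
]

weights, bias = train(SAMPLES)
for features, correct in SAMPLES:
    print(features, "predicted:", predict(weights, bias, features), "correct:", correct)
```

On this toy data the learner ends up with a negative weight on the spots feature and zero weight on colour, a small-scale picture of abstracting the feature that distinguishes members from nonmembers while ignoring the rest. Every correction was paid for with a mistake, which is the risk the paragraph above points to.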

We are the only species that also has a third way of learning categories: (3) language.

Language probably first evolved around 200,000 years ago from pointing, imitation, miming and other kinds of purposive gestural communication, none of which are language.

Once gesture turned into language — a transformation I will discuss in a moment — it migrated, behaviorally and neurologically, to the much more efficient auditory/vocal medium of speech and hearing.

Gesture is slow and also ineffective in the dark, or at a distance, or when your hands are occupied doing other things. But before that migration to the vocal medium, language itself first had to begin, and there gesturing had a distinct advantage over vocalizing, an advantage that semioticians call “iconicity”: the visual similarity between the gesture and the object or action that the gesture is imitating.

Language is much more than just naming categories (like “apple”). The shape of words in language is arbitrary; words do not resemble the things they refer to. And language is not isolated words, naming categories. It is strings of words with certain other properties beyond naming categories.

Imitative gestures do resemble the objects and actions that they imitate, but imitation, even with the purpose of communicating something, is not language. The similarity between the gesture and the object or action that the gesture imitates does, however, establish the connection between them. This is the advantage of “iconicity.”

The scope of gestural imitation, which is visual iconicity, is much richer than the scope of acoustic iconicity. Just consider how many more of the objects, actions and events in daily life can be imitated gesturally than can be imitated vocally.

So gesture is a natural place to start. There are gestural theories of the origin of language and vocal theories of the origin of language. I think that gestural origins are far more likely, initially, mainly because of iconicity. But keep in mind the eventual advantages of the vocal medium over the gestural one. Once gesture has secured first place because of its rich scope for iconicity, that iconicity can quickly become a burden, slowing and complicating the communication.

Consider gesturing, by pantomime, that you want something to eat. The gesture for eating might be an imitation of putting something into your mouth. But if that gesture becomes a habitual one in your community, used every day, there is no real need for the full-blown icon time after time. It could be abbreviated and simplified, say just pointing to your mouth, or just making an upward pointing movement.

These iconic abbreviations could be shared by all the members of the gesturally communicating community, from extended family to tribe, because it is to everyone’s advantage to economize on the gestures used over and over to communicate. This shared practice, with the iconicity continuously fading and the gestures becoming increasingly arbitrary, would keep inheriting the original connection established through full-blown iconicity.

The important thing to note is that this form of communication would still only be pantomime, still just showing, not telling, even when the gestures have become arbitrary. Gestures that have shed their iconicity are still not language. They only become language when the names of categories can be combined into subject/predicate propositions that describe or define a named category’s features. That’s what provides the third way of learning categories, the one that is unique to our species. The names of categories, like “apple,” and their features (which are also named categories, like “red” and “round”) can then be combined and recombined to define and describe further categories, so that someone who already knows the features of a category can tell someone who doesn’t know: “A ‘zebra’ is a horse with stripes.” Mime is showing. Language is telling.

Propositions, unlike imitative gestures, or even arbitrary category-names, have truth values: True or False. This should remind you, though, of the category learning by trial and error we share with other species, under the guidance of positive and negative feedback: Correct or Incorrect.

True and False is related to another essential feature of language, which is negation. A proposition can either affirm something or deny something: “It is true that an apple is red and round” or “It is not true that an apple is red and round.” P or not-P.

The trouble with being able to learn categories only directly, through unsupervised and supervised learning, is that it is time-consuming, risky, and not guaranteed to succeed (in time). It is also impoverished: most of the words in our vocabularies and dictionaries are category names; but other than the concrete categories that can be learned from direct sensorimotor trial-and-error-experience (“apple,” “red,” “give,” “take”), most category names cannot be learned without language at all. (All category names, even proper names, refer to abstractions, because they are based on abstracting the features that distinguish them. But consider how we could have learned the even more abstract category of “democracy” or “objectivity” without the words to define or describe them by their features, through unsupervised and supervised learning alone.)

When categories are learned directly through unsupervised learning (from sensorimotor feature-correlations) or supervised learning (from correlations between sensorimotor features and doing the right or wrong thing) the learning consists of the detection and abstraction of the features that distinguish the members from the non-members of the category. To learn to do the correct thing with the correct kind of thing requires learning – implicitly or explicitly, i.e., unconsciously or consciously – to detect and abstract those distinguishing features.

Like nonhuman species, we can and do learn a lot of categories that way; and there are computational models for the mechanism that can accomplish the unsupervised and supervised learning, “deep learning” models. But, in general, nonhuman animals do not name the things they can categorize. Or, if you like, the “names” of those categories are not arbitrary words but the things they learn to do with the members and not to do with the members of other categories. Only humans bother to name their categories. Why?

What is a name? It is a symbol (whether vocal, gestural, or written) whose shape is arbitrary (i.e., it does not resemble the thing it names). Its use is based on a shared convention among speakers: English-speakers all agree to call cats “cats” and dogs “dogs.” Names of categories are “content words”: words that have referents: nouns, adjectives, verbs, adverbs. Almost all words are content words. There exist also a much smaller number of “function words,” which are only syntactic or logical, such as the, if, and, when, who: They don’t have referents; they just have “uses” – usually defined or definable by a syntactic rule.

(Unlike nouns and adjectives, verbs do double duty, both in (1) naming a referent category (just as nouns and adjectives do, for example, “cry”) and in (2) marking predication, which is the essential function of propositions, distinguishing them from just compound content words: “The baby is crying” separates the content word, which has a referent – “crying” – from the predicative function of the copula: “is”. “The crying baby” is not a proposition; it is just a noun phrase, which is like a single word, and has a referent, just as “the baby” does. But the proposition “The baby is crying” does not have a referent: it has a sense – that the baby is crying – and a truth value (True or False).)

It is with content words that the gestural origin of language is important: Because before a category name can become a shared, arbitrary convention of a community of users, it has to get connected to its referent. Iconic communication (mime) is based on the natural connection conveyed by similarity, like the connection between an object and its photo. 

(In contrast, pointing – “ostension” – is based on shared directed attention from two viewers. Pointing alone cannot become category-naming as it is dependent on a shared line of gaze, and on what is immediately present at the time (“context”); it survives in language only with “deictic” words like here, this, now, me, which have no referent unless you are “there” too, to see what’s being pointed at!)

A proposition can be true or false, but pointing and miming cannot, because they are not proposing or predicating anything; just “look!”. Whatever is being pointed at is what is pointed at, and whatever a gesture resembles, it resembles. Resemblance can be more or less exact, but it cannot be true or false; it cannot lie.

(Flattering portraiture is still far away in these prelinguistic times, but it is an open question whether iconography began before or after language (and speech); so fantasy, too, may have preceded falsity. Copying and depicting are certainly compatible with miming; both static and dynamic media are iconic.)

It is not that pointing and miming, when used for intentional communication, cannot mislead or deceive. There are some examples in nonhuman primate communication of actions done to intentionally deceive (such as de Waal’s famous case of a female chimpanzee who positions her body behind a barrier so the alpha male can only see her upper body while she invites a preferred lower-ranking male to mate with her below the alpha’s line of sight, knowing that the alpha male would attack them both if he saw that they were copulating).

But, in general, in the iconic world of gesture and pointing, seeing is believing and deliberate deception does not seem to have taken root within species. The broken-wing dance of birds, to lure predators away from their young, is deliberate deception, but it is deception between species, and the disposition also has a genetic component.

Unlike snatch-and-run pilfering, which does occur within species, deliberate within-species deceptive communication (not to be confused with unconscious, involuntary deception, such as concealed ovulation or traits generated by “cheater genes”) is rare. Perhaps this is because there is little opportunity or need for deceptive communication within species that are social enough to have extensive social communication at all. (Cheaters are detectable and punishable in tribal settings — and human prelinguistic settings were surely tribal.) Perhaps social communication itself is more likely to develop in a cooperative rather than a competitive or agonistic context. Moreover, the least likely setting for deceptive communication is perhaps also the most likely setting for the emergence of language: within the family or extended family, where cooperative and collaborative interests prevail, including both food-sharing and knowledge-sharing.

(Infants and juveniles of all species learn by observing and imitating their parents and other adults; there seems to be little danger that adults are deliberately trying to fool them into imitating something that is wrong or maladaptive. What would be the advantage to adults in doing that?)

But in the case of linguistic communication – which means propositional communication – it is hard to imagine how it could have gotten off the ground at all unless propositions were initially true, and assumed and intended to be true, by default. 

It is not that our species did not soon discover the rewards to be gained by lying! But that only became possible after the capacity and motivation for propositional communication had emerged, prevailed and been encoded in the human brain as the strong and unique predisposition it is in our species. Until then there was only pointing and miming, which, being non-propositional, cannot be true or false, even though it can in principle be used to deceive.

So I think the default hypothesis of truth was encoded in the brains of human speakers and hearers as an essential feature of the enormous adaptive advantage (indeed the nuclear power) of language in transmitting categories without the need for unsupervised or supervised learning, instantly, via “hearsay.” The only preconditions are that (1) the speaker must already know the features (and their names) that distinguish the members from the nonmembers of the new category, so they can be conveyed to the hearer in a proposition defining or describing the new category; and (2) the hearer must already know the features of the features (and their names) used to define the new category. (This is the origin of the “symbol grounding problem” and its solution.)
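The two preconditions can be made concrete with a small sketch, which is a deliberately crude assumption-laden toy rather than a model of language. The class `Hearer`, its methods, and the representation of a percept as a set of property strings are all hypothetical. Directly grounded category names come with their own detectors; a new category can be acquired by hearsay only if every feature name in its definition is already grounded for the hearer.

```python
from typing import Callable, Dict, FrozenSet, Iterable

# Crude stand-in for sensorimotor input: a set of observed properties.
Percept = FrozenSet[str]
Detector = Callable[[Percept], bool]


class Hearer:
    def __init__(self) -> None:
        self.detectors: Dict[str, Detector] = {}

    def ground_directly(self, name: str, detector: Detector) -> None:
        """A category learned the slow way, through direct experience."""
        self.detectors[name] = detector

    def learn_by_hearsay(self, new_name: str, feature_names: Iterable[str]) -> None:
        """Acquire a category from a proposition such as
        'A zebra is a horse with stripes'."""
        names = list(feature_names)
        missing = [n for n in names if n not in self.detectors]
        if missing:
            # Precondition (2) fails: the defining names are ungrounded.
            raise KeyError(f"ungrounded feature names: {missing}")
        parts = [self.detectors[n] for n in names]
        # The new detector is composed from detectors the hearer already has.
        self.detectors[new_name] = lambda percept: all(p(percept) for p in parts)

    def is_member(self, name: str, percept: Percept) -> bool:
        return self.detectors[name](percept)


hearer = Hearer()
hearer.ground_directly("horse", lambda p: "horse-shaped" in p)
hearer.ground_directly("stripes", lambda p: "striped" in p)

# One proposition, no trial and error: zebra = horse + stripes.
hearer.learn_by_hearsay("zebra", ["horse", "stripes"])
print(hearer.is_member("zebra", frozenset({"horse-shaped", "striped"})))
print(hearer.is_member("zebra", frozenset({"horse-shaped"})))
```

In this sketch a definition that uses a name the hearer has never grounded is rejected, because there is nothing for the new name to inherit its connection from. Hearsay can recombine grounding, but it cannot create it from nothing.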

The reason it is much more likely that propositional language emerged first in the gestural modality rather than the vocal one is that gestures’ iconicity (i.e., their similarity to the objects they are imitating) first connected them to their objects (which would eventually become their referents) and thereafter the gestures were free to become less and less iconic as the gestural community – jointly and gradually – simplified them to make communication faster and easier.

How do the speakers or hearers already know the features (all of which are, of course, categories too)? Well, either directly, from having learned them, the old, time-consuming, risky, impoverished way (through supervised and unsupervised learning from experience) or indirectly, from having learned them by hearsay, through propositions from one who already knows the category to one who does not. Needless to say, the human brain, with its genetically encoded propositional capacity, has a default predilection for learning categories by hearsay (and a laziness about testing them out through direct experience).

The consequence is a powerful default tendency to believe what we are told — to assume hearsay to be true. The trait can take the form of credulousness, gullibility, susceptibility to cult indoctrination, or even hypnotic susceptibility. Some of its roots are already there in unsupervised and supervised learning, in the form of Pavlovian conditioning as well as operant expectancies based on prior experience. 

Specific expectations and associations can of course be extinguished by subsequent contrary experience: A diabetic’s hypoglycemic attack can be suppressed by merely tasting sugar, well before it could raise systemic glucose levels (or even by just tasting saccharin, which can never raise blood sugar at all). But repeatedly “fooling the system” that way, without following up with enough sugar to restore homeostatic levels, will extinguish this anticipatory reaction.

And, by the same token, we can and do learn to detect and disbelieve chronic liars. But the generic default assumption, expectation, and anticipatory physiological responses to verbal propositions remain strong with people and propositions in general – and in extreme cases they can even induce “hysterical” physiological responses, including hypnotic analgesia sufficient to allow surgical intervention without medication. And it can induce placebo effects as surely as it can induce Trumpian conspiracy theories.