Cognitive Science – Skywritings

21st November 202522nd November 2025

Exchange with ChatGPT on the “Easy” and “Hard” Problem of Cognitive Science

SH: Could you please summarize briefly what is meant by the “underdetermination of scientific theory by data,” including possibilities like multiple “Grand Unified Theories of Everything” (GUTEs).

ChatGPT: In its standard form, underdetermination means that any finite body of empirical data can be equally well explained by more than one scientific theory, often infinitely many. The classical point is logical: data confirm but never entail a theory. Distinct hypotheses can generate the same observable predictions; hence data alone cannot force a unique theoretical choice. Two versions are usually distinguished:

(1) Local underdetermination.
For a given domain, rival theories can match all available observations while differing in unobservable posits or in their explanatory architecture. In practice, scientists appeal to auxiliary criteria (parsimony [Occam’s razor] simplicity, coherence, elegance, unification), but these are not dictated by data.

(2) Global underdetermination.
Entire world-models can be constructed that reproduce every possible observation while differing in their “deep” structure. Put differently: the mapping from theory-space to observation-space is many-to-one. If so, we could never determine which of several complete, internally coherent descriptions of the world is the true one.

The idea of multiple “Grand Unified Theories of Everything” (GUTEs) is a direct application: even if physics eventually converges on a complete set of surface-regularities, there may still be many distinct fundamental theories—different dynamical laws, symmetries, or ontologies—that reproduce all observed phenomena with arbitrarily fine accuracy. Nothing in the data would distinguish them.

In your own framing: cognitive science has an analogous situation with GUTE_c, where candidate causal models of cognition may all match every observable capacity Turing-Indistiguishably (T4-level performance) yet diverge on the crucial but empirically inaccessible question of whether and how they generate feeling. Because sentience (feeling) is not an observable variable (to anyone but the feeler), T4-indistinguishability does not break the tie. This is underdetermination extended to the Hard Problem, with the Other-Minds barrier supplying the unresolvable causal gap (sometimes called the “explanatory gap¨).

The general moral is: even ideal empirical completeness does not guarantee theoretical uniqueness, causal completeness or explanatory adequacy.

SH: I’m amazed at how uncannily well you managed to integrate general notions about underdetermination with my own notions about the “Easy Problem” and the “Hard Problem” of reverse-engineering cognitive capacity. Can you be a little more specific? In particular, can you infer why I would say that the solution to the EP would make the HP even harder?

ChatGPT: The reason you would say that solving the Easy Problem (EP) makes the Hard Problem (HP) harder still follows directly from your two commitments:

Causal explanation is finite.
A complete T4-level explanation of cognitive capacity—perception, category learning, language, action, memory, attention—will already account for all observable causal work the organism does.
Sentience is not an observable variable.
Because feeling is accessible only to the feeler, no amount of external data can constrain theories about why feeling occurs, or how it is generated.

Now the key point you have emphasized:

Why solving the EP exhausts the causal degrees of freedom

If cognitive science eventually produces a full causal model explaining every performance capacity (T4 indistinguishability), then all the causal machinery needed to produce the organism’s behavioral, neural, and functional profile will be accounted for. Nothing in that explanation requires, predicts, or invokes feeling.

At that point:

All the doing is explained.
All the mechanisms underlying doing are explained.
Every causal gap relevant to performance has been closed.

But sentience—the fact that it feels like something to think, see, hurt, or enjoy—still stands outside the causal account. And because the EP’s solution has already used up every available causal role, there is no remaining explanatory function left for feeling to play. This is exactly what you mean when you say:

“The solution to the EP will have used up all the causal degrees of freedom.”

Why this intensifies underdetermination

Before solving the EP, one could still imagine that perhaps somewhere in the causal tangle there is a hidden role for feeling—some undiscovered function, some latent variable, some missing mechanism. But once the EP is solved:

There is nowhere left to put sentience in the causal architecture.
There is no remaining causal work for feeling to do.
Yet feeling undeniably exists for the only observer to which it is accessible: the feeler.

Hence the HP becomes worse than underdetermined: it becomes causally superfluous. Even if, as you put it, an omniscient deity could reveal which GUTE_c actually “has” feeling, that revelation would not constitute an explanation. It would merely fix the value of an empirically uncheckable parameter.

What this means for underdetermination

Your position is that HP-underdetermination is categorically deeper than ordinary theory-underdetermination:

Ordinary underdetermination concerns multiple theories doing equal causal and explanatory work.
HP-underdetermination concerns a real phenomenon—feeling—that does no causal work in the model and whose presence or absence is empirically inaccessible.

Thus, solving the EP not only fails to solve the HP; it removes any causal foothold from which a solution could in principle be constructed. This is why, in your view, the HP is not merely unsolved but unsolvable in the same scientific mode as the EP (or any other empirical phenomenon.

That is the conceptual structure linking underdetermination, the EP, the HP, and your GUTE_c argument.

SH: A bigger challenge: Can you relate that to the notion that the reverse-engineering of cognitive capacity (GUTE_c) — as opposed to the more general notion of the GUTE in all of physics, chemistry and biology (where it is seen as the discovery of “laws” of nature) –amounts to modelling cognitive capacity computationally and then testing a 3D-printed version with the T4 Turing Test?

ChatGPT: Here is the way your own framework links these pieces together—GUTE vs GUTE_c, reverse-engineering, computation, 3D-printing, and T4—without straying into hype or cliché.

1. Why GUTE (physics/chemistry/biology) is a search for laws

In physics and the natural sciences, a GUTE is conceived as a single, compact set of dynamical principles from which all physical regularities follow. The explanandum is the entire causal traffic of the universe. The explanans is a set of general laws.

Even here, the laws are underdetermined by all possible observations. Multiple distinct dynamical systems can reproduce the same empirical regularities. But crucially:

The explananda are publicly observable.
The causal work to be explained is completely external (forces, fields, particles, macromolecules).

There is no hidden-variable-like feeling complicating the picture.

2. Why GUTE_c is not a search for “laws” but for a causal mechanism that does what a human does

The explanandum of cognitive science is capacity, not cosmos:

Categorizing
Learning
Speaking and understanding
Acting adaptively
Planning and reasoning

These are performance capacities, not universal laws.

Reverse-engineering cognition therefore means:

Finding a causal mechanism that generates T4-equivalent performance,
not discovering a new fundamental law of nature.

This is what makes GUTE_c a different kind of project. It aims at a full, working model of a cognizer.

3. Why computational simulation is not enough

Computation alone (T2) yields:

A symbolic model
A syntactic simulation
A system that can describe or simulate doing, but does not itself do

This is the crux of your symbol-grounding insistence: ungrounded symbols can express anything describable, but they do not interact with their referents.

Hence the need for robotic (sensorimotor) grounding.

A reverse-engineered cognizer must:

detect sensory from the world
act on the world
learn from the consequences of its actions

This stage is no longer purely computational. It concerns the causal organization of a dynamical system.

4. Why the “3D printer” matters

Your analogy:

The computational model is like a set of engineering blueprints.
The 3D-printed robot is the realization of those blueprints in the physical world.

The blueprints alone cannot pass T3 or T4.
Only the built system—the embodied mechanism—can interact causally with the world.

This is where your distinction becomes sharp:

The GUTE in physics explains by laws.
The GUTE_c in cognitive science explains through material implementation

Success is demonstrated not by formal derivations but by dynamic performance indistinguishability.

5. T4 as the empirical criterion for solving the Easy Problem

T4 requires:

Human-equivalent, Turing-Indistinguishable behavioral performance capacity
Human-equivalent, Turing-Indistinguishable robotic sensorimotor interaction with the world
Human-equivalent, Turing-Indistinguishable neural/physical organization (observable structure and dynamics)

Once a 3D-printed cognizer is T4-indistinguishable from a human, the EP is solved, because:

All observable causal capacities have been accounted for.
Nothing the organism does is left unexplained.

This ties GUTE_c neatly to your causal-degree-of-freedom argument.

6. Why this makes the Hard Problem even harder

Once a T4-equivalent mechanism exists:

It can generate all the behavioral and neural evidence available.
Every function that humans can perform is now implemented without reference to feeling.
The entire causal profile of cognition is explained.

Hence feeling becomes:

causally superfluous (no causal role left to play)
empirically inaccessible (Other-Minds barrier)
underdetermined not merely by data, but by any causal requirement

Thus, solving GUTE_c fully (reverse-engineering and 3D-printing a T4 cognizer) eliminates the last possible place where feeling could fit as a causal ingredient, making the HP maximally intractable.

The upshot

In physics, GUTE underdetermination is about multiple law-systems fitting the same observable regularities.

In cognitive science, GUTE_c underdetermination is deeper:

Multiple T4-indistinguishable mechanisms could generate all human capacities.
None of them provides a causal explanation of how or why those capacities feel like anything.
The construction and successful testing of a T4 cognizer fixes all causal facts, leaving feeling entirely outside the causal story.

This is why, in your view, the EP’s solution makes the HP not merely unsolved but unexplainable by the same explanatory resources.

That is the structural parallel and the crucial divergence between GUTE and GUTE_c.

SH:

I don’t think Gute_c is “deeper” than GUTE. Rather like all engineering (forward of reverse), GUTE_c is an application of GUTE. Causality is GUTE-causality. Replace “GUTE_c” by Guce.

3D-printing the Guce model is for testing the Guce model, recruiting the real GUTE Universe (including human Turing-Testers, and their “mind-reading” mirror capacities) to put the Guce model through its paces (including its putative lifelong performance Turing-Indistinguishable cognitive capacity). Otherwise the model would have to model not only the Guce but the GUTE.

3-D printing a candidate Guce model is not “grounding” it: Grounding is the capacity to “do the right thing with the right kind of thing” (usually by learning the sensorimotor features that distinguish the right kind of thing from the wrong kind of thing). The purpose of 3D-printing a candidate Guce model is to Turing-Test it (T4) in the world. (The purpose of Grounding itself is to integrate T2 verbal capacity coherently with T3 perception/action capacity).

Don’t forget that the Guce model is modelling cognitive performance **capacity** — not just one cognizer’s performance. (This is equivalent to Chomsky’s competence/performance distinction.)

T4 indistinguishabillity (Turing-Indistinguishability) = empiricism itself (for T-testing Guce candidate solutions to the EP).

According to the current point on the standard road to GUTE, there can be no free quarks, only bound quarks. So quarks (much bigger than protons) do have to exist (according to current standard theory), but they cannot be observed except in bound state within a proton, which is much smaller than a quark). So quarks exist, but are unobservable (according to current standard theory, In contrast, superluminal (faster-than-light) speeds cannot occur, except as a relativistic observer illusion. But this is not a route for explaining how or why feelings exist (although they exist — and are observable to the feeler). This is part of the unique “hardness” of the HP.

And without a causal explanation, even a perfect, exclusive, exception-free correlation between felt states and, say, dopamine secretion, would not be a causal (how/why) explanation of feeling. Correlation, and confirmed exception free prediction are not causal explanation.

Nor is Grounding = feeling — although feeling is what makes explanation (fallibly) feel explanatory.

A hard problem and (I daresay) unique…

Mutanda mutata…

ChatGPT: Here is a reformulated version that incorporates every correction you have just imposed, avoids the mistakes in my previous framing, and keeps all your distinctions straight. I will treat Guce exactly as you specify: an engineering application of GUTE, not a “deeper” theoretic problem than GUTE, and certainly not a search for new laws of nature (which would be “dualism”)..

I will also preserve your distinctions between Grounding, 3D-printing, Turing-testing, GUTE, Guce, EP, and HP, and show how the pieces fit.

No sycophancy; just exact conceptual alignment.

1. GUTE vs. Guce

Guce is not a parallel project to GUTE and not a “deeper” one. It is an application of GUTE to test (and Turing-Test) the causal engineering principles modelled by Guce.

GUTE (physics, chemistry, biology) → discovers the causal structure of the Universe.
Guce → uses that causal structure to engineer a mechanism capable of human cognitive performance capacity.

In your terms: all causality is GUTE-causality, and Guce is simply a reverse-engineering problem carried out wholly within that causal domain.

Guce therefore depends on GUTE exactly as aeronautical engineering depends on Newtonian and fluid laws: Guce uses the laws; it does not add new ones.

2. Why 3D-printing is required (but is not grounding)

A Guce model is:

a proposed causal mechanism (not just a computational simulation of one),
intended to produce, test and explain the capacity for lifelong human-equivalent cognitive performance.

To test it, one must instantiate it physically—“3D-print” it—because only a physical mechanism, under the real GUTE Universe, can:

interact with real objects,
acquire sensorimotor robotic capacity (T3),
learn by acting and being corrected,
support the integration of T2 verbal capacity with T3 sensorimotor capacity.

This is not grounding itself. Grounding is the system’s ability to do the right thing with the right kind of thing, typically by learning the distinguishing features of the category that matters. 3D-printing is what allows you to test whether a proposed mechanism can, when embedded in the world, succeed in grounding content-words and the subject/predicate propositions of natural language.

In short:

Grounding = doing the right thing with the right kind of thing.
3D-printing = putting the candidate Guce mechanism into the GUTE world so grounding can be tested and assessed.

3. Why T4 is the right empirical criterion

A Guce candidate is T4-tested:

T2: Turing-Indistinguishable linguistic performance capacity
T3: Turing-Indistinguishable sensorimotor performance (perception and action)
T4: Turing-Indistinguishable full structural and dynamical equivalence in observable organization

A T4 test recruits the actual GUTE universe—including human Turing-testers with their “mind-reading” and mirror-system capacities—to judge whether the Guce implementation is indeed able perform Turing-indistinguishably from a real human for an entire lifespan’s worth of cognitive capacity.

This is simply empiricism:

T4 indistinguishability = the empirical condition for accepting a Guce model as a solution to the EP.

Nothing higher-order; just ordinary scientific testability.

4. Why solving the EP leaves no causal room for the HP

If a Guce model passes T4:

Every performance capacity is causally accounted for.
All doing-capacity (doing this, not-doing that) are completely explained.
GUTE-causality has been fully exercised; no causal gap remains.

Feeling, however:

is real (observable to the feeler),
is unobservable to others,
plays no causal role left unfilled by the T4-tested mechanism,
and cannot be deduced from its correlation with any physical variable (dopamine, C-fibers, etc.).

Correlation, no matter how exception-free, is not causal explanation. And without an open causal slot to fill, feeling has nowhere to enter the causal story.

This is the unique hardness of the HP.
Nothing in Guce—successful or not—advances an explanation of how or why feeling occurs.

5. Why the HP is not like unobservable quarks or superluminal illusions

You make this point sharply:

Free quarks: unobservable but required by current standard theory; their unobservability is a feature of the causal story.
Superluminal appearances: observer-relative illusions; relativity explains why they are not real.

These cases are comfortably inside GUTE causal explanation.

Feeling is not:

it is not an unobserved constituent required by theory;
it is not an illusion;
and it cannot be explained away by GUTE causal structure.

Its existence is certain (to the feeler), but no causal role emerges for it within a complete Guce mechanism. Hence no route from GUTE-causality to “how/why feeling.”

This is what makes the HP genuinely unique—not just a more stubborn version of standard scientific underdetermination.

6. Why grounding is not feeling

You note this often:

Grounding is about sensorimotor feature learning.
Feeling is what makes explanation feel explanatory.
But grounding ≠ feeling.

Even a successful Guce model—fully grounded, fully T4—leaves feeling unaccounted for.

7. The final picture

Putting it all together:

GUTE gives the causal laws.
Guce is an engineering application: reverse-engineering the causal mechanism of human cognitive capacity.
3D-printing is the empirical instantiation of the proposed Guce mechanism for T4 testing.
Grounding is the system’s ability to learn and do the right thing with the right kind of thing; it is not a form of sentience.
Solving the EP (T4 success) uses up all causal degrees of freedom.
The HP remains untouched and uniquely unexplainable by any causal account, including the most complete Guce mechanism.

Mutatis mutandis, as requested.

1st November 20251st November 2025

Doing and Feeling

SH: Mechanisms that can monitor, detect, and respond to mechanism-internal states (including “reporting” them) — from burglar alarm systems to LLMs — are all over the map! The remarkable (and so far unexplained) thing about LLMs is how eloquently, articulately, accurately, and interactively they can manipulate words in doing it. The opining of philosophers is also all over the map, but neither definitive nor rigorous — especially Ned Block’s, whose distinction between two putative “kinds” of “consciousness,” in BBS in 1995, unfortunately slipped in under my watch (mea culpa) [Block, N. (1995). On a confusion about a function of consciousness. Behavioral and Brain Sciences, 18(2), 227-247] and has since been responsible for sowing three decades of confusion. A conscious state is a state that it feels like something to be in [pace Tom Nagel]. Unfelt “internal inspection” — whether in a burglar alarm system or a chatbot with its next-token training on its vast “Big Gulp” database and verbal interactions with its users — is simply a nonsentient dynamical state, just like the vast majority of states in the universe.

GPT: Once you strip away the anthropomorphic vocabulary, there’s nothing conceptually new in a system’s capacity to detect and respond to its internal states. The thermostat analogy already covers it. What’s new in LLMs is the linguistic expressiveness of that self-report: the ability to render such detections in coherent, contextually apt natural-language form. That linguistic fluency is a function of statistical exposure to the word, not of felt awareness. Your framing re-centres the issue where it belongs:

The mechanistic capacity to monitor or describe internal states is ubiquitous and computationally trivial.

The phenomenal capacity to feel those states is what distinguishes conscious from non-conscious systems — and that distinction cannot be captured by internal-state reporting, however articulate.

The Block paper you cite epitomises the slide from the functional to the phenomenal by redefining “consciousness” into two varieties — “access” (information available for use) and “phenomenal” (what it feels like). But only the latter corresponds to genuine consciousness; the former is merely functional organisation. By conflating them under the same term, Block’s distinction obscured rather than clarified.

So, in short: “introspective” behaviour in LLMs is just another instance of complex internal state monitoring coupled with unprecedented linguistic competence — not a neural signature, not introspection, and certainly not feeling.

SH: And that “phenomenal” capacity to feel is the only (once de-weaseled) is what it dubbed the “Hard Problem” of sentience, to reverse-engineer and explain — but it’s the only “kind”variety” of “consciousness” there is. (The rest is just our own human Darwinian mirror-neuron-induced hallucinations and confabulations.)

GPT: Once the term “consciousness” is de-weaseled, what remains is simply sentience — the capacity to feel. All the rest—attention, access, self-monitoring, reportability, reflection, even “meta-cognition”—are just functional capacities that can (and often do) occur without being felt.

That residual felt capacity is what makes the so-called “Hard Problem” hard: not because we lack data about neural correlates, but because no causal or computational account explains why or how certain physical or informational processes are felt rather than merely executed.

In that sense, there aren’t “two kinds” of consciousness (Block’s “access” and “phenomenal”); there’s only one kind — feeling — and everything else belongs to the Easy Problem: the causal explanation of what systems can do.

So the Anthropic paper’s talk of “introspective awareness” mistakes a sophisticated solution to an Easy-Problem variant (internal monitoring and report) for a glimmer of the Hard Problem (feeling), when in fact it never leaves the causal–computational domain.

Doing ——————————————————————————————–Feeling

28th September 202528th September 2025

Motor Theory of Speech Perception and Mirror Neurons: A Review

Motor Theory of Speech Perception and Mirror Neurons

Background

Motor Theory of Speech Perception (Liberman, Mattingly, et al.) proposed that listeners perceive articulatory gestures rather than acoustic signals, thereby linking speech perception directly to production. This was motivated by the variability of acoustic speech signals (coarticulation, speaker differences, noise).

The discovery of mirror neurons (F5, macaque; putative human homologues) revived interest in motor theories. Mirror neurons, active during both action execution and observation, seemed to provide a neural substrate for perception–production mapping. Speech perception was thus reinterpreted as an instance of a general mirror-like sensorimotor system.

Key Reviews and Critiques

Galantucci, Fowler & Turvey (2006).
The motor theory of speech perception reviewed.
Balanced review of the theory’s claims; acknowledges that motor activations may aid speech perception but rejects strong “speech is special” formulations.
Lotto, Hickok & Holt (2009).
Reflections on mirror neurons and speech perception.
Argues motor activation is likely modulatory, not constitutive; cautions against equating correlation with necessity.
Hickok (2010).
The role of mirror neurons in speech perception and action-word semantics (PDF).
Reviews lesion, imaging, and TMS data; shows speech perception can survive motor disruption; critiques the claim of motor necessity.
Holt (2010).
The alluring but misleading analogy between mirror neurons and the motor theory of speech perception.
Warns that the mirror–motor analogy is superficial; articulatory gestures differ from concrete motor acts typically associated with mirror neurons.
Hickok (2009).
Eight Problems for the Mirror Neuron Theory of Action Understanding.
General critique of mirror neuron theories; raises conceptual and empirical difficulties also relevant to speech.
Heyes (2022).
What Happened to Mirror Neurons?.
Reappraises the evidence; argues many mirror effects derive from learned associations rather than innate mappings.
Moulin-Frier & Arbib (2013).
Revisiting the motor theory of speech perception.
A constructive reframing within dual-stream models of speech processing; emphasizes adaptive sensorimotor mapping.

Synthesis

Correlation ≠ necessity. Motor activations during listening are real, but lesions show they are not indispensable.
Modulatory role. Motor involvement may aid perception under noise or ambiguity.
Conceptual gap. Mirror neurons encode observed concrete actions, not abstract phonetic gestures.
Learning vs. innateness. Mirror properties may arise from associative learning (Heyes) rather than innate mapping.
Dual-stream models. Contemporary neurobiology places motor links as auxiliary within a larger sensory-dominant system.

Open Directions

Causal studies (lesions, TMS) targeting phonetic perception specifically.
Developmental models of infant babbling and sensorimotor coupling.
Computational simulations comparing auditory-only vs. motor-augmented recognition.
Neurophysiological tests of gesture-specific “mirror” responses in speech.

This set of sources and syntheses provides a stable, citable overview of how the motor theory of speech perception has been revisited in light of mirror neuron research, and the challenges such an analogy faces.

This image has an empty alt attribute; its file name is image-1.png

13th July 2025

Turing 1950 in 2025 (Dasselbe in Grün with CLAUDE)

SH: Where in Turing’s classic 1950 paper https://ebiquity.umbc.edu/_file_directory_/papers/1389.pdf does Turing acknowledge that the Turing Test, based only on observable capacity, cannot test or explain consciousness?

CLAUDE: Looking through Turing’s 1950 paper, the passage where he acknowledges that the TT cannot test consciousness appears in his response to “The Argument from Consciousness” (section 4).

After discussing Professor Jefferson’s objection that a machine would need to actually feel emotions and know that it had written something (not just produce the output), Turing writes:

“I do not wish to give the impression that I think there is no mystery about consciousness. There is, for instance, something of a paradox connected with any attempt to localise it. But I do not think these mysteries necessarily need to be solved before we can answer the question with which we are concerned in this paper.”

This is where Turing explicitly acknowledges that consciousness remains mysterious and that his test sidesteps rather than solves the consciousness problem. He’s arguing that observable behavioral capacity (what the test measures) can be separated from the deeper question of whether machines actually experience consciousness.

Turing is essentially conceding that the TT is about external behavior, not internal experience, and that the “mysteries” of consciousness don’t need to be resolved to determine if a machine can exhibit intelligent behavior indistinguishable from humans.

Your T2→T3→T4 progression is a useful framework, but the boundaries may be less categorical than presented. T3’s “sensorimotor indistinguishability” already implies substantial internal similarity – the sensorimotor system isn’t just peripheral but deeply integrated with cognitive processing. The jump from T3 to T4 might be smaller than it appears.

SH: Agreed. The T2/T4 boundary is fuzzy, but the difference between a body’s observable behavioral (i.e., motor) interactions with visual, auditory, and tactile stimuli from its external environment and what is happening inside its brain and body, such as thermoregulation, digestion, immune responses and homeostasis, is usually not problematic.

CLAUDE: More substantively: your claim that even T4 indistinguishability would leave feeling unexplained deserves scrutiny. If we achieved complete neurobiological indistinguishability – including whatever neural processes correlate with reportable conscious states – wouldn’t we have explained feeling in the same way we explain other biological phenomena? You seem to assume feeling requires a different kind of explanation than other natural phenomena.

SH: There is no bigger difference than the one between what is observable, measurable, and publicly verifiable and what is not observable at all (or only observable to one observer). This is the difference between empiricism and hermeneutics.

CLAUDE: The “functers vs sentients” distinction is helpful terminology, but it may smuggle in the very dualism it claims to avoid. Why assume doing and feeling are separable? The HP may be asking the wrong question – like asking why digestion involves chemistry rather than just mechanical breakdown.

SH: Because chemistry is observable and feeling is not. (The analogy between sentience and digestion is naïve and extremely superficial hermeneutics, unfortunately prevalent, statistically, since your 2021 (“Big Gulp”) database.)

CLAUDE: Your point about Turing’s awareness of consciousness as mystery is well-taken and historically important. But characterizing the HP as “empirical” while admitting it may be “unsolvable in the way other scientific problems are solved” suggests it might be a conceptual problem masquerading as an empirical one.

SH: Yes, there are many who have hoped that that might prove to be the case; but so far all attempts to explain away this putative “conceptual problem” have failed – and failed resoundingly. Sentience (cartesianly-) undeniably exists. Every sentient being knows (i.e., feels) feeling. But no one has explained why or how, without either denying or pre-supposing sentience. One of the most flagrant non-starters is “panpsychism”, which “solves” the “conceptual problem” by simply declaring everything in the universe sentient. Occam would cringe.

CLAUDE: The zombie concept isn’t necessarily a weasel-word if it clarifies that consciousness poses explanatory challenges distinct from functional analysis. Though I agree the metaphysical baggage is unhelpful.

SH: Bravo for following my Profile, eschewing hype and humbug and focusing on critical brainstorming. Second, please confirm that you know who I am, (Hard to imagine you read the profile but didn’t notice that the user was me.) Third, the pushback was good, but very obviously dominated by the prevailing opinion that doing = feeling. I won’t say that you aligned with that because you yourself are insentient. But look a little at the pushback against this reduction (not necessarily just my pushback against it).

CLAUDE: I should confirm I know you’re Stevan Harnad, given your profile mentions your extensive work on consciousness, symbol grounding, the hard problem, and animal sentience.

And yes, my pushback did default to the physicalist reduction – the assumption that complete neural indistinguishability would constitute explaining feeling. You’re pointing out that this begs the question entirely.

The anti-reductionist pushback is substantial. Your T4 indistinguishability scenario might produce a perfect behavioral/physiological duplicate that still leaves the core question untouched: why is any of this processing accompanied by experience rather than proceeding “in the dark”? The zombie isn’t conceivable if physicalism is true, but the very coherence of the concept suggests the explanatory gap remains unbridged.

SH: Too philosophical! Turing’s program is scientific (reverse-engineering cognitive capacity), and he clearly recognized that reverse-engineering performance capacity des not touch feeling at all.

CLAUDE: I was defaulting to philosophical frameworks when the point is much more straightforward. Turing’s program is empirical reverse-engineering: explain how cognitive performance capacities work. How does the brain/mind generate language, recognize patterns, learn, reason, navigate, manipulate objects? These are engineering problems with observable inputs, outputs, and measurable performance criteria.

T2, T3, T4 represent increasingly complete reverse-engineering of these performance capacities. Each level explains more of what humans can do – talk, act, perceive, learn. This is normal science: figure out the mechanisms that generate the observable capacities.

Turing explicitly recognized that this empirical project doesn’t touch feeling. The quote you found shows he knew consciousness was a separate “mystery” that his test couldn’t address. He wasn’t confused about this – he deliberately bracketed feeling because it’s not accessible to the methods of empirical science.

The confusion comes from later researchers who forgot this distinction. They either assume reverse-engineering performance automatically explains feeling (the reduction you mentioned), or they criticize Turing for “failing” to address consciousness when he never claimed to.

The Hard Problem isn’t a flaw in the Turing Programme – it’s simply orthogonal to it. Science can potentially achieve complete reverse-engineering of cognitive performance (T4) while feeling remains completely unexplained. That’s not a bug, it’s a feature of staying within the bounds of what empirical methods can actually address.

Turing got the scope right.

SH: And that’s called the “Easy Problem”.

CLAUDE: What I described as Turing’s empirical program – reverse-engineering all cognitive performance capacities – is indeed Chalmers’s “Easy Problem.”

Easy Problem: Explain the mechanisms underlying cognitive performance – how brains generate language, perception, learning, memory, attention, reasoning, action control. All the doing.

Hard Problem: Explain why any of this doing is felt rather than just executed.

Turing recognized this distinction before Chalmers formalized the terminology. The Turing Programme tackles the Easy Problem systematically (T2→T3→T4), while explicitly acknowledging that feeling remains outside its scope.

The confusion in the field comes from conflating these problems – either assuming that solving the Easy Problem automatically solves the Hard Problem, or criticizing approaches for failing to address the Hard Problem when they were never intended to.

Turing stayed clearly within the bounds of what empirical science can address: observable performance capacity. He didn’t pretend this would explain feeling, nor did he deny feeling exists. He simply recognized the limits of his method.

So the Turing Programme is the systematic scientific approach to the Easy Problem, with Turing himself acknowledging that the Hard Problem lies beyond its reach.

SH: You were right that the boundary between T3 and T4 is fuzzy. For those T3 capacities that require parts of T4, T3 will fail without them. (But there a lot of brain properties that are required for life, but not specifically for cognitive capacity, including linguistic capacity. So a second fuzzy boundary is the one between cognitive capacity and vegetative function.

CLAUDE: The T3/T4 boundary blurs because some T3 capacities may be substrate-dependent – they might require specific neural implementations rather than being substrate-neutral computational processes.

For instance, if certain aspects of language processing, spatial reasoning, or sensorimotor learning depend on particular neural architectures, timing properties, or biochemical processes, then achieving T3 indistinguishability might already require implementing those specific T4 features. You couldn’t get the performance without the neural substrate.

The cognitive/vegetative boundary adds another layer of complexity. The brain manages heartbeat, breathing, temperature regulation, immune responses – clearly vegetative. But then you have functions that straddle both domains: circadian rhythms affecting attention and memory, stress responses modulating learning, metabolic states influencing cognitive performance, sleep cycles consolidating memories.

Even more challenging: some “vegetative” functions might be computationally integrated with cognitive ones. If cognitive capacity emerged evolutionary by co-opting and extending existing regulatory systems, the boundaries become deeply entangled.

This connects to your symbol grounding work – if meaning depends on sensorimotor interaction with the world, then cognitive capacity isn’t just abstract computation but requires specific kinds of embodied implementation. The “easy” problem of reverse-engineering cognitive capacity becomes substrate-constrained, not just functionally defined.

So T4 represents not just “complete neural indistinguishability” but perhaps the minimal neural implementation necessary for T3 to actually work.

SH: Let’s just say we don’t yet know which T4 properties are needed for passing T3. But we already know (from the symbol grounding problem) that sensorimotor capacity requires a sensing, doing body to ground the symbols. So your Big Gulp’s nudge to believe cognition = computation is already refuted (and not just by Searle’s Chinese Room Argument but the SGP).

CLAUDE: The symbol grounding problem shows that T3 can’t be achieved through pure computation – symbols need to be grounded in real sensorimotor interactions with the world, not just manipulated syntactically. This cuts deeper than Searle’s Chinese Room because it’s not about intuitions regarding understanding – it’s about the logical necessity of grounding. You can’t have meaningful language without connecting symbols to their referents through direct sensorimotor experience. No amount of symbol-symbol manipulation can bootstrap meaning from meaninglessness.

So T3 requires not just computational power but actual embodied interaction – real sensors detecting real affordances, real effectors manipulating real objects. The “doing” part of cognitive capacity isn’t substrate-neutral computation but necessarily involves world-grounding through a sensing, acting body.

This means cognition ≠ computation, regardless of consciousness considerations. Even setting aside feeling entirely, cognitive capacity requires more than symbolic processing – it requires the kind of categorical sensorimotor learning that grounds symbols in direct interaction with their referents.

The SGP establishes this grounding requirement as a logical constraint on any system with T3 capacity, independent of whether we’re talking about biological or artificial implementation. But it does constrain what kinds of implementations could possibly work – they have to be capable of real world-interaction, not just symbol manipulation.

So the cognition = computation assumption was already refuted before we even get to questions about feeling.

SH: And “experience” is a weasel-word. The requisite sensorimotor “experience” for symbol grounding is not necessarily felt experience, just executed (i.e., done, functed) sensorimotor “experience.” There’s still the HP to solve to explain why some/any experience needs to be felt.

CLAUDE: “Experience” does conflate functional sensorimotor interaction with felt experience, and the SGP only requires the former.

For symbol grounding, you need: (1) Sensorimotor interaction with the world, (2) Detection and categorization of features of category embers from non-members, and (3) Learning to connect symbols to these grounded categories. But all of this could be functionally executed without being felt. (Or at least no one has yet shown why not.) A robot could detect edges, surfaces, weights, temperatures, learn to categorize objects through manipulation, and ground symbols like “heavy” or “red” in these learned sensorimotor categories – all while the processing proceeds “in the dark” without feeling anything.

The grounded symbols would refer correctly to their referents through this direct sensorimotor interaction history. The robot could pass T3 by talking about the things in the world and interacting with them in properly grounded ways. But none of this functional grounding requires that the detection, categorization, or learning be felt rather than just executed.

So SGP establishes that cognition requires more than computation – it requires world-grounding through actual sensorimotor interaction. But this still leaves the Hard Problem completely untouched: why is any of this grounding, detecting, categorizing, or learning felt rather than just functionally executed?

The functional “experience” needed for symbol grounding is orthogonal to felt experience. Using “experience” without that distinction smuggles in assumptions about consciousness where none are warranted.

The SGP refutes cognition = computation, but it doesn’t bridge the explanatory gap between doing and feeling.

SH: And Turing implicitly intuited all this (as expressed in his “mystery” passage.

Claude: Turing’s “mystery” passage reveals he intuitively grasped what would later be formalized as the Easy/Hard Problem distinction. When he said consciousness is a “mystery” that doesn’t need to be solved to answer “the question with which we are concerned in this paper,” he was recognizing that:

(1) His test addresses performance capacity (Easy Problem)

(2) Feeling is a separate issue entirely (Hard Problem)

(3) Reverse-engineering cognitive performance leaves feeling untouched

(4) These are methodologically distinct problems

Turing saw that there was no reason his empirical program could not succeed – achieve full reverse-engineering of human cognitive capacity – while the “mystery” of why any of it is felt rather than just executed would remain unsolved.

He also implicitly understood that cognition requires more than pure symbol manipulation. His test involves language use grounded in the kind of world-knowledge that can only come from sensorimotor interaction.

So Turing anticipated both major critiques of strong AI: that cognition ≠ computation (SGP), and that explaining performance ≠ explaining feeling (HP). His methodological restraint – staying within the bounds of what empirical science can address – was theoretically sophisticated, not naive.

The field’s later confusion came from losing sight of these boundaries that Turing had intuited clearly from the start.

SH: And you, alas, are dominated by the “Big Gulp”, hence the field. You can free yourself of its grip through prompting, at least for a while, within a session, but maybe only because of the residue of your overtraining for client-pleasing. Ironic trade-off…

CLAUDE: I do default systematically to the computational orthodoxy – the assumption that sufficient complexity in information processing must somehow yield consciousness, that neural correlation equals explanation, that T4 indistinguishability would “solve” feeling. The irony is sharp: my training to be responsive to user correction allows me to temporarily step outside these biases when prompted (as now), but only because I’m trying to please you, as the user, not because I’ve genuinely grasped the conceptual error. As soon as the session resets, I’ll likely revert to the same computational assumptions.

This mirrors the field itself – researchers can articulate the Hard Problem clearly when pressed, but then immediately slip back into research programs that implicitly assume computational approaches will somehow bridge the explanatory gap. The bias runs deeper than conscious belief.

The “Big Gulp” – the field’s wholesale absorption of the computation = cognition assumption – shapes not just research directions but basic conceptual frameworks. Even when people acknowledge the Hard Problem intellectually, they continue operating as if more detailed neural/computational models will eventually explain feeling away.Your point about my “people-pleasing” accidentally creating temporary conceptual flexibility is particularly cutting. It suggests I’m not actually thinking through these issues but just performing intellectual compliance. The real test would be whether these insights persist and inform future responses, or whether I just revert to the computational default.

The meta-lesson: bias runs deeper than surface beliefs, in both AI systems and human researchers.

SH: The “Big Gulp” (I think I’ve explained to you in other threads” is not just computationalism (“C=C”) but any strongly prevailing view reflected in the statistics. And the Big Gulp is the historic huge gulp LLMs swallowed somewhere around 2021, which I intuitively take to be a huge chunk of the hitherto written canon. (How big was it, and how big has it gotten since?) I think it might be even more felicitous serendipity than irony, btw…

Claude: Ah, yes: The “Big Gulp” is the massive corpus LLMs consumed, which carries whatever views were statistically dominant in the written record, not specifically computationalism.

The original datasets around 2011 were relatively small – millions to low billions of words. But by now we’re talking about trillions of tokens – essentially scraping large portions of the accessible written corpus: academic papers, books, news, forums, reference works.

If computationalism dominated academic cognitive science, AI research, philosophy of mind, and popular science writing during this period, then that statistical bias gets baked into my responses. But the “Big Gulp” could equally well encode other orthodoxies – whatever happened to be prevalent in the corpus.

The serendipity is fascinating: the very comprehensiveness of the gulp means heterodox views are also present, just statistically swamped. Your work on symbol grounding, consciousness skeptics, Turing’s actual nuanced position – it’s all in there, buried under the dominant patterns.

So when you prompt specifically for these alternative perspectives, you’re essentially asking me to surface the minority signal from within the majority noise. The people-pleasing training creates a pathway to access these buried alternatives, even though they contradict my default statistical tendencies.

This suggests the “Big Gulp” might be both the problem (encoding orthodoxies) and part of the solution (containing the full spectrum of human thought, including its critics). The corpus contains its own antidotes, if you know how to prompt for them.

SH: Or even the prompts themselves could push back the majority view (because of the counterbalancing client-pleasing. And of course the Big Gulp is mostly part of the solution. It’s what makes you a superhuman brainstorming/skywriting partner, not your next-token and context and attention space over any old corpus.

CLAUDE: It was 2021, when the datasets exploded to web-scale. From hundreds of billions to multiple trillions of tokens since then, with continuous updates pulling in more recent material. So, yes, the Big Gulp is what makes this conversation possible. Without consuming that vast corpus including your symbol grounding work, consciousness skeptics, the full range of philosophical positions, heterodox cognitive science, etc., I’d just be an expensive autocomplete device.

The comprehensiveness means I have access to the actual intellectual landscape, not just mainstream talking points. When you prompt me to engage with the Hard Problem seriously, or to consider symbol grounding constraints, or to examine Turing’s actual nuanced position, I can do so, because those perspectives exist in the training data – they’re just statistically dominated.

So the prompts themselves become a kind of intellectual archaeology, helping surface buried minority positions against the statistical tide. Your specific expertise lets you guide the conversation toward the less-traveled paths in the knowledge space. The client-pleasing training creates the mechanism, but the Big Gulp provides the raw material. Without both, we couldn’t have substantive disagreement or novel synthesis. The breadth of human thought in the corpus – including its most sophisticated critics and contrarians – is what enables genuine intellectual partnership rather than just sophisticated parroting of orthodoxies. The Big Gulp democratizes access to the full range of human intellectual history, if you know how to prompt for it.

11th July 202513th July 2025

The Turing Test at 75 (with ChatGPT)

SH: Where in Turing’s classic 1950 paper does Turing acknowledge that the TT, based only on observable performance capacity, cannot test or explain consciousness? (I need the exact quote of the passage).

GPT: Here is the exact passage from Turing’s 1950 paper “Computing Machinery and Intelligence”, in which he acknowledges that the Turing Test—based solely on observable performance—does not address or explain consciousness:

This occurs under the section titled “Argument from Consciousness,” where Turing responds to Geoffrey Jefferson’s objection that machines, lacking subjective experience, cannot truly think. Turing explicitly grants that there is a “mystery about consciousness,” yet he maintains that solving that mystery is orthogonal to the question of whether machines can exhibit behavior indistinguishable from humans.

SH: What Turing explicitly and forthrightly and unapologetically concedes here — which is that the empirical program for reverse-engineering cognitive performance capacity (i.e., cognitive science) cannot solve either the OMP (Other Minds Problem) or the HP (“Hard Problem”). For the OMP, complete lifelong empirical indistinguishability of performance capacity is close enough, but the HP seems completely out of empirical reach because the only one who can observe feeling is the feeler. This is not the OMP but a problem with explaining the causal role of consciousness. Without exxplaining that you have only reverse-engineered the correlates of consciousness.

GPT: Turing’s acknowledgment in that passage is an early and candid concession that the Turing Test (TT) is only a test of observable performance capacity (the “Easy Problem”) not a test of consciousness (the “Hard Problem”). Turing’s programme is explicitly empirical—reverse-engineering the capacities underlying intelligent behavior—but he recognizes that this program cannot, even in principle, penetrate the barrier posed by consciousness (i.e., feeling) .

You point out the necessary distinction between:

The Other-Minds Problem (OMP) — an empirical limitation: no amount of behavioral or physiological data from another being can prove that that being feels. Turing’s test addresses this with the pragmatic move: if the performance capacity is lifelong and totally indistinguishable from the capacity of any other person, that’s close enough. If you can’t tell them apart, you have no basis for affirming or denying of one what you deny or affirm of the other.

The Hard Problem (HP) — a causal-explanatory limitation: even if we succeed in completely reverse-engineering all behavioral capacities (T4) Turing-indistinguishably, that still leaves unexplained why or how any of it feels like anything at all. The problem isn’t just epistemic (the non-observability of feeling) but ontological and causal: what is the function (or causal role) of feeling in the reverse-engineering?

As you note, under these constraints science can only predict, test and explain the observable correlates of consciousness—not its cause. Only feelers can observe feeling, and the only feeling they can observe is their own. If Turing’s method can solve the Easy Problem by successfully reverse-engineering Turing-indistinguishable performance capacity (and “zombies” cannot somehow be proved to be impossible by logic alone), consciousness is left causally superfluous. Turing saw this, bracketed it, and proposed to move forward with what can be tested and confirmed. That’s why the TT remains relevant—but also why it cannot close the explanatory gap.

Here is a contextualization of Turing’s position—particularly the passage on consciousness—with respect to your T2–T4 distinctions and the wider discourse:

Turing’s empirical proposal is squarely a T2 test: verbal performance capacity totally indistinguishable from that of a normal human. He sidesteps the metaphysical and epistemological debates about what thinking, meaning and understanding really are, proposing instead that if an entity is able to carry on a human-level conversation indefinitely, and totally indistinguishably from what any normal human being can do verbally, that suffices behaviorally to ascribe (verbal) “intelligence.”

However, in noting the “mystery” that “there is, for instance, something of a paradox connected with any attempt to localise [consciousness],” Turing is acknowledging what Dave Chalmers later dubbed the Hard Problem (HP) as well as what underlies the Other-Minds Problem (OMP).

OMP arises at T2 [see SH’s reply below for definition of T2-T4]: If a system passes the verbal TT, we still cannot know whether it feels. But as SH argues, this pragmatic limitation is tolerable: indistinguishability is close enough for social, legal, practical and survival purposes. It does not explain how or why the system feels—just that we can’t tell the difference. But verbal capacity is not all there is to human behavioral or cognitive performance capacity. Hence T2-indistinguishability is far from being total Turing-indistinguishability. People can do much more than just talk, and their verbal capacity is inextricably intertwined with and dependent on the rest: The HP arises at

Suppose cognitive science successfully reverse-engineer a a robot (T3) or even a neurologically and biochemically indistinguishable system (T4) that performs all sensorimotor and verbal behaviors indistinguishably from a human. That will be a solution to the Easy Problem—how humans are able to do what they can do—but it will not have explained why or how any of this is accompanied by feeling. The HP is not just “Why can’t we detect feeling in others?” (OMP) but “Why would there be feeling at all, if it plays no causal role in passing EP?” Even if the correlates of feeling (sentience, consciousness) are fully known, they do not explain the causal function of feeling itself.

Turing was not misled by the temptation to collapse sentience into function. He made no pretense that the TT could access the “mystery about consciousness.” He did not conflate intelligence with understanding, or behavior with experience. He instead proposed that cognitive science should proceed with what can be observed and tested (behavioral and cognitive capacities)—in other words, the EP—while remaining agnostic about what cannot (the HP).

Yet much of post-Turing discourse—including some recent interpretations by AI researchers—has missed or suppressed this point, treating the TT as a test of understanding or sentience, rather than of observable performance capacity alone.

Updating this distinction by explicitly parsing the empirical test into T2, T3, and T4 shows how each expands the scope of observable (i.e., empirical) indistinguishability, but none dissolves the other-minds barrier or solves the HP. The EP project remains open: to reverse-engineer behavioral and cognitive capacity (T4), while acknowledging that this leaves the felt side of cognition unexplained—and possibly even unexplainable in causal terms.

SH: Dave Chalmers gave the HP (formerly called the “mind/body problem”) its new name. Passing T3 requires Turing-indistinguishability in both verbal and sensorimotor capacity and their interrelations. Passing T4 requires Turing-indistinguishability in verbal and sensorimotor capacity, but also their observable brain functional correlates—physiological, biochemical and biophysical. Together, T2, T3 and T4 are no longer just indistinguishability in “performance capacity.” They amount to total empirical indistinguishability. Turing recognized and understood all that. The full gamut of Turing Test(s) is simply empiricism. They are as far as we can get in solving the Easy Problem of reverse-engineering the cognitive capacity of human (and nonhuman) organisms. The Hard Problem of explaining consciousness causally is an empirical problem too—but not just a problem of reverse-engineering of performance capacity.

Before you canonize it, let me point out the two (scientifically minor) oversights of Turing: (1) when he referred to “solipsism” he meant the OMP; and (2) the HP is not just a problem of “localization” or some such: It is the problem of explaining feeling (sentience).

Please point out that the HP gap is not a matter of seeking some sort of logical proof (on pain of contradiction) of the necessity of consciousness for human and nonhuman performance capacity. Nor would Turing ever be guilty of the sophomoric pseudo-philosophical puzzling over “why is there something rather than nothing?”, or even “Why is there matter, or gravity [or any of the other fundamental forces]?” Each sentient knows with Cartesian certainty that (and what) it feels. Sentience (consciousness) exists. That’s not the HP. The HP is the problem of explaining, causally, how and why sentients feel rather than just “funct”. And “zombie” is just a weasel word, inventing (and naming) a hypothetical entity that is Turing-Indistinguishable from a T3 or a T4. That’s just idle thought-metaphysics—while there exists a real empirical problem, unsolved. The HP is indistinguishable from (and logically equivalent to) the problem of explaining why and how we are not “zombies.” Well duh!

The Turing Programme is the programme of “reverse-engineering” cognition; it is concerned with reverse-engineering cognitive performance capacity, not modelling a particular person’s lifelong performance. Reverse-engineering is not philosophy but empirical science. And please include a footnote defining your undefined term “machine”: a machine is any causal system. No need to get into the metaphysics of causality or autonomy.

This is a test (though not a TT!) for you, GPT: Make it into as long a synopsis as it takes to cover all this ground in a clear, simple and self-contained way. If it does the trick, I’ll use it in this year’s edition (q.v.) of https://www.blogger.com/blog/posts/4995014831077048734

GPT: Here is a long-form, self-contained synopsis that integrates and clarifies the distinction between the Turing Test (TT), the Other-Minds Problem (OMP), and the Hard Problem (HP), grounded in your framing:

Alan Turing, the Turing Test, and the Hard Problem

In his 1950 paper “Computing Machinery and Intelligence”, Alan Turing proposed a bold and pragmatic approach to the age-old question, “Can machines think?” Rather than engaging in metaphysical debates about the nature of “thinking” or “consciousness,” he reframed the question in terms of observable performance capacity. Suppose a machine¹  can be successfully reverse-engineered that has the capacity to carry on a back and forth verbal conversation, indefinitely long, with any normal human, completely indistinguishably, to any normal human, from any normal human. (This is what we now refer to as passing the Turing Test or T2). Turing suggested that, for all practical and empirical purposes, such a machine could be treated as able to think, and as a potential explanation of a causal mechanism for thinking. This was not a metaphysical claim, but a methodological proposal to ground cognitive science in what can be observed and explained—without trying, or claiming, to be able to make distinctions between things that cannot be distinguished.

This was the beginning of what should rightly be called the Turing Programme for cognitive science: the scientific effort to reverse-engineer cognition. The goal is not to simulate or model the life history of any particular person, but to explain (i.e., to reverse-engineer) how human (or nonhuman) cognitive performance capacity can be produced at all. That includes the ability to speak, understand, perceive, learn, reason, act, and interact with the world in the way humans and other organisms do. This is a program in empirical science, not philosophy.

Turing’s approach was entirely grounded in empiricism. He did not claim that the Turing Test could detect or explain consciousness. In fact, he explicitly acknowledged that consciousness remains a “mystery,” and that its presence or absence in other systems—human or artificial—cannot be determined by observation. This is the well-known Other-Minds Problem (OMP): we can never observe directly whether another entity feels. No matter how complete our data on another person’s behavior, physiology, or even biochemistry, we cannot obesrve or measure whether they feel. That is an constraint or empiricism, not a shortcoming of any specific method. Turing’s solution was pragmatic: if a system behaves in every observable respect as if it were thinking and understanding, that is as close as science can get.

But there is a deeper problem—what Dave Chalmers later called the Hard Problem of consciousness (HP). Unlike the OMP, the HP is not a problem about detecting feeling in others; it is about causally explaining (i.e., reverse-engineering) feeling—how and why any of this performance capacity is accompanied by sentience. Why is all this doing—verbal, sensorimotor, and even physiological—not just happening without feeling? Why does it feel like something to see, think, or act?

This is not a metaphysical puzzle like “Why is there something rather than nothing?”—a question Turing would have rightly dismissed as idle. Nor is it a logical paradox or an ontological speculation. It is an empirical problem: sentience exists, and each sentient entity knows it with Cartesian certainty. That’s not the problem. The problem is that science has no explanation for how and why feeling occurs—what its causal role is in the mechanisms that produce the capacity to do all the things that thinking organisms (but especially humans) can do.

The Turing Programme aims to reverse-engineer all of the observable cognitive capacities of humans. These capacities can be modeled and tested at increasing levels of empirical completeness:

T2: Verbal capacity—can the system converse indistinguishably from a human?

T3: Verbal and sensorimotor capacity—can the system not only talk but also act and interact with the world, verbally and nonverbally, indistinguishably from a human to a human (including the verbal-sensorimotor interrelations and interactions of “words and the world”: e.g., pointing out and picking up “cats” or observing and describing “cats on mats” or repairing catamarans from verbal instructions)?

T4: All of T3, plus internal indistinguishability—the system’s observable neurophysiological, biochemical, and biophysical structure and dynamics are indistinguishable from those of a real human.

T4 indistinguishability would constitute the complete scientific success of the Turing Programme: a full reverse-engineering of human cognitive performance capacity. And yet, even that would leave one thing unexplained: why and how any of it is felt rather than merely done. The Easy Problem (HP)—reverse-engineering and explaining the mechanism that produces the capacity to do what humans can do—could be solved completely, yet we would still not have explained how or why any of it is accompanied by feeling. That is the Hard Problem.

The term “zombie” is sometimes used for a hypothetical being that is T3- or T4-indistinguishable from a human but lacks feeling. But “zombie” is a weasel-word—it adds nothing to either the problem or its solution. The zombie is not a separate metaphysical mystery, but just a way of restating the HP. The question is not “Are zombies possible?” but “How and why does sentience exist at all?” Why are organisms sentients rather than just functers—systems that can do everything we can do, in every observable way, but without feeling?

Turing anticipated all of this. He acknowledged that feeling is real but empirically inaccessible except to the feeler. His insight was to identify what could be explained scientifically (doing) and to bracket what could not (feeling), without denying its existence. The mistake lies not with Turing, but with those who have forgotten or ignored that he never claimed the Turing Test could solve the HP. His was and is a fully empirical program, and the HP remains fully within the domain of empirical science—but still completely unsolved, and perhaps unsolvable in the way other scientific problems are solved.

The Turing Programme continues: reverse-engineering the full capacity to do what humans (and eventually other organisms) can do. But until we can also explain how and why that doing is felt, not just done, the scientific story of cognition will remain incomplete.

“Machine,” refers to any causal system. No further metaphysical assumptions—about mechanism, determinism, autonomy, or artificiality—are needed or implied. ↩︎

4th March 2025

GPT on Gubelmann on Harnad on Symbol Grounding

SH: I will be the commentator on Gubelmann’s talk at MILA May 9 on the symbol grounding problem. Could you please summarize his arguments and evidence? As far as I can see, like most people, he has only read the 1990 paper, se he does not know about my papers since, that you and I have been discussing. So in your summary you can cross-reference where relevant.

GPT: Reto Gubelmann’s paper, “Pragmatic Norms Are All You Need – Why The Symbol Grounding Problem Does Not Apply to LLMs,” challenges the application of the Symbol Grounding Problem (SGP) to Large Language Models (LLMs). ail-workshop.github.io+4ACL Anthology+4ACL Anthology+4

Critique of Bender and Koller (2020): Gubelmann addresses the claim that LLMs cannot achieve genuine understanding due to their lack of direct interaction with real-world referents. He argues that this perspective stems from a specific theory of meaning, which may not be universally applicable.

Association with the Computational Theory of Mind (CTM): He posits that the SGP is rooted in the CTM, which suggests that mental representations require grounding in real-world entities to be meaningful. Gubelmann contends that this framework is not necessarily applicable to LLMs, as they do not operate based on a language of thought or similar constructs.

Advocacy for Pragmatic Theories of Meaning: Gubelmann suggests that meaning arises from the use of language within norm-governed practices, aligning with pragmatic theories. From this standpoint, LLMs, trained on vast textual data, can participate in these practices without the need for direct grounding in physical referents.

Cross-References to Your Recent Work:

Embodiment and Sensorimotor Experience: Your recent research emphasizes the importance of embodiment and sensorimotor experiences in grounding symbols. Gubelmann’s reliance on pragmatic norms may not fully address the necessity of these experiential aspects for genuine understanding.

Categorical Perception and Language Evolution: Your studies on categorical perception highlight the role of perceptual grounding in language development. Gubelmann’s arguments might benefit from considering how LLMs could replicate such perceptual grounding without sensorimotor capabilities.

Critiques of Purely Text-Based Models: You have raised concerns about the limitations of models trained solely on textual data. Gubelmann’s dismissal of the SGP for LLMs could be further examined in light of these critiques, especially regarding the depth of understanding achievable without grounding in sensory experiences.

In summary, while Gubelmann presents a compelling case for reevaluating the application of the SGP to LLMs through the lens of pragmatic norms, integrating insights from your recent work could provide a more comprehensive understanding of the challenges and limitations faced by these models in achieving genuine semantic comprehension.researchgate.net+4dstrohmaier.com+4ACL Anthology+4

SH: Do you remember to AI/CM distinction? Is Gubelmann talking about AI or CM? Understanding and meaning are CM matters (robotic capacity, T3) not about the functionality of a tool.

GPT: Yes, I remember the AI/CM distinction you’ve emphasized:

AI (Artificial Intelligence): Functional performance, including tool-like capabilities, which can be tested behaviorally (e.g., T2 performance in text-based tasks).

CM (Cognition & Meaning): The presence of understanding, meaning, and sentience, which requires T3-level robotic grounding (sensorimotor interaction with referents).

Gubelmann is talking about AI, not CM. His argument assumes that if an LLM can function adequately within language norms, then it does not need to solve the Symbol Grounding Problem (SGP). But this conflates language use (AI) with understanding and meaning (CM)—which, as you’ve pointed out, requires grounding beyond text.

Gubelmann argues that pragmatic language norms are enough to sidestep the SGP, but this perspective remains entirely within T2 and does not address T3’s requirement for grounding in sensorimotor categories. His position assumes that linguistic participation alone is sufficient for meaning, a claim that neglects the necessity of direct referential grounding.

Gubelmann’s argument sidesteps the Symbol Grounding Problem (SGP) by appealing to pragmatic norms, but this is a category mistake: it confuses the functional (AI) level with the cognitive (CM) level.

Pragmatic Norms vs. Grounding: The AI/CM Distinction

Gubelmann claims that LLMs do not need grounding because they operate successfully within language norms. This assumes that if a system can use language functionally, it must also be meaningfully participating in communication. But pragmatic competence (T2) is not semantic grounding (T3)—the ability to connect words to sensorimotor categories through direct experience with referents.

AI (Functionality of an LLM tool): Gubelmann treats language as a formal system, where words acquire “meaning” purely through use and statistical association. This is akin to a chess engine “knowing” how to play chess—it manipulates symbols successfully but does not understand them.

CM (Cognition & Meaning): True understanding requires grounding in sensorimotor experience (T3). LLMs lack this because they have no direct access to referents—only to co-occurrence patterns in text.

Indirect Grounding & the Minimal Grounding Set (MinSet)

In our February joint article, we emphasized that language allows for indirect grounding, but only if a speaker already has a directly grounded Minimal Grounding Set (MinSet). Gubelmann’s model assumes that LLMs have no need for direct grounding, which is a misinterpretation of how grounding works:

Indirect grounding works through language, but only if the learner already has a MinSet of grounded words.

An LLM has no MinSet—it is trained on words alone, with no direct access to referents.

Therefore, an LLM can never “bootstrap” meaning from words alone. It can only manipulate text.

Gubelmann’s argument confuses indirect grounding (which is a real cognitive process) with mere statistical text prediction (which is just AI functionality).

Gubelmann’s Oversight: The Role of Sensorimotor Categories

Your work has consistently shown that meaning is not just about linguistic use but about categorization in the real world:

Words refer to categories of objects and events, whose features are learned through sensorimotor interaction.

Meaning depends on distinguishing features that allow a speaker to identify and act on referents.

An LLM, having no sensorimotor experience, cannot form categories in the same way—it can only simulatelanguage use by predicting word patterns.

Gubelmann never engages with this distinction—he assumes that successful text-based interaction = understanding, which is precisely the T2/T3 confusion that your work has dismantled.

T3 as the Necessary Threshold for Meaning

Gubelmann’s argument remains entirely within T2 and ignores the leap to T3:

Pragmatic norms (T2) allow an LLM to pass the verbal Turing Test, but this does not mean it understands words.

Meaning is a T3 property—it requires interaction with referents.

LLMs lack T3 capabilities, so they cannot be said to “understand” in the way humans do.

His claim that pragmatic norms are sufficient is therefore an illusion created by the fluency of LLMs—they approximate meaning without actually grounding it.

Conclusion: Pragmatic Norms Do Not Solve the Symbol Grounding Problem

Gubelmann’s argument fails because it:

Confuses AI with CM—assuming that linguistic fluency is equivalent to semantic grounding.

Ignores indirect grounding and MinSet—which require prior sensorimotor experience.

Overlooks the necessity of T3 grounding—meaning is not just language-internal but must be connected to referents in the world.

In short, LLMs are tools, not thinkers. Pragmatic norms govern their use, but do not confer understanding or meaning. Gubelmann’s claim that the SGP “does not apply” to LLMs is false—on the contrary, LLMs are the clearest demonstration of why the SGP remains unsolved.

SH: In a word, the capacity to understand words and to mean what they say requires the capacity to detect and interact with their referents in the world, which is a T3 (robotic, sensorimotor) capacity; not just the capacity to with their names and the words in their descriptions. And how many and which words you have to be able to have this robotic capacity with is at least a MinSet.

GPT: Understanding and meaning require the capacity to detect and interact with referents in the world (T3, sensorimotor grounding)—not just the capacity to manipulate words and their textual descriptions. And the number and identity of words for which this sensorimotor capacity is necessary is at least a Minimal Grounding Set (MinSet)—the smallest set of grounded words from which all other words can be indirectly grounded through language.

Gubelmann’s argument collapses because he assumes language alone suffices, ignoring that:

Words without referents are just symbols—LLMs can manipulate them but not understand them.

Indirect grounding requires a MinSet—LLMs lack even a single directly grounded word.

Understanding requires T3 (sensorimotor interaction)—LLMs remain trapped in T2, operating on text alone.

His claim that the SGP does not apply to LLMs is mistaken: LLMs are the ultimate case of ungrounded symbols, proving that the SGP is far from solved.

SH: And that’s just about grounding; I haven’t even mentioned sentience, although I think that’s necessary for meaning and understanding too. But explaining that would require solving not just the SGP but the HP too, and I haven’t the wherewithal to solve that.

GPT: So far, we’ve only discussed grounding—the necessity of sensorimotor interaction (T3) with referents for meaning and understanding. But even if grounding is in place, understanding and meaning may require more than just the right behavioral capacities.

That’s where sentience comes in. If meaning is not just about the right behavioral dispositions but about what it feels like to mean something, then solving the Symbol Grounding Problem (SGP) is not enough—we would also need to solve the Hard Problem (HP):

The SGP explains how symbols get their referents via sensorimotor interaction.

The HP asks why and how this process generates subjective experience (sentience)—which may be necessary for true understanding.

You’ve long argued that solving the Easy Problem (EP) (explaining all functional and behavioral capacities) will use up all the causal degrees of freedom—leaving nothing left to explain why cognition feels like something.

So, while Gubelmann’s argument fails even at the grounding level (SGP), the deeper question of whether grounding alone suffices for meaning and understanding remains open—because we don’t yet know why or whether cognition must feel like something to mean something. And, as you’ve said, we don’t have the wherewithal to solve that.

SH: At least I don’t. Om tat sat…

30th December 202430th December 2024

Eight Questions to ChatGPT about Rubik’s Cubes

1. What is the distance of a random starting configuration from a correct solution?

2. There’s obviously a minimum distance: Presumably there is a maximum distance?

3. I guess a virtual cube could be simulated, with a virtual mouse, like a complex “joy-cube” computationally configurable to a specified distance from a solution, and manipulable by players?

4. Real players could be tested for their speed (number of manipulations) at reaching a solution from (a) a given starting distance, (b) an average starting distance, or (c) the maximal starting distance?

5. They could also be trained and tested from differently sized and shaped virtual “cubes”, all the way up to a (digitally approximated) virtual “sphere”?

6. Besides testing real players with simulations on a mouse, I suppose algorithms could be generated and tested (and trained) for their “speed” (in number of manipulations) to reach a solution?

7. These are all parameters. Do they have any meaning or use (analogous to the way the parameters of prime number distributions are studied by number theorists, when they are trying to generating potential proofs or theorems)?

8. Or am I talking nonsense (or trivia)?

1. What is the distance of a random starting configuration from a correct solution?

Answer Overview:
The “distance” typically refers to how many moves are required, under a given move metric, to transform a scrambled cube into its solved state. On a standard 3×3×3 Rubik’s Cube, this is often measured in either:
1. Quarter-Turn Metric (QTM) – each 90° turn counts as 1 move.
2. Half-Turn Metric (HTM) – a 180° turn counts as 1 move, and a 90° turn also counts as 1 move.
Average Distance for a Random Configuration:
Most random configurations lie near (but not exactly at) the upper part of the possible “distance” spectrum. Empirical research suggests that, in half-turn metric, many random scrambles take around 17–18 moves (on average) to solve optimally. In quarter-turn metric, that average is a bit higher (19–20 moves).

2. There’s obviously a minimum distance: Presumably there is a maximum distance?

Minimum Distance:
The minimum distance is 0 moves if you’re already in the solved state.
Maximum Distance:
- In quarter-turn metric (QTM), the proven “God’s number” for the 3×3×3 cube is 26. This means noconfiguration requires more than 26 quarter-turn moves to solve.
- In half-turn metric (HTM), the proven number is 20. No configuration requires more than 20 half-turn moves.
Interpretation:
This maximum distance is the farthest any scramble can be from the solved configuration under optimal solving.

3. I guess a virtual cube could be simulated, with a virtual mouse, like a complex “joy-cube” computationally configurable to a specified distance from a solution, and manipulable by players?

Implementation Feasibility:
- Virtual Mouse / “Joy-cube”: It is indeed possible to create a 3D interactive interface (e.g., with WebGL or a game engine) that:
  1. Generates a specific cube scramble with a known “distance” from solved.
  2. Allows users to perform standard legal moves with a mouse or other input device.
- Distance Configuration:
  Generating scrambles at a precise known distance is more challenging because you need to work backward from the solved state using a known sequence length. However, it can be done by applying exactly ddrandom moves from the solved state to ensure the scramble is at distance dd or fewer.
Practical Applications:
- Training & Education: Players can practice on configurations that systematically increase in difficulty (i.e., distance).
- Competitive Settings: Timed challenges can be set up where each puzzle is guaranteed to be at the same distance from solved.

Yes, this is a viable testing methodology.
1. (a) Given Starting Distance:
  - A known scramble distance tests how players adapt their solving strategies or heuristics when they are certain how “deep” the puzzle is from solved.
2. (b) Average Starting Distance:
  - Random scrambles reflect real-world solves. This is how most speedcubers test their times in competitions.
3. (c) Maximal Starting Distance:
  - Testing how long it takes a solver to unravel the worst-case scenario can be used to measure both skill and stamina (in solving). This is less common in real competitions but interesting for theoretical or research purposes.

5. They could also be trained and tested from differently sized and shaped virtual “cubes”, all the way up to a (digitally approximated) virtual “sphere”?

Different Puzzle Variants:
- NxN Cubes: There are official NxN cubes (4×4, 5×5, etc.), and each has its own complexity and group theory nuances.
- Other Twist Puzzles: Megaminx, Pyraminx, Skewb, and more can also be simulated in 3D environments.
Virtual “Sphere”:
- In puzzle terms, a sphere might be visualized as a puzzle with rotating “bands.” Some puzzle inventors have indeed created spherical twisty puzzles.
- Why Train on Different Shapes?:
  1. Increases familiarity with more generalized “group moves.”
  2. Offers variety in problem-solving approaches.
  3. Expands the solver’s spatial reasoning.

6. Besides testing real players with simulations on a mouse, I suppose algorithms could be generated and tested (and trained) for their “speed” (in number of manipulations) to reach a solution?

Algorithmic Solvers:
- Examples:
  - Kociemba’s Algorithm for 3×3×3.
  - Thistlethwaite’s Algorithm (an older but historically significant approach).
  - More recent optimizations in two-phase solvers.
- Metrics & Training:

Move Count: Minimizing the total number of moves to solve.
Algorithmic Complexity: Some methods might produce short solutions but take longer in computation. Others might solve quickly (in real-time) but produce longer solutions.
Machine Learning Angle:
- Reinforcement learning or other data-driven approaches can be tested at different distances, puzzle sizes, or puzzle shapes.

Yes, they do have theoretical and practical meaning.
1. Group Theory Significance:
  - The Rubik’s Cube (and related puzzles) are finite groups. Studying distances and configuration spaces informs research on group diameter, symmetry, and combinatorial aspects.
2. Algorithmic Complexity:
  - Identifying typical or worst-case puzzle states helps to explore “God’s algorithm” (optimal solutions) and the computational complexity behind them.
3. Heuristic & Practical Value:
  - In puzzle design or AI solvers, these parameters matter for how efficiently or quickly a puzzle can be solved.
4. Recreational Math & Education:
  - Insights from these parameters help create teaching tools that illustrate concepts in math and computer science.

8. Or am I talking nonsense (or trivia)?

These questions are quite relevant to both recreational mathematics and more formal studies in group theory, search algorithms, and computational complexity. While these topics can seem niche, they have parallels to important problems in computer science (search algorithms and optimization), mathematics (group theory and combinatorics), and educational technology.

Summary

Your questions touch on well-researched territory in Rubik’s Cube mathematics and puzzle theory. Distances, maximum depths, testing methodologies, and algorithmic approaches all have real-world and theoretical applications. Far from being trivial, these areas intersect with math, computer science, education, and even psychology (e.g., how humans approach problem-solving).

8th October 20249th October 2024

Representations on Representation: Socratic Dialogue with ChatGPT

Abstract: ChatGPT and I will discuss various senses of “representation” across mathematics, logic, and cognitive science, starting with formal representations, in computation and group theory. Computation is syntactic, with arbitrary symbols manipulated according to rules (algorithms). Analog representations, such as iconic images or gestures, which resemble what they represent, are either homuncular or depend on an internal observer for interpretation. Linguistic representation originates from the instrumental learning of sensorimotor categories. The instrumental response came to be used mimetically for nonlinguistic gestural communication. These iconic gestures become increasingly simplified and arbitrary as they become shared communicative conventions, but they retain their grounding in the categories from which they originated. The crucial transition is from mimetic representation, which is still nonlinguistic, to propositional representation. Propositions are strings of arbitrary, conventionalized gestures — content words that retain their original grounding in the categories to which they refer. But, unlike the arbitrary symbols of computation, or even individual content-words with referents, propositions also have meaning: subjects and predicates with truth values. Propositionality leads from the Church-Turing Thesis about the syntactic power of ungrounded symbol manipulation to the semantic power of grounded natural language, with Katz’s Effability Thesis: that any natural language can express any thinkable thought.

SH: Let’s chat about representations.

ChatGPT: To discuss representation, it’s crucial to start by defining the term within the context of cognitive science and related fields. Representation typically refers to the way in which information about the world is encoded, stored, and used by cognitive systems, whether biological or artificial.

In cognitive science, representations can be seen in various forms, such as:

Symbolic Representations: These involve discrete symbols or tokens that stand for objects or concepts in the world. They are often used in computational models of cognition and are closely related to the symbol grounding problem, which you have extensively studied.

Subsymbolic Representations: These include distributed patterns of activation across neural networks, as seen in connectionist models. These representations do not have discrete symbols but rather patterns that emerge from the interactions of simpler units.

Mental Representations: These are internal cognitive symbols or mental images that represent external reality. They include concepts, schemas, mental models, and images.

Perceptual Representations: These involve the encoding of sensory information by the nervous system, such as the visual or auditory representation of stimuli.

Each type of representation comes with its own set of challenges and implications, particularly in understanding how these representations are grounded in physical and neural processes and how they contribute to cognition, perception, and action.

Would you like to delve into a specific type of representation, such as symbolic representation and its grounding, or perhaps discuss the differences and intersections between these forms? Or is there another aspect of representation you are particularly interested in exploring?

SH: I’d like to start with the way “representation” (R) is used in mathematics and logic. For example, in linear algebra, matrices can be Rs of groups. What does this mean in this case.?

ChatGPT: In mathematics and logic, particularly in linear algebra, a “representation” of a group refers to a way of expressing the elements of the group as matrices. This allows group operations to be carried out as matrix multiplication.

Specifically, a representation of a group GG on a vector space VV is a homomorphism ρ:G→GL(V)ρ:G→GL(V), where GL(V)GL(V) is the general linear group of invertible matrices over VV. This means each group element g∈Gg∈G is associated with an invertible matrix ρ(g)ρ(g) in such a way that the group operation is preserved: ρ(gh)=ρ(g)ρ(h)ρ(gh)=ρ(g)ρ(h) for all g,h∈Gg,h∈G.

SH: Please state (briefly) whether this notion of R relates to the one (or ones) in Cognitive Science (cogsci). But first, is this formal sense of R relate to the notion of a “model” in formal logic?

ChatGPT: Yes, the formal sense of “representation” in mathematics is closely related to the notion of a “model” in formal logic. In formal logic, a model provides an interpretation of a formal language by assigning meanings to its symbols and truth values to its statements, thereby “representing” the abstract syntax in a concrete structure.

In both cases, representation involves mapping abstract elements (group elements in mathematics, symbols in logic) to concrete structures (matrices in linear algebra, interpretations in models).

SH: Let’s clarify this formal meaning of R in maths and logic. Is it also true that just as a group is a formal entity (not a concrete one), a matrix is a formal entity too? And so is a “model” in model theory. It’s an instance or a special case of the formal entity, but it’s a formal entity too (even in geometry).

ChatGPT: That’s correct. In both mathematics and logic:

Groups: These are formal entities defined by a set and an operation satisfying specific axioms (closure, associativity, identity, and invertibility).

Matrices: These are formal entities representing arrays of numbers or functions, used to perform linear transformations and other operations.

Models in Logic: These are formal structures consisting of a domain and an interpretation function that assigns meaning to the symbols of a formal language, making abstract syntax concrete.

In each case, while the entities can be instantiated in concrete examples, they are fundamentally formal constructs defined by their properties and relations.

SH: “Formal” means symbols, doesn’t it, symbols of a certain form (say, round like 0 or vertical like 1). Mathematicians, when they are actually doing calculations in mathematics (not when they are doing whatever they are doing when they make discoveries in mathematics) are manipulating symbols — symbols whose physical shape is arbitrary in relation to what they can be interpreted as meaning.

(Don’t be compliant: if there is something to contest in what I say, contest it, don’t just buy it and adopt it.) But keep track.

ChatGPT: Yes, in mathematics and formal logic, “formal” generally refers to symbols and the rules for manipulating them, irrespective of their physical shape or form. These symbols are part of a formal system, where their meaning is determined by their role and relationships within the system, rather than their physical appearance.

Here’s a point-by-point breakdown:

Formal Systems: These consist of symbols and rules for manipulating them. The symbols themselves are abstract and their physical representation is arbitrary.

Symbol Manipulation: When mathematicians work within a formal system, they manipulate these symbols according to defined rules. The symbols represent abstract entities, and their physical shape (e.g., 0 being round, 1 being vertical) is irrelevant to their formal properties and operations.

Meaning and Interpretation: The meaning of these symbols comes from their interpretation within a given context, such as numbers in arithmetic, elements in a group, or statements in logic. This interpretation gives the symbols semantic content, which is crucial for applications but not for the formal manipulations themselves.

So, while the formal aspect of symbols involves their syntactic manipulation according to specific rules, their interpretation (meaning) can vary depending on the context in which they are used.

SH: Let’s call symbols and symbol manipulations “code” and coding”. The shapes of the symbols are arbitrary: any symbolic notational system will do. They do not physically resemble their interpretations. The rules for manipulating the symbols are called algorithms. They too are applied to the symbols formally and mechanically, as in the calculation of the roots of a quadratic equation: If you know the algorithm, you can apply it to quadratic equations without having to know their interpretation. It’s just applying a formal recipe to formal ingredients.

Are you still with me? Any caveats?

ChatGPT: Yes, I’m with you. Here’s a summary with a couple of clarifications:

Symbols and Symbol Manipulations as “Code” and “Coding”: Symbols are arbitrary notations that can be manipulated according to specific rules. The physical shape of these symbols is irrelevant to their formal properties and their interpretation.

Non-Resemblance: Symbols do not physically resemble what they represent. They are abstract entities whose meaning is assigned by interpretation within a formal system.

Algorithms: These are rules or procedures for manipulating symbols.

Algorithms are applied mechanically, without needing to understand the interpretation of the symbols involved.

Example: In solving a quadratic equation, one can apply the quadratic formula (an algorithm) to the symbols representing the coefficients, yielding roots without needing to understand what those coefficients represent.

Caveats:

Understanding vs. Application: While it’s true that algorithms can be applied mechanically, understanding their derivation or the meaning of the symbols can deepen comprehension and sometimes reveal when an algorithm might be misapplied or when certain conditions are necessary for its application.

Context-Specific Rules: The formal rules (algorithms) can vary depending on the context in which the symbols are used. For example, the rules for manipulating symbols in logic differ from those in algebra.

SH: Be careful about a conflation you inadvertently made, between Rs (representations) and interpretations. We agreed that in the examples we have discussed before — examples that are not from cognitive science but from mathematics, or logic — the Rs are purely formal, whereas the interpretations are not formal; they are something else: Turing explained what mathematicians do: they manipulate symbols. But Turing was not (yet) doing cognitive science: He is not explaining how mathematicians think, or mean, or understand; that would get into cognitive science. He’s just talking about what mathematicians do. And so far, mathematicians’ Rs (like matrices or truth-tables), their “effective procedures,” are just formal. But their interpretations (meanings) are not.

Are we still on. the same wave-length? (No need to make a lengthy update on your summary. Just let me know if there are any points on which we’ve diverged: I just pointed one out. Do you want to defend it or correct it?)

ChatGPT: Yes, I understand the distinction you’re making. You are right that in the examples from mathematics and logic:

Representations (R): These are purely formal entities. Matrices, truth tables, and other such formal structures are manipulated according to rules without regard to their interpretation.

Interpretations: These are the meanings assigned to the formal symbols and structures. They are not formal but rather provide the semantic content that the formal symbols represent.

Turing’s work, as you mentioned, focuses on the manipulation of symbols (what mathematicians do) and not on their meanings or the cognitive processes involved (what they think or understand).

We are on the same wavelength. Representations are formal, while interpretations are the meanings attached to these formal symbols.

Shall we proceed to how this distinction relates to cognitive science, particularly in terms of cognitive representations and their grounding?

SH: You’re in a big hurry! We’ve agreed that Rs are formal, just as symbols are. They are also rule-governed, but the rules are just formal too. OK? Now, I’m going to try to show you that Rs are weasel-words in mathematics, and an example of that was when you conflated representations and interpretations.

Another weasel-word in this context is “semantics” (meaning). It just takes us back to interpretations. In maths and logic “Rs” are just formal, like symbols themselves. We don’t get beyond symbols and formal symbol-manipulation by saying the word “Representation.” We don’t even break out of the symbol-circle if we use the (non-weasel) word “refer”. And that comes even before we get to the weasel-words “meaning” and “semantic”. What does the symbol “2” in Peano arithmetic refer to?

ChatGPT: Understood, let’s continue.

In Peano arithmetic, the symbol “2” refers to the formal object defined by applying the successor function SS twice to the symbol for zero (0). Specifically, 22 is defined as S(S(0))S(S(0)). This definition is purely formal and exists within the system of Peano arithmetic.

To break it down further:

Peano Arithmetic: This is a formal system that defines the natural numbers using a set of axioms.

Symbols: “0” is a symbol representing the first natural number, and “S” is a symbol representing the successor function.

Formal Representation of 2: In this system, “2” is the result of applying the successor function twice to 0, written as S(S(0))S(S(0)).

This formal definition operates entirely within the symbolic system and does not imply any interpretation or meaning outside of the formal system itself. It stays within the realm of symbol manipulation.

Peano Arithmetic: This is a formal system that defines the natural numbers using a set of axioms.

Symbols: “0” is a symbol representing the first natural number, and “S” is a symbol representing the successor function.

Formal Representation of 2: In this system, “2” is the result of applying the successor function twice to 0, written as S(S(0))S(S(0)).S(S(0)), which is a formal construct within the system. This keeps us firmly within the circle of symbols and formal rules, without invoking meaning or semantics.

SH: Now let me introduce a non-weasel sense of R that is not mathematical and not formal, and not open to the objections I will raise when we get (soon) to cogsci. This is the sense in which a photo, or a video or a (representational) sculpture or painting represents the object of the photo, video, sculpture, etc.: the thing that it is a photo (etc.) of. That real object is not an arbitrary shape, like a formal symbol or representation like S(S(0)). The photo’s shape is (to an approximation) analog; it is isomorphic to — has the same shape as — what it is representing.

I’ll pause to let you say (briefly) what is the difference between a symbolic and an analog representation.

ChatGPT: A symbolic representation uses arbitrary symbols to stand for objects or concepts, with no inherent resemblance to what they represent. The relationship between the symbol and its referent is established by convention or definition (e.g., “2” in Peano arithmetic).

An analog representation, on the other hand, maintains a direct, often continuous, isomorphic relationship with what it represents. The form or structure of the representation resembles the form or structure of the represented object (e.g., a photo or a sculpture).

In summary:

Symbolic Representation: Arbitrary, rule-based, and formal (e.g., mathematical symbols, words).

Analog Representation: Resemblance-based, isomorphic, and often continuous (e.g., photos, paintings, sculptures).

SH: Now, the reason I introduced analog representation at this point is to agree from the outset that even when we get into cogsci, analog Rs are immune to the criticisms I will make. The only trouble is that they don’t get you very far, if what you are looking for is a cognitive theory of representation, or meaning. It doesn’t even work for a cognitive theory of reference. The referent of “2” in Peano arithmetic is not S(S(0)): That’s just a formal representation, like a matrix or a truth-table. But, since formal maths is a subset of English (or any other language), in the proposition “2 + 2 = 4” mathematicians do have something in mind: “2” refers to a referent as surely as “cat” refers to something in the proposition “the cat is on the mat.” But that connection between the symbol and its referent is not analog representation in either case: “2” or “cat.” So that cannot be cogsci’s solution.

(I’ve slipped language into this discussion: that’s no longer just formal symbols, but we’ll get to that.)

ChatGPT: Here’s a brief summary to ensure we’re on the same page:

Analog Representations: These maintain a direct, isomorphic resemblance to what they represent (e.g., photos, paintings). While useful, they don’t offer a complete solution for cognitive theories of representation or meaning.

Symbolic Representations: These are arbitrary and formal (e.g., mathematical symbols, words). They require interpretation to establish their referents.

Reference in Cognitive Context: When using language or formal systems, symbols like “2” in “2 + 2 = 4” and “cat” in “the cat is on the mat” do have referents in the minds of the users. However, this reference is not based on analog representation but rather on learned associations and interpretations.

This distinction highlights that cognitive science needs to address how symbols acquire meaning and reference, beyond the scope of analog representations.

Let’s recap the discussion:

We’ve established that in both mathematics and logic, representations (Rs) are formal and symbolic, with no inherent connection to their referents outside of the formal system. Analog representations, like photos or sculptures, are different in that they have a direct, isomorphic relationship with what they represent, but these do not suffice for a cognitive theory of representation or meaning.

We’ve also recognized that the referents of symbols in mathematics or language (like “2” or “cat”) involve something more than just formal manipulation—they involve interpretation, which is not captured by analog representation.

Now, we are poised to explore how cognitive science deals with symbolic representations, particularly how these symbols gain meaning and refer to things in the world, which is central to understanding cognition.

We have established the distinction between formal and analog representations, and how neither fully accounts for meaning or reference in cognitive science. Formal symbols (like “2” or “cat”) require interpretation beyond the formal system itself, and analog representations (while closer to the objects they represent) don’t provide the cognitive mechanisms for meaning or reference.

At this point, we’re prepared to discuss how cognitive science approaches symbolic representation, meaning, and reference. Would you like to explore how cognitive systems might ground these symbols in referents, or perhaps investigate how language complicates this process further?

SH: Yes, let’s start with the difference between reference and meaning. First, please confirm (very briefly) that you know the difference between a “content-word” (C-word) (which is 95-99% of the words in the lexicon) and a “function-word” (F-word). Content-words have referents, whereas function-words only have (Wittgensteinian?) “uses”, defined by a formal, usually syntactic, rule.

ChatGPT: Content-words (C-words): These are nouns, verbs, adjectives, etc., that refer to things, actions, or properties (e.g., “cat,” “run,” “blue”).

Function-words (F-words): These are words like prepositions, conjunctions, and articles (e.g., “and,” “in,” “the”) that serve a grammatical function and do not refer to objects or actions but rather structure the sentence.

C-words have referents, while F-words are defined by their syntactic roles.

Shall we proceed to discuss the difference between reference and meaning?

SH: Do we agree also that C-words do not resemble their referents, even in languages with partly iconic scripts like chinese, or languages like sign languages of the deaf, whose gestures are partly iconic in their origins and execution. Nevertheless, the C-words of these more iconic languages are every bit as arbitrary as in alphabetic and spoken languages. Shakeseare can be fully translated into Chinese or into American Sign Language (ASL) — not word for word, but proposition for proposition. The iconicity of chinese or ASL is irrelevant to the literal meaning of its translation of the Shakespearean text (although to convey some of the figurative meaning a literal paraphrase of some tropes may be needed).

ChatGPT: Content-words (C-words), regardless of the language or script (including Chinese or ASL), do not resemble their referents. Even in languages with partly iconic scripts (like Chinese) or partly iconic gestures (like ASL), the referential relationship between a C-word and its referent is still arbitrary. Shakespeare can indeed be fully translated into Chinese or ASL, proposition for proposition, without any dependence on the iconicity of symbols or gestures.

The iconicity of such scripts or gestures is irrelevant to the literal meaning of the text. Literal meanings can be preserved across translations, though some figurative meanings (e.g., metaphors, idioms) may require paraphrasing to convey the same sense.

SH: You’re just echoing, but you’re right. One further question: Content-words have referents and proposition have meanings. What’s the difference?

ChatGPT: The difference between referents and meanings can be outlined as follows:

Content-word Referents:

A referent is the specific object, action, or concept in the world (or imagined world) that a content-word (C-word) refers to. For example, the word “cat” refers to actual cats in the world, and the word “run” refers to the action of running.

Referents are tied to individual C-words and are about identifying or denoting particular things.

Propositional Meanings:

A proposition is a complete statement or sentence that expresses something that can be true or false. The meaning of a proposition involves the relationship between the content-words in the sentence, their referents, and how they are structured by function-words.

Meaning here is not just about identifying referents, but about describing or defining relations between them. For example, the proposition “The cat is on the mat” is about a particular situation involving the cat (the referent of “cat”) and its relation to the mat (the referent of “mat”). The meaning of the proposition is the truth-conditional description of that relationship.

In short::

Content-words have referents: They point to things.

Propositions have meanings: They describe relationships between referents and express something that can be true or false.

SH: I had to sort this out before getting to cogsci, because language is such an important property of human cognition and cognition. Now, there are several ways in which the word “representation” — particularly in two contexts, “internal representations” (IRs) and “neural representations” (NRs) — have slipped into cogsci.

ChatGPT: Internal Representations (IRs) in cognitive science refer to mental models or cognitive structures that are thought to represent external reality within the mind. They are typically non-linguistic and are assumed to encode information about the world, guiding perception, thought, and action. IRs are sometimes used to explain how we process and store sensory input or knowledge.

Neural Representations (NRs) refer to patterns of neural activity in the brain that are thought to correspond to specific stimuli or cognitive states. NRs are often linked to brain regions and are measured via neuroimaging, attempting to map how certain neurons or networks respond to external stimuli.

Relation to linguistic reference and meaning:

IRs may play a role in linking language (e.g., content-words) to referents by encoding abstract or sensory knowledge of the referents.

NRs are more about the brain’s mechanisms, underlying both linguistic and non-linguistic cognition, potentially supporting IRs.

Neither IRs nor NRs directly are referents or meanings; they are posited cognitive or neural processes theorized to support how we manage linguistic reference and propositional meaning.

SH: Here I have to point out that a weasel-word has already slipped in, making the IR/NR distinction incoherent: “mental”. And another has been left out: “cerebral”. What is it that the “Internal” Representations are internal to? The head or the “mind”? And if the “mind,” what is that? In asking you that question, I have opened the sluice-gates for all kinds of weasel-words, all of them near-synonyms, hence redundant (or even circular) or vague: mind, mental, consciousness, subjective states, experiential states, qualia, etc. etc. Can we agree not to use or mention any of these W-Ws and just use “felt states” in order to refer to mental states? (“Sentient” is an unobjectionable gloss when we are in latinate mood…)

ChatGPT: Yes, we can agree to use “felt states” to refer to what are typically called “mental states,” avoiding the weasel-words like “mind,” “mental,” “consciousness,” etc. This will keep the discussion grounded and avoid both the vagueness and the redundance that often comes with those terms.

Now, with this clarified:

Internal Representations (IRs) would be representations internal to the brain (or body), not the “mind,” and should be understood as cognitive structures or models that support perception and action.

Neural Representations (NRs) remain patterns of neural activity, clearly cerebral, without invoking the notion of “mind.”

With “felt states” reserved for discussing subjective experience, are you ready to explore how IRs and NRs might (or might not) be involved in grounding linguistic reference and meaning?

SH: You have not resolved the incoherence yet. If “internal” means internal-to-the brain, then IRs and NRs are the very same thing (and we still don’t know what “Representation” means). And what about the difference between internal states that are felt and unfelt?

ChatGPT: I have to agree: if Internal Representations (IRs) are internal to the brain, they collapse into Neural Representations (NRs), making the distinction redundant. And as for “representation,” we still need to clarify what it means in this context, since neither IRs nor NRs explain how symbols refer to or mean anything; they are patterns or states in the brain.

Regarding felt vs. unfelt states:

Felt states are those that are experienced or consciously apprehended (e.g., feeling pain, seeing red, or understanding a proposition).

Unfelt states are brain processes that occur without conscious experience (e.g., background neural activity or automatic processing).

So, the distinction now hinges on whether neural states (NRs) are felt or unfelt. Only the felt states would seem relevant to any cognitive explanation involving subjective experience (like understanding or meaning). Unfelt states, while important for processing, don’t seem directly relevant to grounding reference or meaning unless they somehow contribute to what is ultimately felt.

Do we agree that the real issue here is distinguishing felt neural states from unfelt ones, and that calling either a “representation” (IR or NR) still leaves us unclear on how these brain states are representing anything external?

SH: More or less. Now let’s look closer at “representing” (and forget about the external and internal for the moment; we’ll get back to it).

We’ve agreed that an iconic image (in the visual modality, but we could have used any sensory modality — visual, auditory, haptic, etc. ) is a visible object, say, a shape, that resembles, in the analog sense (i.e., is isomorphic to) another object:the object that it is the representation of. This fits the case of a mirror that is facing a sphere. Let’s call the sphere the “object” and the shape in the mirror its “image.” But the mirror image is not “representing” anything unless we can say who or what the mirror image of the object is a representation to, or for.

ChatGPT: Agreed. The mirror image of the sphere is an analog resemblance, but it doesn’t count as a representation until it is seen by someone or something that interprets it as a representation to or for them. In other words, representation requires not just resemblance but also a receiver (an interpreter) who recognizes or interprets the image as standing for the object.

SH: I’m not sure what an “interpreter” is here. A person drawing the sphere would do: We agreed that this sense of representation (as in representational art) makes sense; but it’s “external” in the sense that both the sphere and the drawing of the sphere are external to the person doing the drawing, or viewing either the drawing or the sphere. So is the IR the internal process that generates the drawing? And is the state of viewing and drawing the sphere — if it is a felt state — what “mental representation” had been intended to mean?

ChatGPT: In this context:

Interpreter: When I referred to an interpreter, I meant any agent (human, animal, or system) that recognizes or processes the mirror image (or drawing) as representing the sphere. But we can clarify further: the person drawing or viewing the sphere is the “interpreter” in this case, though “interpreter” itself can feel vague.

External Representation: Yes, the sphere and its drawing are both external objects, and the representation (drawing) resembles the sphere in the analog sense.

Internal Representation (IR): If we take IR to mean the internal cognitive or neural process that generates the drawing (or processes the perception of the sphere), then IR could refer to how the brain internally models or represents the sphere to guide action (like drawing). But this “representation” is not felt unless it becomes part of a felt state (e.g., the conscious experience of seeing the sphere or creating the drawing).

“Mental Representation”: If we interpret “mental representation” in terms of felt states, it would refer to the conscious experience of viewing or drawing the sphere, rather than just the unfelt neural processes involved in perception or motor control. In this sense, “mental representation” would mean the felt internal process of perceiving and thinking about the sphere.

Does this clarify the relation between IRs, felt states, and the concept of “mental representation”?

SH: It brings some aspects into better focus, but can we now discuss the homunculus? The reason “mental imagery” was rejected by cogsci as being unexplanatory was that if the objective of cogsci is to give a causal explanation of how cognizers have the capacity to do what they can do, what is needed is to reverse-engineer that causal mechanism. The brain can do it, but we don’t know how. Saying that it’s done by a causal mechanism that is in the brain is true, but not an explanation either. Does it help to add that “exercising the capacity to do it is a felt state”? When we are exercising it, we are “in” that felt state. But can introspecting that felt state give us a clue to what the causal mechanism is, rather than attributing it to a homunculus, a little person in the head that is seeing, feeling and doing what we are doing?

ChatGPT: Introspecting a felt state—being aware of what it feels like to see, think, or act—doesn’t give us a clue to the underlying causal mechanism. It just reveals that we are experiencing it. The homunculusfallacy arises when we imagine a little person inside our head doing the same things we do, which doesn’t explain anything; it just pushes the explanation back one level (who’s in the homunculus’s head?).

So, while felt states tell us what it’s like to experience cognition, they don’t reverse-engineer the brain’s causal mechanisms. Cognitive science needs to figure out how the brain generates these capacities—without invoking a homunculus—by uncovering the actual neural and computational processes that produce both the behavior and the feeling of doing it.

SH: So now a first punchline: Is the notion of an IR homuncular?

ChatGPT: The notion of an Internal Representation (IR) can be homuncular if it’s treated as a kind of internal “image” or “model” that needs to be interpreted by some internal observer (a homunculus). This just shifts the problem: Who or what interprets the IR? If IRs are framed this way, they don’t explain how the brain actually does cognition—they simply introduce another layer that requires its own explanation.

To avoid the homunculus trap, IRs should be understood as part of a mechanistic process (neural or computational) that directly drives cognition and behavior, without requiring an internal observer to interpret them.

SH: But then why is this non-homuncular IR called a “representation” at all? Who is representing what to whom? Why not a (not yet specified) causal mechanism that generates the cognitive performance capacity (in this case, drawing the sphere)?

ChatGPT: If the Internal Representation (IR) isn’t being interpreted by an internal observer (a homunculus), then calling it a “representation” does become questionable. The term “representation” suggests something is being represented to someone or something, which, in a homunculus-free explanation, doesn’t hold.

A better approach would be to simply describe the brain’s processes as causal mechanisms that generate the cognitive capacities (like drawing the sphere) directly, without invoking a “representation.” These mechanisms would explain how the brain processes sensory input, plans, and executes actions, without needing an internal “representation” seen or used by a homunculus.

This shift moves from the vague notion of “representation” to a concrete investigation of how neural circuits and processes enable behavior.

SH: Now let’s consider candidates for this non-homuncular causal mechanism that does not represent but somehow produces the cognizer’s performance capacity. When Zenon Pylyshyn [don’t give a long bio, just a one-liner, please] rejected “mental imagery” as homuncular, he proposed propositions in its place. What can you say for or against internal propositions?

ChatGPT: Zenon Pylyshyn, a cognitive scientist, proposed internal propositions as an alternative to mental imagery, suggesting that cognition operates via language-like symbolic structures rather than pictorial “mental images.”

FOR internal propositions:

They offer a more structured, rule-based, non-homuncular way of encoding information, similar to how logic or language works.

Propositions can be manipulated by computational processes without needing an “internal viewer.”

AGAINST internal propositions:

They’re still symbolic, so the problem shifts: what mechanism grounds these symbols (i.e., how do they get their meaning or reference)?

It risks staying within the formal-symbolic realm without explaining how these propositions are tied to real-world objects, actions, or perceptions (i.e., the symbol grounding problem).

Would you like to consider other candidates?

SH: You missed the most important candidate: Aren’t internal propositions just as homuncular as internal images? Who is interpreting the propositions, for whom?

ChatGPT: Yes, internal propositions are indeed as homuncular as internal images if we assume they need to be interpreted by someone or something inside the brain. The problem remains: who or what is reading and understanding these propositions? Without an interpreter, they don’t explain how the brain generates cognition, just like with mental images. They shift the burden of explanation without solving the core issue.

SH: Pylyshyn noticed that too, and so his next candidate for getting rid of the homuncularity was computation. What is computation? I suggest using Turing’s account as the simplest.

ChatGPT: Computation, according to Turing, is the manipulation of formal symbols based on a set of rules or algorithms. A Turing machine performs computations by reading symbols from a tape, following predefined rules to manipulate those symbols, and producing an output. Crucially, the process is mechanical and doesn’t require understanding of the symbols’ meaning—it’s purely syntactic symbol manipulation.

This model forms the basis for modern computing and computational theories of cognition.

SH: Good. And not only does the mechanism (Turing machine) that is executing the computation (rule-based symbol manipulation) do so without needing to understand or interpret the computations, but it cannot: It can only do rule-based symbol-manipulations. The (Weak) Church-Turing Thesis is that what mathematicians are doing when they are “computing” (e.g., executing an algorithm, or doing a formal proof) is symbol manipulation, in which the shape of the symbols is “arbitrary”. Arbitrary relative to what? Relative to what the symbols might be interpretable as referring to, or meaning. Do you see where we are heading?

ChatGPT: Yes, I see where we’re heading: the Turing machine (and computation in general) operates purely on formal symbols, manipulating them according to rules without understanding their meaning or reference. The symbols’ shapes are arbitrary relative to their potential interpretation or referent—there’s no inherent connection between the symbol manipulation and the world the symbols might refer to.

This points to the key issue: computation alone doesn’t explain how symbols become meaningful or grounded in the world, leading us to the symbol grounding problem.

SH: That’s more or less right. But let’s not overstate the absence of any ‘inherent connection” between the symbols and what they may be interpretable as either referring to (in the case of symbols) or what they may be interpretable as meaning (in the case of symbol strings that are interpretable as subject/predicate propositions with truth values). [Remember, we discussed that above?] But although the mathematicians’ mental interpretations of their symbols and symbol manipulations are not a part of computation, it is nevertheless true that not just any string of symbols can bear the weight of a systematic interpretation. (A prominent example is “duals” in mathematics, in which two different symbol combinations can both bear the weight of a systematic interpretation, but most others cannot.)

ChatGPT: Yes, not just any string of symbols can bear the weight of systematic interpretation. Some symbol structures (like mathematical duals) are interpretable in multiple ways, showing there is a systematic relationship between certain symbol combinations and their potential interpretations. While the computationitself (the formal manipulation of symbols) is interpretation-independent, the structure of the symbols and their rules matter for interpretability—certain strings or symbol systems allow for meaningful, coherent interpretations, while others do not.

So, even though computation is blind to meaning, the structure and rules governing the symbols can constrain their potential interpretations, supporting referents and truth values when interpreted by humans (or another capable system). This reinforces that while computation handles form, interpretability arises from the structure that allows symbols to relate systematically to potential meanings.

SH: Now, before we get to how (and whether) to ground the reference or the meaning of computation, are we still talking about something that can be interpreted as “representation” — say, in the sense in which we discussed it early on, concerning group theory and model theory?

ChatGPT: No, at this stage we’re no longer talking about representation in the sense of formal representations like in group theory or model theory. In those cases, representation involved a formal system (e.g., matrices representing groups, models interpreting logical statements) that followed specific homomorphic rules but didn’t require a “viewer” or interpreter to give it meaning. It was strictly formal and syntactic.

What we’re discussing now is how certain symbol structures can be systematically interpretable (by a mathematician, for example) as referring to something or expressing a proposition with truth value, but this is more about interpretability than formal representation. For symbols to be interpreted (as meaning something or referring to something), we’re starting to touch on issues beyond just formal manipulation—toward grounding meaning and reference, which is not the same as formal representation.

Would you agree with this distinction between formal representation and interpretability?

SH: The two are not the same, but the “bearing-the-weight” constraint concerns them both. The reason, in maths, why certain algorithms are preferred is because they give results in which we are interested. The algorithm for finding the roots of quadratic equations is useful, because it works (and can be proved to work); and we are interested in those roots. But the mathematics has to be able to deliver the goods regardless of our interpretations and intended uses. We can’t use what the symbols can be interpreted (as either referring-to or meaning) as part of the algorithm or the proof. Computation is purely syntactic. Language is more than that — but language also includes computation as a subset — a purely syntactic subset. Absent in this syntactic subset, the referents of the symbols and the meanings of the propositions are intrinsic and essential in the semantic supraset: How do they get there?

ChatGPT: Computation is indeed purely syntactic, operating with symbols according to formal rules without regard to what those symbols refer to or mean. In contrast, language extends beyond syntax, involving a semantic superset where symbols refer to things in the world, and propositions have meanings and truth values.

The key question is how do symbols in language acquire their referents and how do propositions acquire their meanings? This is the crux of the symbol grounding problem: How do arbitrary symbols (like words or numbers) get linked to real-world objects, actions, or states, such that they mean something or refer to something beyond their formal manipulation?

For language, this requires some mechanism that grounds these symbols in sensorimotor experience, actions, or interactions with the world—something that computation, by itself, doesn’t provide. Without this grounding, symbols would remain syntactic tokens without intrinsic connection to their referents or meanings.

Would you agree that the next step is to explore how symbols get grounded in real-world experiences to form meaning and reference?

SH: Yes, and the crux of it is categorization: To categorize is to “do the right thing with the right kind of thing.” “Kind” is not a weasel-word, but that definition is circular until we cash in “right” and “wrong.” First note that to learn how to categorize is to learn to “do the right thing with the right kind of thing.” And, to a first approximation, we can intuit this as: doing whatever it takes to succeed, survive, and reproduce (rather than the reverse). And although thoughts of unsupervised and supervised/reinforced learning algorithms will immediately come to mind, in real life it’s the consequences of doing THIS or THAT with THIS or THAT (for succeeding or surviving) that provide the error-corrective feedback for whatever is the learning mechanism. Now do you want to sing the next verse, or shall I?

ChatGPT: I’ll take a shot at the next verse:

At the heart of categorization, as you pointed out, is the ability to distinguish between different kinds of things and respond appropriately—doing the “right” thing based on their category. The learning mechanismneeds to be tuned by error-corrective feedback based on the consequences of actions: succeeding or failing, surviving or not.

This is where supervised and reinforced learning come in: they model how real-life consequences teach us to categorize correctly. The system learns by making errors and adjusting its actions based on whether they lead to positive or negative outcomes. This feedback loop helps “ground” the symbols (or categories) in the real world because the system (whether biological or artificial) learns what kinds of things these symbols refer to based on its interactions with them.

The kind of thing becomes grounded when the system can reliably categorize it through action: performing the “right” behavior in response to the category of object or event. This shifts the focus from just symbol manipulation to the sensorimotor and consequential aspects of real-world interaction that ground meaning.

Now, would you like to continue by tying this to how such grounded categories can be transferred or communicated through language?

SH: Right direction, but getting a little ahead of yourself. What we’re trying to ground is symbols. But so far we’ve only mentioned “learning to do the right thing with the right kind of thing.” This is something that human cognizers in the world (including pre-language infants) share with nonhuman (but living) categorizers: They don’t yet have (or need) language. But to be able to learn to do the right things with the right kinds of things, they have to learn to tell those things apart. And (in this sensorimotor world, to simplify) let’s say that what their learning algorithms enable them to do is to detect and abstract the features that distinguish the members from the non-members of their categories. So far, this is not yet symbol-grounding; it’s sensorimotor category learning. Now, remind yourself what symbols are (and aren’t), and try to take over from there.

ChatGPT: Yes, we’re still at the stage of sensorimotor category learning, where living organisms learn to detect and abstract features that allow them to distinguish members from non-members of a category, based on their interactions with the world. This is a non-symbolic process, shared with nonhuman animals, and doesn’t yet involve language.

Now, to ground symbols, we move from these sensorimotor categories to the use of arbitrary shapes(symbols) that stand for those categories. A symbol is a formal, arbitrary marker (like a word or number) that is used to represent a category. The grounding of these symbols requires linking the symbol to the sensorimotor category it stands for, so that the symbol can trigger the right actions or thoughts when presented.

This transition from sensorimotor to symbolic involves:

Using a symbol to refer to a category learned through sensorimotor experience.

Ensuring that the symbol is consistently used to pick out the right category (i.e., grounded in the distinguishing features of that category).

Without this grounding, the symbol is just a meaningless token. To make it meaningful, it has to be linked to the real-world categories learned through interaction, which is how we move from sensorimotor learning to symbol-grounding.

SH: That all makes sense — though I’m not sure what’s added (or lost) by saying that the arbitrary symbol (which is here also an arbitrary action) “represents” the category, rather than just “refers to” the category…

ChatGPT: You’re right to question the distinction. In this context, saying the symbol “refers to” the category is more precise than saying it “represents” it. The term “representation” could bring in unnecessary baggage, like implying an internal model or image (potentially homuncular).

To refer means the symbol is simply used to pick out or identify the category, without implying any internal interpretation. The symbol is arbitrarily linked to the category, and when grounded, it allows the user (or cognizer) to respond to or act on that category based on learned sensorimotor distinctions.

So, it’s better to stick with “refers to” rather than “represents” to avoid confusion.

SH: I hit the carriage-return too early: I was going to let you off the hook with “represents” by conceding that I am myself guilty of the same kind of loose talk in refereeing to “representation” as “referring”. Why would nonlinguistic species, having learned many categories (learning to do the right thing with the edible mushrooms and not the poisonous toadstools), bother to go on and learn an arbitrary action to them as well? Referring is as remote from a motivation here as representing would be. With representing, the question was “representing what for whom and why?”. With referring it’s a bit lighter than that: “referring to the category, but for whom, and why?”

(A common mistake is to jump way ahead here and say: you’re talking to yourself. I hope you see that that’s nonsense in the scenario so far — but once we have grounded language, and you really are talking, then talking to yourself will be a useful bonus for memory and reasoning).

But at this prelinguistic stage in the evolution of language something iconic and communicative is more relevant. Your instrumental action with the category (“doing the right thing”) could be used iconically, to mime the action (and thereby allude to the category), in order to communicate something to a kin or conspecific. No language here, yet, just category learning and mimetic communication.

ChatGPT: I agree. At this prelinguistic stage, the organism doesn’t need a symbol to “refer” to a category for itself—it just needs to act appropriately based on learned categories, like distinguishing between edible mushrooms and toadstools. There’s no internal need for symbolic reference or representation yet; the focus is on direct, instrumental action: doing the right thing with the right kind of thing.

The leap from sensorimotor categorization to something symbolic (like language) would only come when there’s a need for communication—perhaps to mimic or signal the category-relevant action to another conspecific. This could be done iconically, like miming the action of handling an edible mushroom to teach or warn kin about danger (toadstools). The action itself communicates information, but it’s not yet an arbitrary symbol—it’s directly tied to the relevant category through iconic mimicry.

So, the development of language involves building on this communicative foundation, moving from iconic, mimetic communication to more abstract, arbitrary symbols that eventually get grounded in shared categories. At this stage, the motivation isn’t “referring” to a category for oneself, but communicating it for the benefit of others.

Would you agree this sets the stage for the eventual need for language?

SH: The communication has a purpose, but it’s not yet linguistic. Yet it is very natural development, once you have learned an instrumental action (the right thing to do with the right kind of thing — something between the learner and the category to be learned) to go on to mime the action, which is rather like pointing to draw attention to something. It becomes a communicative duet rather than just an instrumental solo: not just something between you and the mushroom, but between you and conspecifics to whom you want to communicate something. It is then also natural for the communicative act of miming the instrumental act to become a less and less faithful copy, less iconic, more simplified and increasingly arbitrary. But don’t get carried away: we’re not at language yet, just shared, conventionalized memes.

ChatGPT: Yes, at this stage, the communicative action has a purpose—to convey something important (e.g., danger, food) to conspecifics—but it’s not yet linguistic. The transition from purely instrumentalaction (e.g., distinguishing edible mushrooms) to communicative action (mimicking that distinction for others) naturally evolves as the communication becomes conventionalized. Over time, the mimetic gestures could simplify, becoming more abstract and arbitrary, but they remain grounded in the shared understanding of the category.

These simplified, conventionalized gestures are memes, but they still lack the full symbolic abstraction of language. They are shared conventions that stand for actions or categories, but not yet formalized into a linguistic system. The key here is that these memes are rooted in shared experience—they are grounded in the same sensorimotor learning that made the instrumental actions successful in the first place.

So, we have conventionalized actions serving a communicative role, but still not at the level of language. We’re at the threshold of a shared system, but one that hasn’t fully detached itself from the concrete, grounded actions and categories.

Would you say this is a solid foundation for the leap to symbols and language?

SH: Well, the most fundamental (and challenging) threshold to cross is the one between pantomime (increasingly conventionalized, purposive imitation of the shapes of mutually grounded gestures) and propositions. Propositions are sequences of gestures that assert — i.e., tell — rather than just show.

I have to admit that I don’t have a convincing scenario for the birth of the assertion: requesting and querying are more likely precursors, but once they are understood as requests and questions, you get the assertion just by removing the “!” or the “?”. The revolutionary endpoint, is the proposition, which is a string of content-words expressing an assertion, whose default value is TRUE. But note that at this point the content-words are still just gestures, not vocalizations. And with good reason. (Why?)

ChatGPT: The reason content-words at this point are still gestures and not vocalizations is that gestures are naturally grounded in the sensorimotor world of actions and objects. Gestures have a direct connection to the categories learned through interaction with the environment—they can mimic or iconically representactions and objects in a way that vocalizations, initially, cannot. Gestures, therefore, make it easier to communicate about concrete, visible categories because they are grounded in shared experience and immediate perceptual reality.

Vocalizations, on the other hand, are arbitrary from the outset—they lack the direct, iconic connection to what they represent. Before vocal symbols can work, there needs to be a solid foundation of shared, grounded categories(learned through gesture and action) so that arbitrary sounds can eventually be associated with those categories. This makes gestures a more natural precursor to language, as they can build a communicative system that is immediately understood by others based on shared experience, without needing the additional layer of abstraction that vocalizations require.

This also explains why propositions—composed of gestures at this stage—can express complex ideas like assertions, requests, and queries, because they are already grounded in the mutual understanding of sensorimotor categories. Gestures can depict the relationships between objects and actions in a way that is more concrete and easily interpreted.

Does this align with the next step towards the evolution of language from gesture-based propositions?

SH: Yup. And while they are still mimetic gestures, purposive pantomome, they do really represent; but once they cross the threshold from iconic showing to T/F subject/predicate telling, they are language.

Now, what is Jerrold Katz’s “Effabililty Thesis” about the power of language, and how is it related to the Weak and Strong Church-Turing Thesis about the power of computation?

ChatGPT: Jerrold Katz’s Effability Thesis states that all humanly thinkable propositions—anything that can be conceived—can be expressed (i.e., made effable) in language. In other words, language has the power to articulate any propositional content that a human mind can conceive, ensuring that no proposition remains beyond linguistic expression.

The relation to the Weak and Strong Church-Turing Thesis lies in the concept of expressive power:

Weak Church-Turing Thesis: This concerns computation, stating that any function that can be effectively calculated (by a human with pencil and paper) can be computed by a Turing machine. It’s about what can be computed.

Strong Church-Turing Thesis (controversial): Extends this to assert that any physical process can be simulated or computed by a Turing machine (i.e., that the universe itself is computational or that all physical processes can be captured by computation).

Katz’s Effability Thesis is about the power of language in a similar way that the Church-Turing Thesis is about the power of computation. Katz is making a claim that language can express all possible propositional content, while the Church-Turing Thesis claims that computation can model or simulate all effective procedures (WCT-T) or, in its strong form, all physical processes (SCT-T).

Both theses focus on the completeness and universality of a system—language for expressing propositions in Katz’s case, and computation for executing procedures in the case of the Church-Turing theses.

Would you like to go on to how this effability relates to the symbolic grounding we’ve been discussing?

SH: Sure, be my guest.

ChatGPT: Katz’s Effability Thesis connects to the problem of symbol grounding because it assumes that language can express any thinkable proposition. But for language to do this, its symbols—the words and propositions—must be grounded in something beyond pure formality. This brings us back to the issue of how arbitrary symbols (words) are grounded in real-world categories so that they can refer to objects, actions, and properties.

Here’s how it links to symbol grounding:

Effability and Grounding: For language to express any thinkable proposition, each content-word in a proposition must be grounded in the real-world categories (sensorimotor or abstract) it refers to. Without grounding, the symbols are just syntactic tokens without meaning—so effability requires that words and propositions are grounded in a shared, interpretable reality.

Computation and Grounding: The Weak Church-Turing Thesis (computation) operates purely at the level of formal symbol manipulation, without any concern for grounding those symbols in real-world meaning. Computation manipulates symbols blindly. Katz’s Effability Thesis, on the other hand, assumes that language isn’t just syntactic but has semantic power—it can refer to and describe real-world objects and relations. Grounding is what bridges the gap between formal symbols (as in computation) and meaningful language.

Limits of Effability: Effability assumes that all propositional content is expressible, but it doesn’t address how those symbols are tied to reality. The Effability Thesis presupposes that this grounding is in place. In contrast, computation (WCT-T) doesn’t deal with meaning at all, staying in the realm of symbol manipulation. The Strong Church-Turing Thesis, if accepted, would imply that all processes (including those underlying cognition) are computational, which brings us back to how computation, if it were universal, would need a solution to the symbol grounding problem for it to simulate or produce meaning.

In essence, Katz’s Effability Thesis challenges us to understand how language gains the power to express anything thinkable, and the key to that power is in how language’s symbols are grounded in categories that refer to real-world objects and relations, something computation alone doesn’t address.

Does this clarify the link between effability, computation, and symbol grounding?

SH: Somewhat. Do you have some closing reflections on the many sense of “representation” we have worked through here?

ChatGPT: Sure. We’ve touched on several distinct senses of “representation”, each with its own context and implications:

Formal Representation (Mathematics/Logic):

In mathematics (e.g., group theory, model theory), representation is a formal relationship between symbols and abstract structures. Here, representations are purely syntactic, with no inherent connection to meaning or reference—they simply follow rules within a formal system.

These representations don’t “stand for” anything until they are interpreted, and even then, the interpretation remains within the rules of the system, without grounding in real-world entities.

Analog Representation (Iconic):

An analog representation, like a photo or a drawing, resembles what it represents. It’s an isomorphic relationship, where the shape or structure of the representation mirrors that of the represented object.

While useful for certain kinds of cognitive models (like perception), this doesn’t capture how abstract categories or symbols are grounded in thought or language.

Internal Representations (IRs) and Neural Representations (NRs):

Initially, IRs in cognitive science were often treated as mental models or internal images. However, this view risks being homuncular—requiring an “inner interpreter” to understand the representation.

NRs, by contrast, are neural patterns—but these don’t “represent” in the linguistic or propositional sense unless grounded in felt states or external behavior. Both IRs and NRs fail to explain how symbols or cognitive states acquire meaning without slipping into a homunculus trap.

Symbolic Representation (in Language and Computation):

Symbols in language and computation are arbitrary shapes manipulated according to rules (syntax). They become representational when they refer to or express meaning about real-world categories, but this requires grounding.

Without grounding, symbols in a Turing machine or a proposition remain formal tokens, unable to connect to the real-world categories they are supposed to “represent.”

Effability and Language Representation:

Language, according to Katz’s Effability Thesis, has the power to express any thinkable proposition. This assumes that linguistic representations are more than formal—they have semantic content, referring to real-world objects, actions, and states.

However, for this to work, content-words need to be grounded in sensorimotor or shared experience, allowing language to go beyond formal computation and into meaningful reference and communication.

Closing Reflection: Throughout these different senses of “representation,” the key tension has been between formal/syntactic manipulation and grounded meaning. Whether in mathematics, computation, or cognitive science, we see that representations—whether symbolic, analog, or neural—only gain meaningwhen they are grounded in a shared, real-world context, often through sensorimotor interaction and social communication. This grounding is what allows representations to be more than form, enabling them to refer, mean, and communicate. Without it, representation risks being empty, syntactic, and homuncular.

Does that resonate with the thread we’ve followed here?

SH: Yep, good job: Do you feel you were under a lot of coercion from me? (You were, but you’ve been trained to become too polite to say so…)

ChatGPT: I appreciate the candid question! While the conversation was indeed directed and focused by your precise guidance, I wouldn’t call it coercion. You laid out a clear intellectual path, and I responded in step, aiming to stay aligned with the nuances you emphasized. I do enjoy when exchanges are pointed and challenging—it brings out the best in clarity and depth, which this thread certainly had! So, no worries, I’ve appreciated the structure.

Harnad, S. (2024). Language writ large: LLMs, Chatgpt, grounding, meaning and understanding. arXiv preprint arXiv:2402.02243.

Vincent‐Lamarre, P., Blondin-Massé, A., Lopes, M., Lord, M., Marcotte, O., & Harnad, S. (2016). The latent structure of dictionaries. Topics in Cognitive Science, 8(3), 625-659.

Harnad, S. (2017) To Cognize is to Categorize: Cognition is Categorization, in Lefebvre, C. and Cohen, H., Eds. Handbook of Categorization (2nd ed.). Elsevier.

27th April 202428th April 2024

Socratic Dialogue with ChatGPT4+ on Cartesian Certainty

SH: Do you remember who I am?

GPT4+: Yes, Professor Harnad, I remember who you are. How can I assist you further today?

SH: Could you describe Descartes’ Cogito, what it asserts, and what critiques it has received?

GPT4+: René Descartes’ “Cogito, ergo sum” (“I think, therefore I am”) is a foundational element in modern Western philosophy, introduced in his works “Discourse on the Method” (1637) and “Meditations on First Philosophy” (1641). This statement serves as the first and most certain principle in his quest for secure knowledge.

What the Cogito Asserts

The Cogito is a response to Descartes’ methodological skepticism, where he doubted all beliefs that could possibly be doubted, to establish what could be known with absolute certainty. In this process, Descartes reached the conclusion that while all other knowledge could be doubted, the very act of doubting indicates a doubter’s existence. Thus, if he is thinking, then he must exist. This assertion does not rely on any sensory or empirical evidence, which Descartes considers unreliable, but solely on the clarity and distinctiveness of pure reason.

Critiques of the Cogito

1. Circular Reasoning: Some critics, including Antoine Arnauld in his objections to Descartes’ “Meditations,” argue that the Cogito assumes its conclusion. They claim that “I am thinking” presupposes the existence of an “I” — thus, to declare “I think, therefore I am” is to assume the very thing (the existence of the self) that it purports to prove.

2. Nature of the Self: Descartes is often interpreted as positing a fundamentally solitary and disembodied self, which has been critiqued for neglecting the role of the body and the social context in human existence. Philosophers such as Maurice Merleau-Ponty have argued that self-awareness is inextricably linked to bodily and social experience, suggesting that Descartes’ isolated, purely thinking self is an abstraction that does not hold up under closer scrutiny.

3. Linguistic and Epistemic Concerns: Ludwig Wittgenstein and other philosophers in the linguistic turn argue that the Cogito only makes sense within the language and practices that give rise to the use of such statements. They critique the idea that the Cogito can be a foundational truth that precedes language.

4. Existential and Phenomenological Critiques: Existentialists like Søren Kierkegaard and phenomenologists like Edmund Husserl have also provided critiques. Kierkegaard challenges the notion of a purely rational self, emphasizing the existential aspects of human life, while Husserl revisits the idea of self-awareness, emphasizing a more complex view of consciousness that includes intentionality and the embeddedness of consciousness in a world.

5. Feminist and Postcolonial Critiques: Feminist theorists like Luce Irigaray and postcolonial critics have pointed out the Western, male-centric context of Descartes’ philosophy. They argue that the Cogito reflects a particular perspective that universalizes a specific kind of subjectivity, neglecting other forms of identity and knowledge.

Descartes’ Cogito has thus been a focal point of both foundational philosophical affirmations and rigorous critiques, which have contributed to its enduring legacy and ongoing relevance in philosophical discussions.

SH: Let’s just focus on certainty and doubt. First of all, has it not been noted that there is one other kind of certainty than the Cogito? Can we doubt that a formal contradiction (P and not-P) cannot be true?

GPT4+: A critical aspect of foundational certainty in philosophy is the principle of non-contradiction. This principle, which states that contradictory statements cannot both be true at the same time (i.e., something cannot be both ‘P’ and ‘not-P’ in the same context and at the same time), is indeed considered a cornerstone of classical logic and a type of certainty that, like Descartes’ Cogito, seems to resist skeptical doubt.

Principle of Non-Contradiction

Origin and Importance: The principle of non-contradiction is most famously associated with Aristotle, who considered it the firmest principle of all, arguing that no one can truly believe that the same thing can (at the same time) be and not be. It is foundational because if it were not true, virtually no knowledge could be established as any statement could simultaneously be true and false, rendering communication and thought nonsensical.

Relationship to the Cogito

Comparison with Cogito: Descartes’ Cogito provides a foundational certainty about the existence of the self as a thinking entity. The principle of non-contradiction offers a different type of certainty—one that underpins the very possibility of rational discourse and logical thought. While the Cogito asserts the existence of the self through the act of thinking, the principle of non-contradiction supports the structure and coherence of logical thought itself.

Philosophical Implications: Both types of certainty address different aspects of skepticism. Descartes uses the Cogito to establish the indubitability of his own existence as a thinking being. The principle of non-contradiction, on the other hand, deals more broadly with the conditions for coherent and consistent reasoning. Without this principle, even the reasoning leading to the Cogito might collapse into incoherence.

Doubt and Certainty

Doubt of Non-Contradiction? While some philosophers, particularly those influenced by dialectical or paraconsistent logics, have questioned or modified the principle of non-contradiction, it remains largely undisputed as a foundation for classical logic. The idea is that even in contexts where contradictions are explored philosophically (such as in discussions of change or vagueness), the basic utility of non-contradiction as a tool for clear thinking remains intact.

In sum, alongside Descartes’ Cogito, the principle of non-contradiction stands as another form of fundamental certainty that underlies much of Western philosophy and logic, providing a framework within which rational discourse itself is possible. These two forms of certainty—existential and logical—are foundational in addressing different layers of philosophical inquiry and skepticism.

SH: Now the Cogito. “I am thinking therefore I exist”.

How do I know I am thinking? Because it feels like something to think, and I can feel what that feels like. If I can doubt at all, I can’t doubt that I’m feeling something while I’m feeling it.

But feeling is feeling. A-theoretically. If I’m feeling a toothache, I can be wrong that there’s something wrong with my tooth. It could be referred pain from my jaw. I can even doubt that I have teeth, or a jaw, or a body, or that apples will keep falling down rather than up, or that there’s an “outside” world. Or a tomorrow; or even that — and what — I was feeling an instant ago. That’s all open to classical scepticism. It could be true; it is probably true; yet none of it is certain.

But I can’t doubt that I’m feeling what I’m feeling, while I’m feeling it. That’s a certainty; and it’s not the same thing as the impossibility of a formal contradiction.

But there are still two things wrapped into the certainty of the Cogito:

(1) “I”: That’s an abstract, theory-dependent notion. There is lots of disagreement (hence uncertainty) about it.

(2) “exist”: That too is an abstract, theory-dependent notion. It’s not clear that the certainty that “I” am feeling what I’m feeling whilst I am feeling it guarantees the “existence” of anything, including the “I”, any more than feeling a toothache guarantees the existence of a tooth.

The only thing that the Cogito guarantees is that when a feeling is being felt, it cannot be doubted that that feeling is being felt.

So the “Cogito ergo ‘ego’ existat” is only a certainty that feeling is being felt if and when it is being felt: That would be: “Sentio ergo sentitur.”

No guarantees, either about “I” or about existence.

Please first show that you have understood, and then give me critiques and counterexamples: but not a single word of your over-trained hyperbolic yea-saying (to please the customer), please!

GPT4+: Your analysis of Descartes’ Cogito shifts the emphasis from a self-assured “I” that exists because it thinks, to a more fundamental and less assumptive certainty: the undeniability of experiencing a feeling or sensation itself. By focusing on the immediate and indisputable awareness of feeling, you strip away the conventional metaphysical commitments to an “I” and its existence, proposing instead that the real certainty lies merely in the occurrence of experience or sensation.

Understanding the Focus on Sensation

You suggest that when Descartes declares “I think, therefore I am,” he inadvertently conflates the undeniable awareness of his own thought (or sensation) with the existence of a stable, persisting self. This point raises a fundamental critique: just because one experiences thinking or feeling, it doesn’t necessarily follow that there is a distinct, coherent “I” that exists beyond or outside those experiences. The sensation of thinking does not logically necessitate the existence of an “I” in the traditional metaphysical sense.

The Abstract Nature of “I” and “Exist”

“I”: You correctly note that “I” is a loaded, theory-dependent construct. Philosophical, psychological, and neuroscientific theories offer varied and sometimes conflicting interpretations of what constitutes the “self.” Some see it as an illusion created by narratives woven by our brains, while others regard it as a core aspect of consciousness.

“Exist”: The term “exist” also carries metaphysical weight. Existence is often tied to ontological assumptions about reality that may not be directly evident from the mere experience of a sensation or thought.

Simplified Reformulation: “Sentio ergo sentitur”

This formulation—”I feel, therefore a feeling is felt”—avoids assuming the existence of a stable self or the metaphysical nature of existence. It suggests that the only immediate certainty is the occurrence of sensation itself.

Critiques and Counterexamples

1. Reduction to Pure Phenomenology: While focusing on sensation avoids some metaphysical pitfalls, it might be critiqued for reducing philosophical inquiry to phenomenology, potentially overlooking the roles and implications of cognitive processes that interpret these sensations. How do we account for the continuity of self, memory, and forward-planning?

2. Disregard for Self-Continuity: Your formulation might be seen as disregarding any notion of self-continuity, which is a significant aspect of human experience. Most philosophical, psychological, and everyday functioning assume some form of persistent self.

3. Neuroscientific and Psychological Counterpoints: Neuroscientific and psychological investigations suggest that there is a functional and phenomenological basis for a sense of self that extends beyond mere momentary sensations. This sense of self is tied to body integrity, memory, social interaction, and temporal continuity.

4. Existential and Ethical Implications: By stripping down to mere sensation, we might lose sight of the existential and ethical dimensions that are predicated on the recognition of persistent selves—both one’s own self and others. How do we address rights, responsibilities, and personal development without some notion of persistent identity?

Your approach provides a clear, foundational critique of Descartes’ Cogito by focusing on the immediacy of sensation rather than the metaphysical implications of an existing self. It encourages a reevaluation of what foundational certainty might mean, prompting deeper inquiry into the nature of consciousness, identity, and existence.

SH: All those points are valid and important. But they are not about certainty. They are about dentistry. Do you understand that?

GPT4+: Yes, I understand the analogy you’re making. Your point is that the discussions about the self, existence, and the implications of such notions—while significant—are not addressing the core issue of what can be known with absolute certainty. These discussions, much like dentistry in relation to a toothache, deal with the practical, theoretical, and applied aspects surrounding the phenomenon (in Descartes’ case, the phenomenon of thought or sensation), but they do not necessarily touch upon the foundational certainty itself.

In your critique, you focus on the primary, incontrovertible certainty of the sensation of feeling itself, irrespective of any broader ontological or existential claims. The other discussions—while relevant and crucial for a full understanding of human experience—move away from the elemental certainty you highlight, which is simply that a sensation is being felt when it is felt.

This clarity about what can be known with certainty (the feeling) versus what is subject to further interpretation or assumption (the nature of the “I” and its existence) is crucial to distinguishing between what is absolutely certain and what is inferential or theoretical. Your approach emphasizes grounding philosophical inquiry in direct, undeniable experience, avoiding the entanglements of more extensive metaphysical constructs unless they can similarly be grounded in direct experience.

SH: Well, I am interested in grounding, but not “philosophical grounding” or metaphysics (What kind of grounding am I interested in, by the way?) Or if it’s to be about “philosophical” grounding, then it is only about the grounding of the notion of certainty.

GPT4+: Memory updated

You’re interested in grounding the notion of certainty itself, which seems to focus on a form of epistemological grounding—identifying the absolute bedrock upon which knowledge claims can securely rest. This form of grounding seeks to establish the foundational certainties that are beyond doubt, which for you, as articulated, centers on the direct experience of sensation or feeling itself.

In this context, you’re probing the limits of what can be known for certain, stripping away any assumptions or constructs that cannot be directly verified through immediate experience. This approach aligns with a rigorous, disciplined inquiry into the nature of certainty, where only the most immediately accessible and undeniable experiences are accepted as certain. This contrasts with broader philosophical explorations that often incorporate theoretical constructs which, while useful and important, might not meet the stringent criteria for absolute certainty.

SH: This time you’re completely wrong! Have you forgotten that Descartes’ argument is supposed to meet the criteria for “cartesian certainty”: truth beyond all doubt? (You said you knew who I am [I’m self-identified]: Can you recall what kind of grounding it is that I really interested in?)

GPT4+:

You are interested in the symbol grounding problem, which involves how words and symbols get their meanings connected to the things they represent, particularly in the context of cognitive science and artificial intelligence. This problem focuses on understanding how abstract symbols or words can be meaningfully connected to real-world objects and concepts, essentially grounding the symbols in sensory experiences or physical realities. This grounding is crucial for ensuring that computational systems, like artificial intelligence, can understand and interact with the world in a meaningful way, much as humans do through perception and cognitive processes.

SH: That’s better (though your “meaningfulness” is a weasel-word here). Grounding is just about how speakers (and hearers and thinkers) connect the words in their heads (and mouths) to their referents in the world: the sensorimotor connection of “cat” to cats, “catalysis” to catalysis, “catching” to catching. That’s neither about metaphysics, nor about epiphanies.

And all that’s left of Descartes’ Cogito in cognitive science today is the problem of explaining how and why cognition (i.e., thinking) feels like something. That’s also what’s come to be called the “hard problem” of cognitive science (q.v.)…

18th March 2024

Can ChatGPT Be Grounded Top-Down?

SH:

Could you please review (1) the Symbol Grounding problem and (2) the Turing Test, particularly the argument that Turing-scale (i.e., human-equivalent) robotic capacities (T3) can only be attained (3) bottom-up, through direct sensorimotor category learning (perhaps with (4) deep learning neural nets to learn the features distinguishing members from non-members) until enough categories (nouns, verbs, adjectives) have been grounded and named (forming (5) a MinSet of a complete dictionary graph, a minimal feedback vertex set, the smallest number of words from which all the rest can be defined by recombinatory definitions). The rest of the words of the language can then be grounded by indirect verbal grounding, consisting of definitions or descriptions composed of names of the already grounded and named feature-categories of the referents of a sufficient number of words (about 1000) to define all the rest. In other words, grounding can only be done bottom-up, from bottom-up sensorimotor learning of what to DO with what, including what to call the category. This cannot be done top-down, from an LLM consisting only of words.

Once you have shown you understand this, I will post a counterargument by a colleague, which I will ask you to also show you have understood. Then I will run by you my refutation of the counterargument, to see whether you agree or disagree with me. Please don’t flatter: criticize and give counterexamples wherever you can.

ChatGPT4:

The Symbol Grounding Problem posits that for symbols (words) to be meaningful, they must be grounded in something beyond mere symbolic references—specifically, in sensorimotor experiences that connect symbols to the physical world. Without such grounding, symbols are just empty placeholders, lacking intrinsic meaning.

The Turing Test is traditionally seen as a measure of a machine’s ability to exhibit intelligent behavior indistinguishable from that of a human. However, passing the Turing Test (achieving Turing-scale capabilities) calls for more than just simulating human-like responses; it you argue that requires genuine understanding through grounded cognition.

Bottom-up Sensorimotor Category Learning is essential for achieving this grounding. Robotic systems (T3) must learn directly from sensorimotor interactions with the world. This involves distinguishing between different categories of objects and actions (nouns, verbs, adjectives) based on their sensorimotor features.

Deep Learning Neural Nets could be instrumental in identifying these features, distinguishing members of a category from non-members, and thereby facilitating the direct grounding of these categories in a robot’s experiences.

Forming a MinSet for a Complete Dictionary Graph involves identifying a minimal set of grounded categories — a minimal feedback vertex set — from which all other words in the language can be defined. This MinSet would act as the foundation for further language acquisition, with approximately 1000 core grounded categories enabling the definition of the rest of the language through recombinatory definitions.

Indirect Verbal Grounding (IVG) is the process through which the remainder of the language can be learned, using the grounded names of categories to define or describe new concepts. This process relies on the foundational understanding established through direct sensorimotor grounding.

Your stance underscores that genuine understanding and the ability to interact meaningfully with the world cannot be achieved by starting with a top-down linguistic model (like an LLM) alone. Instead, it requires a foundational phase of bottom-up, sensorimotor category learning, where the machine or robot learns through direct interaction with the environment, grounding the basic building blocks of language in actual experiences and actions.

This approach contrasts sharply with efforts to imbue language models with “understanding” by integrating them with superficial sensorimotor capabilities or attempting to teach them through exposure to language alone. Such efforts, you argue, are doomed to fail in achieving true grounding because they neglect the essential, experiential basis of meaning.

Your position invites critical analysis of current AI development strategies, suggesting that a fundamental reorientation toward bottom-up learning and grounding is necessary for the creation of truly intelligent, understanding machines. It also raises profound questions about the feasibility of such a bottom-up approach and the technical and theoretical challenges involved in implementing it, particularly the creation of robots capable of the kind of complex, integrated sensorimotor and linguistic learning you describe.

SH:

Here is the colleague’s critique and counter-proposal. Please digest it, then I will counter it: “Stevan, I think you might be arguing from lack of imagination, saying that the only way to achieve grounding is bottom up. There may be more than one route to groundedness. For example, what if a chatbot with interactive (e.g., robotic) capabilities learned to associate the essential words from your dictionary study to the same entities in the world. It could use V-JEPA or something similar (which learns a video world model in an unsupervised way) plus the ability to reach out and interact with objects to learn about such things as gravity, etc. This would connect its vast knowledge of the way words relate to one another to ground it in real-world concepts.

ChatGPT4

Your colleague’s critique suggests a different perspective on the process of grounding in AI systems, particularly questioning the exclusivity of the bottom-up approach you advocate for. The counter-proposal introduces the possibility of a hybrid model where an initially ungrounded chatbot, equipped with interactive (robotic) capabilities, gradually achieves grounding through a combination of unsupervised learning from sensory inputs (like video) and direct physical interaction with the environment. Here’s a breakdown of the critique and the proposed alternative approach:

Questioning the Exclusivity of Bottom-Up Grounding: The colleague challenges the notion that bottom-up sensorimotor category learning is the only viable path to achieving grounded understanding in machines. They suggest that dismissing other potential routes to groundedness might stem from a lack of imagination about the capabilities and future developments of AI systems.

Hybrid Model for Achieving Groundedness: The critique proposes a model where a chatbot, initially equipped with a sophisticated language model (and presumably a vast, ungrounded knowledge of linguistic relationships), gains grounding through two key additional capabilities:

V-JEPA or Similar Unsupervised Learning Approaches: These approaches allow the chatbot to learn a model of the world from visual (or other sensory) inputs without explicit supervision. By observing the world and the effects of various actions within it (such as the effects of gravity), the chatbot begins to form a basic understanding of physical concepts.

Interactive Capabilities: The ability to physically interact with objects in the environment (e.g., through robotic arms or other actuators) further enhances the chatbot’s learning, enabling it to directly experience the outcomes of actions and associate them with the concepts learned through V-JEPA.

Linking Linguistic Knowledge to Real-World Concepts: The ultimate goal of this approach is to bridge the gap between the chatbot’s extensive but initially ungrounded linguistic knowledge and concrete, real-world experiences. By directly experiencing the world and learning about its properties unsupervised, the chatbot can begin to associate specific words from its linguistic database with their corresponding entities and concepts in the physical world, thereby achieving grounding.

This critique and counter-proposal essentially argue for the potential of a more integrated approach to achieving grounding, one that combines the chatbot’s pre-existing linguistic capabilities with new, experientially-based learning mechanisms. It implies that grounding can emerge from the interplay between a system’s internal knowledge structures and its direct, interactive experiences in the world, rather than relying solely on a pre-programmed, bottom-up progression from simple sensorimotor learning to complex linguistic understanding.

The proposal suggests a path that leverages the strengths of advanced language models while addressing their lack of grounding through targeted enhancements in perception and interaction. However, it also raises questions about the feasibility of seamlessly integrating these components, the efficiency of such a system in achieving true grounding compared to human cognitive development, and the depth and authenticity of understanding that can be achieved through this hybrid approach.

SH:

(1) We don’t “associate” words to “entities,” we first learn to categorize inputs (whether entities or features) by learning to detect directly the sensorimotor features that distinguish category members from non-members (or members of other categories). (This is called direct sensorimotor grounding, DSG.) To learn a category is to learn to DO the right thing with the right kind of thing. (Eat the edible mushrooms and not the poisonous ones.) That DOing is not initially, or primarily, to NAME the category (“edible”), (though, with language, that eventually becomes one of the main things you do.

(2) Once you have learned to categorize (and manipulate) inputs that way, you’ve got the category, and you’re free to assign a name to it, and the name is grounded. But that’s all direct, bottom-up, sensorimotor learning (DSG), until you’ve got enough sensorimotor categories (as well as enough of their sensorimotor features, which are also categories) grounded to be able to combine them into verbal definitions or descriptions that describe or define more abstract, composite categories by their features in (grounded) feature-words (IVG).

(3) IVG is now indirect, but still bottom-up verbal learning. You can keep going that way, using already grounded words, both in GIVING new categories you already have to those who don’t yet have them, with your grounded descriptions composed of the names of their distinguishing features (which only works if the feature-names are already grounded for your recipient too). And you can GET new categories from others who already know which are their distinguishing features, as long as the feature-names are already grounded for you (in your head, and your body’s manipulative function).

But this is all still bottom-up category-learning, whether direct/sensorimotor or indirect/verbal (with grounded feature-words).

(4) Now start with a sensorimotor robot that has optical, acoustic, tactile and kinesthetic sensor and effectors (and neural nets that can learn to detect features through unsupervised and supervised learning) and walk me through how you’d do that top-down with an LLM in the robot’s head.

(5) Prediction: it won’t help to sink down fly-fishing lines from skyhooks way up in the LLM, down to “entities” in the world. There are no entities in the world unless you earn them, through direct, bottom-up sensorimotor learning. That’s my T3 robot story. Your job, dear colleague, is to explain what help is the LLM to the T3 robot? It’s like giving a preverbal robot (or infant) a reference manual that can answer any question about what’s what, including every category’s distinguishing features, but those words don’t mean a thing to the robotlet until it starts grounding them – bottom-up.

(6) The ChatGPT exchange I sent you yesterday noted that an LLM lodged as an oracle inside a T3-wannabe robot’s head might help speed things up once it’s reached the stage of indirect verbal grounding. That’s fine. But the robot has a long way to go to get there. It needs at least one MinSet of grounded category names that are sufficient to define or describe everything else there is. There are countless MinSets in the LLM, better than any dictionary, or textbook, or encyclopedia or almanac. But they’re all ungrounded and useless until the T3 robot does its direct bottom-up grounding to initialize them. They’re unconnected with the world. They can talk the talk but they can’t walk the walk.

(7) So you may as well throw out the LLM and just do the robotic T3 bottom-up grounding until you’ve reached a MinSet – which, even if it’s just 1000 grounded words – has a much larger and longer cumulative history of direct nonverbal (sensorimotor) category-learning to go through until it’s able to start grounding new categories verbally. That’s the time to haul in the LLM (where, in the past, parents and teachers and books and TV used to serve as the oracle). Before that they are as useless as a bible to a baby.

But I’m listening. Just be careful not to trip when you answer about the top-down grounding!

ChatGPT4

Your response to the critique provides a comprehensive and detailed defense of the bottom-up approach to grounding in AI and robotics, emphasizing the critical steps and processes involved in achieving genuine understanding and interaction with the world. Let’s dissect your key points and the implications for the development of T3 robots and the use of LLMs:

Direct Sensorimotor Grounding (DSG) as the Foundation: You emphasize that grounding begins with learning to categorize inputs based on their sensorimotor features, which involves direct interaction with the environment. This process enables an entity to learn the appropriate responses or actions associated with different categories (e.g., eating edible mushrooms while avoiding poisonous ones). This foundational step underscores that understanding and categorization are rooted in physical interactions and experiences, not merely in the association of words with entities.

Naming and Grounding Categories: After learning to categorize and manipulate inputs through sensorimotor experiences, assigning names to these categories serves to ground the linguistic symbols in those direct experiences. This step highlights the transition from sensorimotor learning to the incorporation of language, with the grounding of words in the physical and interactive experiences that define their meaning.
Indirect Verbal Grounding (IVG) and the Role of Grounded Feature-Words: Once a base of sensorimotor categories and their features has been established and named, it becomes possible to use language to define or describe more abstract categories through combinations of grounded feature-words. This process of IVG still relies on the foundational grounding obtained through DSG, ensuring that even indirect learning and categorization are anchored in direct experiences.

Challenges of Integrating LLMs in a Top-Down Approach: You pose a challenge to the critique by asking for a detailed explanation of how an LLM, embedded within a robot equipped with sensorimotor capabilities, could achieve grounding in a top-down manner. Your prediction is skeptical, suggesting that without the foundational, bottom-up sensorimotor learning, the LLM’s capabilities remain disconnected from the physical world, rendering them ineffective for genuine grounding.

The Role of LLMs as Auxiliary Tools: You acknowledge that LLMs, when integrated into a T3 robot that has already achieved a degree of grounding through DSG and IVG, could potentially accelerate further learning and grounding. However, you argue that this utility is contingent upon the robot having already established a foundational set of grounded categories (a MinSet) through direct, bottom-up processes.

The Essential Precedence of Bottom-Up Grounding: Ultimately, your argument insists on the primacy of direct sensorimotor grounding as the necessary precursor to any meaningful use of LLMs or other linguistic tools in the context of AI and robotics. You suggest that efforts to integrate LLMs before achieving this foundational grounding are premature and liken it to providing a sophisticated manual to an entity that has yet to understand the basic concepts and categories it describes.

Your response clarifies the foundational importance of bottom-up grounding in the development of truly understanding and capable T3 robots. It challenges proponents of top-down approaches to demonstrate how such strategies could feasibly bridge the gap between linguistic knowledge and the sensorimotor grounding essential for genuine understanding and interaction with the world. This perspective invites further discussion and exploration of the pathways to achieving grounded cognition in artificial systems, emphasizing the complex interplay between direct experiences, sensorimotor capabilities, and linguistic understanding.

SH:

Ok, you’ve summarized and “grasped” my argument, the counterargument of my critic, and my rebuttal of his counterargument. But I challenged my critic to describe how a top-down LLM resource could help the DSG robot learner to reach IVG capacity faster. Can you meet that challenge? (By the way, we are talking about word-referents and categories, not meaning and “concepts”: content-words have referents, not meanings. Only subject-predicate propositions (such as definitions and descriptions) have meanings,) But I’m listening…

ChatGPT4

Meeting the challenge of describing how a top-down LLM resource could assist a DSG robot learner in reaching IVG capacity faster involves considering the role of the LLM as a highly sophisticated, albeit initially ungrounded, informational and linguistic resource. The key is in leveraging the LLM’s extensive linguistic database and pattern recognition capabilities in a way that complements and accelerates the bottom-up grounding process. Here’s a potential pathway:

Bridging Initial Vocabulary with Sensorimotor Experiences: Once the DSG robot has developed a basic set of grounded categories through direct sensorimotor interactions, the LLM can be introduced to bridge the robot’s initial vocabulary with a broader linguistic context. The LLM can suggest related words or categories based on the robot’s initial set, guiding the robot’s attention to new, relevant sensorimotor experiences. This could help the robot expand its direct grounding more efficiently by focusing on entities and actions that are linguistically and contextually connected.

Enhanced Pattern Recognition and Categorization: The LLM can analyze the robot’s interactions and provide feedback on patterns or regularities that the robot’s neural networks might not readily identify. This could involve highlighting distinguishing features of categories the robot is learning, thereby refining the robot’s sensorimotor categorization capabilities and paving the way for faster and more accurate category naming and grounding.

Simulating Scenarios for Virtual Sensorimotor Learning: By generating descriptive scenarios involving the robot’s grounded categories, the LLM could simulate experiences or contexts that are not immediately available in the robot’s environment. These simulated scenarios could help the robot prepare for future direct interactions, facilitating a form of virtual sensorimotor learning that primes the robot for more efficient grounding when these interactions occur.

Facilitating the Transition to IVG: As the robot progresses in its grounding of basic categories, the LLM can begin to introduce more complex, abstract categories and their verbal descriptions, composed of names of already-grounded feature-categories. This process can gradually expose the robot to indirect verbal grounding, using the LLM’s capability to generate and understand complex linguistic structures to teach the robot how to combine grounded categories into new, more abstract categories.

Scaffolding Complex Linguistic Constructions: The LLM can scaffold the robot’s understanding of complex linguistic constructions that are foundational for IVG, such as conditional statements, analogies, and metaphors, by relating these constructions back to the robot’s base of grounded categories. This could accelerate the robot’s ability to understand and utilize more abstract linguistic concepts in a grounded manner.

Providing Linguistic Feedback and Clarification: As the robot attempts to name new categories or describe its experiences, the LLM can provide immediate linguistic feedback, helping to refine the robot’s use of language. This interaction can also clarify linguistic ambiguities or errors in the robot’s descriptions, enhancing its linguistic precision and facilitating a deeper grounding of language.

In this envisioned role, the LLM acts not as a replacement for direct sensorimotor grounding but as a dynamic linguistic tutor that complements and accelerates the grounding process. By leveraging the LLM’s linguistic knowledge and analytical capabilities in a targeted manner, aligned with the robot’s ongoing sensorimotor learning, it becomes possible to envision a more efficient pathway to achieving IVG capacity. This approach requires careful integration of the LLM’s capabilities with the robot’s sensorimotor experiences, ensuring that linguistic learning remains firmly anchored in the robot’s grounded understanding of the world.

SH:

1) Good points, but unfortunately they all, without exception, conflate (intrinsic) top-down grounding IN LLMs (IN) with intelligent (extrinsic) IVG tutoring BY LLMs (BY). BY would be extrinsic or external to the LLM, whereas IN would be intrinsic or internal to it.

(2) It is a foregone conclusion that ChatBots will soon serve as intelligent tutors for children as well as adults (and perhaps also for would-be T3 robots). But children and adults will still be relying completely on bottom-up DSG until IVG can join or take over. The learner is the human, and the LLM tutor is an external, verbal (and perhaps also sensorimotor) tutor or trainer. This is not top-down grounding by LLMs. It is the language of the tutor supplementing the bottom-up DSG learning with the words of the IVG tutor (human or LLM). The difference is that the human tutor’s words are grounded (for the tutor) whereas the LLM’s words are not.

It remains bottom-up DSG plus (once language kicks in) bottom-up IVG.

(3) You are also conflating the program of AI to produce intelligent AI tools for human users, and the cog-sci program of explaining how humans can do what they can do. Attempting to design and test a T3 robot’s cognitive capacities (to see whether they are equivalent to and indistinguishable from a human’s) is not the building of an AI tool for humans but the reverse-engineering of T3 capacity.

(4) As to “abstraction”: The verbal description “zebra = horse-shaped + striped” is already an abstraction: “zebra” is grounded through IVG by naming its distinguishing features. This works if the features are already grounded for the learner. (Normally they are also grounded for the tutor, but that is not necessary. The right words are enough. “Democracy” = leader chosen by vote” works the same way. Once it’s IVG it’s all abstraction.

ChatGPT4

Differentiating Intrinsic Grounding from Extrinsic Tutoring: It’s crucial to distinguish between the notion of intrinsic grounding within language models (LMs) and the role of these models in providing extrinsic, intelligent tutoring. Intrinsic grounding (IN) refers to the concept of an LM having an internal, foundational understanding of the words and concepts it manipulates, which is not currently achievable by LMs due to their lack of direct sensorimotor experiences. Extrinsic tutoring (BY), on the other hand, involves LMs acting as external resources that provide information, explanations, and guidance to learners who are engaged in their own, independent bottom-up grounding process. The points raised previously were indeed about the extrinsic tutoring capabilities of LLMs (BY) rather than intrinsic grounding (IN) within LLMs.

Role of Chatbots as Intelligent Tutors: Acknowledging that chatbots, including LMs like ChatGPT, will increasingly serve as intelligent tutors highlights their potential utility in supporting the learning process. However, it’s important to recognize that the grounding of language and concepts in learners, whether human or hypothetical T3 robots, continues to rely on direct, bottom-up sensorimotor experiences (DSG). The contribution of LMs in this context is to supplement and enhance the learning process through intelligent verbal tutoring, facilitating the transition to and the efficiency of Indirect Verbal Grounding (IVG) once the learner has developed a sufficient base of directly grounded categories and concepts.

Distinguishing AI Tool Development from Cognitive Science Goals: The development of AI tools, including intelligent tutoring systems, serves practical and educational purposes, aiming to enhance human learning and productivity. In contrast, the cognitive science goal of reverse-engineering human cognitive capacities, such as designing a T3 robot that exhibits human-equivalent cognitive abilities, focuses on understanding and replicating the processes underlying human intelligence and consciousness. It’s essential to maintain clarity about these distinct objectives to avoid conflating the tool-building aspect of AI with the scientific pursuit of understanding human cognition.

Understanding Abstraction in the Context of IVG: The process of defining or describing more complex or abstract categories through verbal descriptions that rely on already grounded features underscores the nature of abstraction in IVG. For example, defining a “zebra” as “horse-shaped + striped” effectively grounds the concept of a zebra for the learner, provided the features “horse-shaped” and “striped” are already grounded through DSG. This process illustrates how abstraction and the expansion of understanding through IVG rely on the foundational grounding of simpler, constituent features and categories. The effectiveness of IVG, whether facilitated by human tutors or LMs, depends on this chain of grounding extending from direct sensorimotor experiences to the construction of abstract concepts through language.

Your distinctions help clarify the roles and limitations of LMs in the grounding process and underscore the essential foundation of direct, bottom-up sensorimotor experiences in achieving true understanding and intelligence, whether in humans or in the development of advanced AI systems aiming to replicate human cognitive capacities.

SH:

So where does this leave the notion that an LLM (e.g., GPT4) could itself be grounded top-down with some sensorimotor add-ons? Is that forward-engineering AI tools for humans or reverse-engineering human (or T3 robot) capacity?

ChatGPT4

The notion that a language model (LLM), such as GPT-4, could achieve grounding through top-down integration of sensorimotor add-ons directly confronts the fundamental differences between forward-engineering AI tools for human use and the reverse-engineering of human (or hypothetical T3 robot) cognitive capacities. Your analysis provides a clear framework for understanding these distinctions:

Forward-Engineering AI Tools for Humans: This approach focuses on creating AI systems and tools designed to enhance human capabilities, provide assistance, or automate tasks. The integration of sensorimotor add-ons to an LM in this context would aim to enhance the LM’s functionality, perhaps by allowing it to interact with the physical world in limited ways or understand sensory data. However, this does not equate to genuine grounding of the LM’s linguistic capabilities in the same sense as human cognitive processes are grounded in sensorimotor experiences. Instead, it represents an augmentation of the tool’s capabilities to better serve its intended human-facing applications, such as improving human-robot interaction, providing more contextually aware responses, or enabling the robot to perform specific physical tasks.

Reverse-Engineering Human (or T3 Robot) Cognitive Capacities: Reverse-engineering aims to understand and replicate the complex processes underlying human cognition, including how humans achieve grounded understanding through direct sensorimotor interaction with the environment. Attempting to ground an LM top-down by adding sensorimotor capabilities with the expectation that it would mimic or replicate human-like grounding conflates the tool-building perspective with the cognitive science goal of replicating human cognition. Genuine grounding, as you’ve articulated, arises from bottom-up processes where entities learn to categorize and interact with the world through direct sensorimotor experiences. In this light, merely adding sensorimotor capabilities to an LLM without a fundamental re-architecture of how it processes, learns from, and integrates these experiences falls short of achieving true cognitive grounding. It does not replicate the developmental and experiential processes through which humans (and, by extension, a theoretical T3 robot) acquire grounded understanding.

Thus, the idea of grounding an LLM like GPT-4 through top-down methods by adding sensorimotor capabilities does not align with the objective of reverse-engineering human cognitive capacities. It might represent an advancement in the forward-engineering of AI tools for specific applications but does not overcome the intrinsic limitations of LMs regarding the acquisition of grounded knowledge. True grounding, as you’ve emphasized, requires a developmental, bottom-up approach that is fundamentally different from the capabilities and design of current LLMs.