Computation – Skywritings

16th February 2026

DIC/ISC/CRIA Seminar in Cognitive Informatics

The unreasonable effectiveness of pattern matching

Gary Lupyan (University of Wisconsin-Madison)

February 19, 2026 10:30 – noon EST

Zoom: https://uqam.zoom.us/j/82427157322

ABSTRACT: It has become common to explain the abilities of large language models (LLMs) as “mere” pattern matching. Because pattern matching is thought to be fragile and highly dependent on having exactly the right training data, the real thinking and reasoning that humans do is assumed to be implemented by qualitatively different mechanisms. I am going to argue that our intuitions about the limits of pattern matching are mistaken. At the core of this argument is a demonstration of an astonishing ability of LLMs to make sense of “Jabberwocky” language: texts in which most or all content words have been randomly replaced by nonsense strings (e.g., translating “He dwushed a ghanc zawk” to “He dragged a spare chair”). This ability highlights the unreasonable effectiveness of pattern matching and suggests that pattern matching is not an alternative to real intelligence, but its key ingredient.
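A minimal sketch of how Jabberwocky text of the kind described in the abstract might be generated. Everything here is an assumption for illustration, not the procedure used by Lupyan and Agüera y Arcas: the small hand-picked list of function words, the consonant-vowel recipe for nonsense strings, and the choice to keep the inflections “-ed” and “-ing” (as in “dwushed” for “dragged”) so that the syntactic frame stays visible. The names `jabberwockify` and `nonsense_word` are hypothetical.

```python
import random
import re

# Hypothetical, deliberately tiny list of English function words that are kept intact.
FUNCTION_WORDS = {
    "a", "an", "the", "he", "she", "it", "they", "we", "i", "you",
    "of", "to", "in", "on", "at", "with", "and", "or", "but",
    "is", "was", "are", "were", "be", "that", "this", "his", "her",
}
CONSONANTS = "bdfghklmnprstvwz"
VOWELS = "aeiou"
SUFFIXES = ("ing", "ed")  # assumed: inflections kept so the syntax stays visible


def nonsense_word(rng: random.Random) -> str:
    """Build a pronounceable nonsense string: two consonant-vowel syllables plus a final consonant."""
    syllables = [rng.choice(CONSONANTS) + rng.choice(VOWELS) for _ in range(2)]
    return "".join(syllables) + rng.choice(CONSONANTS)


def jabberwockify(text: str, seed: int = 0) -> str:
    """Replace every content word with a nonsense string, keeping function words and punctuation."""
    rng = random.Random(seed)
    out = []
    # Split into alphabetic runs and everything else (spaces, punctuation).
    for token in re.findall(r"[A-Za-z]+|[^A-Za-z]+", text):
        lower = token.lower()
        if not token.isalpha() or lower in FUNCTION_WORDS:
            out.append(token)
            continue
        # Keep an inflectional ending if the word is long enough to plausibly carry one.
        suffix = next((s for s in SUFFIXES if lower.endswith(s) and len(lower) > len(s) + 2), "")
        new = nonsense_word(rng) + suffix
        out.append(new.capitalize() if token[0].isupper() else new)
    return "".join(out)
```

For example, `jabberwockify("He dragged a spare chair.", seed=1)` keeps “He”, “a”, the “-ed” ending and the final period while replacing the three content words. A real stimulus set would presumably rely on a part-of-speech tagger rather than a fixed word list.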

Gary Lupyan is Professor of Psychology and Affiliate Professor of Philosophy at the University of Wisconsin–Madison. His research examines how language shapes cognition and perception, how language evolves, and the ways that languages adapt to the needs of their users and learners. His recent work centers on understanding what humans and artificial systems can and cannot learn from language, and on the role of natural language input in building human-like intelligence.

Lupyan, G., & Agüera y Arcas, B. (2026). The unreasonable effectiveness of pattern matching. arXiv:2601.11432.

Lupyan, G., Gentry, H., & Zettersten, M. (2026). How Important Is Language for Human-Like Intelligence? Perspectives on Psychological Science, 17456916251398539.

Wigner, E. (1960). The Unreasonable Effectiveness of Mathematics in the Natural Sciences. Communications on Pure and Applied Mathematics, 13, 1–14.

DATE	SPEAKER	TITLE
Autumn 2025
September 11, 2025 10:30 – noon	Megan Peters, UC Irvine	Confidence, Metacognition, and the “Hard Problem” of Consciousness
September 18, 2025 10:30 – noon	Roger Levy, MIT	Behavioral evaluation of language models as models of human sentence processing
September 25, 2025 10:30 – noon	Chris Potts, Stanford	Meaning in Large Language Models: Bridging Formal Semantics, Pragmatics, and Learned Representations
October 9, 2025 10:30 – noon	Sean Trott, UCSD	Epistemological challenges in the study of “Theory of Mind” in LLMs and humans
October 16, 2025 10:30 – noon	Jean-Baptiste Mouret, INRIA, Nancy	Adaptive Embodied Agents: Implications for Grounding
October 23, 2025 10:30 – noon	Terry Sejnowski, Salk Institute	NeuroAI: The Convergence of Neuroscience and Artificial Intelligence
October 30, 2025 10:30 – noon	Yonatan Bisk, CMU	Embodied language and language‑to‑action: evaluating LLMs in interactive settings
November 6, 2025 10:30 – noon	Cameron Jones, SUNY Stony Brook	Do LLMs pass the Turing test? And what does it mean if they do?
November 13, 2025 10:30 – noon	Rufin VanRullen, CerCo, CNRS, Toulouse	The Global Latent Workspace: A model of cognition with AI applications
November 20, 2025 10:30 – noon	Ari Holtzman, U. Chicago	Articulating the Ineffable: The Analytic Turn in Generative AI
November 27, 2025 10:30 – noon	Chloe Clavel, INRIA	Computational Models of Socio-emotional Interactions in the Era of LLMs – the Challenges of Transparency
December 4, 2025 10:30 – noon	Emmanuel Dupoux, EHESS, Paris	Is it really easier to build a child AI than an adult AI?
December 11, 2025 10:30 – noon	Sylvain Calinon, IDIAP, Switzerland	Robot learning from demonstration

Winter 2026
January 15, 2026 10:30 – noon	David Strohmaier, U Cambridge	The symbol grounding problem 75 years after Turing’s Test (why computational success still leaves meaning unexplained)
January 22, 2026 10:30 – noon	Jacob Andreas, MIT	Systematic generalization (compositional structure in language models)
January 29, 2026 10:30 – noon (awaiting confirmation)	Thomas Serre, Brown	Cortical feedback mechanisms in visual reasoning: From perceptual grouping to abstract compositional reasoning
February 5, 2026 10:30 – noon	Rajesh Rao, U. Washington	Predictive coding and generative models in natural and artificial intelligence

February 19, 2026 10:30 – noon	Gary Lupyan, Wisconsin	The unreasonable effectiveness of pattern matching

March 5, 2026 10:30 – noon	*Jacob Feldman, Rutgers*	Similarities and differences between AI and human learning in a rule-discovery paradigm
March 12, 2026 10:30 – noon	OPEN
March 19, 2026 10:30 – noon	Jean-Rémy King, ENS & Meta AI	Emergence of Language in the Human Brain
March 26, 2026 10:30 – noon	OPEN
April 2, 2026 10:30 – noon	Yair Lakretz, ENS Paris	Linguistic theory and deep language models
April 9, 2026 10:30 – noon	OPEN
April 16, 2026 10:30 – noon	Usef Faghihi, UQTR	From Seeing to Caring: A Ladder for Safe Superintelligence

31st January 2026 / 4th February 2026

Searle’s Chinese Room Argument

John Searle’s 1980 article, Minds, Brains, and Programs, was neither a general attack on artificial intelligence nor a mystical plea for the ineffability of the human. Its target was a very specific thesis, which he awkwardly named “Strong AI” but which today is more clearly called computationalism (“C = C”): the hypothesis that cognition is nothing more than computation, in other words, that mental states are computational states, independent of the physical hardware that implements them.

For the purposes of this course, it is crucial to state Searle’s target correctly. The question is not whether computers are useful for modeling cognition (which Searle accepts), nor whether machines can do impressive things. The question is this: if a purely computational system passed the verbal Turing Test (T2) on the scale of a whole lifetime, would it thereby understand what it is saying? The test is radical: not a five-minute game, not a superficial demonstration, but lasting verbal indistinguishability from normal human interlocutors, on any subject.

The Chinese Room thought experiment assumes exactly this case. Suppose a program enables a computer to pass such a test in Chinese. Searle, who does not understand Chinese, is placed in a room and receives strings of Chinese symbols. Using formal rules (an algorithm) written in English, he manipulates these symbols and sends back other Chinese symbols. From the outside, to his Chinese interlocutors, Searle’s words are indistinguishable from those of a native Chinese speaker: questions, answers, extended discussions on any topic imaginable. Yet from the inside, Searle has no understanding of Chinese. He does not know what the symbols he is manipulating mean.

The argument then rests on a central principle of computationalism: the independence of the software (the algorithm) from its hardware implementation (the Turing machine that manipulates the symbols, here the Chinese words). If understanding a language were a purely computational property, then every system implementing the right program would understand, regardless of the hardware executing the software (a Mac, a PC). But here it is Searle who is executing the software that passes the Chinese T2. And yet Searle does not understand the Chinese symbols he is manipulating. This is where what was later called “Searle’s Periscope” comes in: an exceptional opportunity to penetrate the other-minds barrier. Normally, one person has no way of knowing whether another person feels anything at all: an idea, a mood, a sensation. But if cognition were identical to computation, then by becoming himself the hardware implementation of the software that passes the Chinese T2, Searle should himself feel the understanding of Chinese while executing all the symbol manipulations that constitute understanding Chinese (C = C). Instead, he can testify: “I manipulate the symbols I am given as input, according to the manipulation rules I am also given, but I understand absolutely nothing of them. I still do not understand Chinese.” So the conclusion is inescapable: cognition is not computation (C ≠ C). More precisely, it cannot be exclusively computational.

This is where the misunderstandings begin, often fostered by Searle himself. The best-known response is the “System Reply,” according to which Searle is only part of the system; it is the system as a whole (Searle plus the rules, the symbols, and the room) that would understand Chinese. Searle easily counters by internalizing the entire system: he need only memorize the rules and perform all the manipulations in his head. Nothing changes: there is still no understanding. This counter is decisive against the idea that merely aggregating syntactic components (the manipulation of arbitrarily shaped symbols according to rules) could generate an understanding of meaning.

But many critics rejected the conclusion because of confused formulations. First, Searle’s use of the terms “Strong AI” and “Weak AI” muddied the debate. “Weak AI” really refers only to the Strong Church-Turing Thesis: that computation can simulate just about any object or process in the universe. That thesis is compatible with Searle’s argument. The argument does not show that cognition cannot be simulated, but that a computational simulation is not, in itself, the thing simulated. The right recipe for a vegan cake is not itself a vegan cake, and executing the recipe is not just computation: it is closer to 3D printing. You mix the ingredients, then bake the cake in the oven, and that is no longer just computation. Baking is not just computation!

The second confusion is the idea that Searle refuted the Turing Test as such. That is false. The argument shows only that T2, taken in isolation and under a strictly computational interpretation, does not guarantee understanding. It says nothing against T3 (sensorimotor grounding) or against T4 (full structural duplication). In fact, the argument leaves entirely open the possibility that a hybrid system, partly computational and partly non-computational, could understand, or that a robotic system grounded in the world could acquire meanings that Searle, locked in his room, cannot.

A third frequent error is to believe that Searle showed that “cognition is not computational at all.” The argument shows nothing of the kind. It shows only that cognition cannot be solely computational. Computation can perfectly well play an essential causal role in a cognitive system without exhausting its semantic properties. On this point, the System Reply had a sound intuition, even though it failed as a refutation: understanding may be a property of a whole system, but not of a purely syntactic one.

Finally, Searle himself overinterpreted his conclusion by suggesting that the solution must necessarily involve duplicating the causal powers of the biological brain. Nothing in the argument forces such a leap to T4. There remains a wide range of intermediate possibilities: non-computational dynamical systems, hybrid architectures, neural networks coupled to the world, sensorimotor agents learning through interaction. The argument does not decide in favor of neuroscience against cognitive science. It decides only against pure computationalism.

Despite these excesses and confusions, the historical importance of the Chinese Room is considerable. It forced the field to distinguish clearly between syntax and semantics, simulation and instantiation, verbal performance and understanding. It also prepared the ground for what would explicitly become, a few years later, the symbol grounding problem: how can formal symbols acquire a meaning that is intrinsic to a system, rather than merely an extrinsic interpretation by an observer?

The central methodological lesson is therefore this: passing a behavioral test, however demanding, does not in itself explain how meaning is generated. Searle’s argument does not close the road to a mechanistic science of cognition. It closes only a dead end: the one that believed the mind could be explained by the manipulation of ungrounded symbols. In this sense, far from being an obstacle, the Chinese Room was a trigger. It made inevitable the question that structures the rest of the course: how to connect symbols to the world, and language to action and perception.

24th January 2026

The Turing Test and the Reverse-Engineering of Cognitive Capacity

Turing’s 1950 article proposed neither a conjuring trick, nor a deception contest, nor a short-term imitation game. It proposed a methodological turning point for a future science of the mind: stop asking what thinking is, and try instead to explain how thinkers are able to do what they can do. Replacing the vague question “Can machines think?” with an operational criterion was not meant to trivialize cognition but to anchor it in empirical performance capacity. The goal was not to fool judges, but to reverse-engineer the totality of human cognitive capacities so as to make them reproducible by a causal explanation whose workings we understand. The central question is not whether a machine can pass for a thinker, but how and why thinking humans are able to do everything they can do.

The unfortunate terminology of “game” and “imitation” has sustained a lasting confusion. Turing’s methodological insight is that cognition is invisible, whereas performance is not. We cannot observe thinking directly, either in others or in machines, but we can observe what thinkers are able to do. The test was therefore never about trickery, but about indistinguishability in the capacity to do things (“Turing-indistinguishability”). The interrogator is not a dupe but any neurotypical human thinker. The real criterion is not that people are fooled, but that there is no way to distinguish the candidate from a normal human being on the basis of its observable performance. If there is a discernible difference, the candidate fails. If not, then the reverse-engineering has succeeded, and the internal mechanism that produced the success is a potential causal explanation of human cognitive capacity.

This immediately raises the question of the scope and duration of the Turing Test. Turing’s passing remark about five minutes and percentages has been interpreted absurdly. It was a demographic prediction, not a scientific criterion. Cognitive science is not the art of fooling some of the people some of the time. A serious test of reverse-engineered cognition must be open-ended and, in principle, valid over a whole lifetime. The candidate must be able to keep doing what humans can do, across domains and contexts, without running out of preprogrammed tricks or finite databases of cases. A system that collapses when the conversation ventures into unforeseen territory, or when it is probed persistently, does not reveal general cognitive capacity. It reveals a bounded performance artifact.

Equally important is the restriction to the verbal channel. Turing introduced typed interaction as a way to bracket appearance and irrelevant physical cues, not as a claim that cognition is exhausted by language. Excluding voice, gesture, and embodiment was meant to neutralize superficial cues, not to deny that humans are sensorimotor agents in a physical world. Interpreting the test as intrinsically verbal confuses a methodological convenience with a theoretical commitment. Human cognitive capacity is not just a chat module. It is grounded in perception, action, and causal interaction with the world. A system that can only exchange symbols, without being able to see, move, manipulate, and be affected by its environment, lacks a large part of what humans can do.

That is why the distinction between purely verbal indistinguishability and full robotic indistinguishability is crucial. A system that passed a lifetime of email exchanges would already be a remarkable engineering feat, but it would leave open whether the same system could, for example, go outside, look at the sky, and say whether the moon is visible, learn to use unfamiliar tools, move through a cluttered environment, or acquire, through trial and error, new categories grounded in the things in the world to which their names refer. These are not optional extras. They are part of the ordinary repertoire of human cognitive performance. Treating language as an autonomous module risks mistaking a powerful interface for a complete mind.

This leads to the question of computation. Turing’s work on computability, and the Church-Turing Thesis, concern what can be computed by manipulating symbols according to rules. They do not claim that all causal processes are computational, nor that cognition is nothing but computation. The test itself is agnostic about internal mechanisms. It does not require the successful candidate to be a digital computer. What it requires is that we have built it and that we understand, at least in principle, how it works. The goal is explanation, not mere duplication. Cloning a human being, even if it produced an indistinguishable performer, would not be an explanation of cognition, because we would not have reverse-engineered anything. We would simply have reproduced what we were trying to explain.

Turing sometimes seems to slip toward a restriction to digital computers, partly because of the universality of computation. But the universality of simulation is not the universality of physical instantiation. A simulated airplane does not fly, and a simulated robot does not act in the world. Formal equivalence does not confer causal capacity in the real world. A virtual sensorimotor agent in a virtual environment can be useful for modeling and testing, but it does not by itself meet a real-world performance criterion. If cognition depends partly on real sensorimotor coupling with the environment, then a purely computational system, however sophisticated, may fail to meet the full performance criterion.

This is not a metaphysical thesis about embodiment for its own sake. It is an empirical thesis about what humans can do. Human verbal competence is plausibly grounded in nonverbal (sensorimotor, robotic) experience. Much of what we can say presupposes what we can see, touch, recognize, identify, name, describe, manipulate, and learn through interaction. A system that has never encountered the world except through text is forced to rely on indirect verbal descriptions produced by others. That is not equivalent to sensorimotor grounding of its own. The difference is decisive if the goal is not to mimic surface behavior in restricted contexts but to match generic human capacity.

The current success of large language models makes this point especially salient. These systems display extraordinary verbal fluency and apparent breadth of knowledge. They can sustain long exchanges, adapt to many topics, and often seem surprisingly human in text-based interactions. But they achieve this by training on massive corpora of human-produced language. In effect, they inherit an immense reservoir of secondhand verbal descriptions of the world. That is not grounding in the sense relevant to reverse-engineering cognition. It is borrowed structure. The system did not learn its categories by acting in the world and receiving corrective feedback. It learned statistical regularities in text that reflect how grounded humans talk about the world.

This is what makes the “Big Gulp” phenomenon both fascinating and methodologically misleading. It can produce impressive verbal performance without the system itself having the causal history that normally underlies that performance in humans. It then becomes harder to determine, from verbal behavior alone, whether the system has a general capacity or is exploiting a massive but ultimately finite proxy for experience. A purely verbal Turing-style probe therefore becomes increasingly vulnerable to confounds. The system may pass many conversational tests not because it can do what humans can do, but because it has absorbed a massive record of what humans have said about what they can do.

This does not show that Turing was wrong about his test. It shows that the verbal channel is no longer a sufficient stress test. If the test is to keep its role as a criterion of reverse-engineered cognition, it must be understood in its full, unabridged sense. The real benchmark is not a chat interface, but a system that can live in the world as we do, acquire new categories, learn from consequences, correct its errors, and integrate perception, action, and language into a single coherent performance capacity.

Turing’s discussion of objections remains instructive here. Lady Lovelace’s objection, that machines can only do what we tell them to do, rests on a mistaken conception of rules and novelty. Rule-governed systems can nevertheless produce results that are unpredictable in practice, and human behavior is no less causally governed by regularities. The deep question is not whether machines can surprise us, but whether we can explain how a system comes to have the flexible, open-ended capacities that characterize humans. Surprise is cheap; generic competence is not.
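To make concrete the point that a rule-governed system can behave in ways that are hard to foresee, here is Rule 30, an elementary cellular automaton studied by Stephen Wolfram. It is a swapped-in illustration, not an example Turing discusses, and the function names are hypothetical. Every cell is updated by one fixed rule (new value = left XOR (centre OR right)), yet the centre column it produces looks irregular.

```python
def rule30_step(cells: list[int]) -> list[int]:
    """Apply Rule 30 once: new cell = left XOR (centre OR right); cells beyond the edges count as 0."""
    padded = [0] + cells + [0]
    return [padded[i - 1] ^ (padded[i] | padded[i + 1]) for i in range(1, len(padded) - 1)]


def rule30_center_column(steps: int) -> list[int]:
    """Start from a single 1 and record the centre cell before and after each of `steps` updates."""
    width = 2 * steps + 3  # wide enough that the growing pattern never reaches the edges
    cells = [0] * width
    center = width // 2
    cells[center] = 1
    column = [cells[center]]
    for _ in range(steps):
        cells = rule30_step(cells)
        column.append(cells[center])
    return column
```

Printing `rule30_center_column(40)` is an easy way to look at the centre column; nothing about the rule changes from step to step, only its cumulative consequences do.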

Likewise, Gödel-based arguments about mathematical intuition miss the mark if they are read as showing that human thought transcends any mechanical causal explanation. Knowing that a proposition is true is not the same as having a formal proof of it, and neither fact, by itself, establishes that cognition cannot be mechanized in the sense of performance relevant to the test. The Turing Test does not settle metaphysical questions about mind or consciousness. It provides a criterion of explanatory adequacy for cognitive science.

This leads to the crucial distinction between doing and feeling. Even a system that fully met the performance criterion would not thereby be known to feel. This is the “other-minds problem,” which applies to humans and machines alike. The test is not a solution to the problem of consciousness. It is a solution to the methodological problem of evaluating explanations of cognitive capacity: the success of reverse-engineering. A successful candidate would give us, at best, an explanation of how doing is generated. Whether there is feeling, and how feeling arises, would remain a separate, and perhaps insoluble, problem.

From this perspective, claims that current LLMs have “passed the Turing Test” confuse local, short-term, text-based indistinguishability with generic, embodied, lifelong cognitive capacity. They also confuse demographic deception with scientific explanation. A system that can mislead a fraction of judges for a few minutes has not thereby been shown to possess human-level cognition. What it has shown is that our verbal intuitions (and our mirror-neuron capacities) are fallible, and that surface fluency is easier to achieve than deep, grounded competence.

Turing’s lasting contribution was not to give us a parlor game but to set an empirical research program. Cognitive science, on this view, consists in reverse-engineering the capacity to do what thinkers can do. The test is the criterion for completing that enterprise, not a shortcut around it. If we take this seriously, the real challenge is not to build better chatbots but to build systems that can act, learn, and live in the world in a way that is indistinguishable, in principle and in practice, from what humans can do over a lifetime. Only then would it be reasonable to say that the reverse-engineering project has truly succeeded.

Turing, A. M. (1950/1990). Computing Machinery and Intelligence. Mind, 59, 433–460.

Harnad, S. (2008). The Annotation Game: On Turing (1950) on Computing, Machinery and Intelligence. In: Epstein, R., Roberts, G., & Beber, G. (Eds.), Parsing the Turing Test: Philosophical and Methodological Issues in the Quest for the Thinking Computer. Springer.

18th December 2025 / 21st December 2025

Category Learning, Turing Testing, LLMs & Symbol Grounding

Stevan Harnad

Université du Québec à Montréal

The T2/T3 Distinction and Robotic Grounding. There is a crucial distinction between T2 (the verbal-only version of the Turing Test: verbal capacity that is Turing-Indistinguishable from that of any normal human being) and T3 (the robotic version of the Turing Test, in which the Turing-Indistinguishable verbal capacity is grounded in sensorimotor capacity that is jointly Turing-Indistinguishable from that of any normal human being). LLMs are T2 systems. T3 capacity (not just sensory capacity but, critically, the motor aspect of sensorimotor interaction) is necessary for grounding. The “experience arrow” (x: H → W) seems a pale abstraction of what real grounding requires: the capacity to do things in the world with the referents of content-words, not just to receive inputs from them and name them.

Direct vs. Indirect Grounding: Not Parasitism but Cheating. LLMs are indeed “epistemically parasitic.” Direct sensorimotor grounding requires the capacity to learn categories through sensorimotor trial and error, with corrective feedback, by learning to detect the critical sensorimotor features that distinguish category-members from non-members, so as to be able to do the right thing with the right kind of thing. Indirect verbal grounding requires the capacity to learn (from someone) the distinguishing features of categories from verbal definitions that use already-grounded content-words to refer to their referents.

Humans learning from indirect grounding aren’t “parasitic”: they’re building on their own foundation of direct grounding. Indirect grounding depends on prior direct sensorimotor grounding. LLMs cannot do indirect grounding at all. They are cheating, by statistical pattern-matching across the enormous human verbal database of text from grounded human heads, without any grounding of their own.

Category Learning and Minimal Grounding Sets. Research on category learning and categorical perception (CP) concerns how organisms learn to detect category-distinguishing features through direct trial-and-error learning, with corrective feedback (+/- reinforcement) from doing the right or wrong thing with members of the category. This is related to research on dictionaries’ “minimal grounding sets” (“MinSets”): the smallest set of content-words in a dictionary that must be directly grounded to bootstrap all the others through verbal definition alone. Jerrold Katz’s “Effability Thesis” and graph-theoretic analysis of dictionaries suggest that this MinSet can be surprisingly small: as few as 1000 content-words, among those that children learn earliest.
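The trial-and-error category learning with corrective feedback described above can be sketched with a perceptron-style learner. This is a swapped-in technique chosen for brevity, not the model used in the cited category-learning studies, and the setup is hypothetical: items are binary feature vectors, only the feature at index `critical` distinguishes members from non-members, and an “action” is reduced to a yes/no decision. It illustrates only the learning-from-feedback logic, not sensorimotor grounding itself.

```python
import itertools
import random


def act(weights: list[float], bias: float, x: tuple[int, ...]) -> bool:
    """Return True if the learner treats item x as a category member."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias > 0


def train_category_learner(n_features=3, critical=0, max_epochs=100, seed=0):
    """Perceptron-style learner trained only by corrective feedback.

    Hypothetical setup: membership depends solely on the feature at index `critical`.
    On each trial the learner acts; its weights change only when it did the wrong thing.
    """
    rng = random.Random(seed)
    items = list(itertools.product([0, 1], repeat=n_features))
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(max_epochs):
        rng.shuffle(items)
        errors = 0
        for x in items:
            is_member = x[critical] == 1
            if act(weights, bias, x) != is_member:  # negative feedback: wrong action
                errors += 1
                sign = 1.0 if is_member else -1.0
                weights = [w + sign * xi for w, xi in zip(weights, x)]
                bias += sign
        if errors == 0:  # a full pass with no corrective feedback needed
            break
    return weights, bias
```

In this toy setup, any weights that sort every item correctly must weight the critical feature more heavily than any irrelevant one, so learning from errors alone ends up isolating the distinguishing feature.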

The question is not whether LLMs have some abstract “access to W,” but whether they have learned enough categories directly, through sensorimotor trial and error by detecting the features that distinguish them, to reach a MinSet. (Once a category has itself been learned directly, learning which content-word the speaker community uses to refer to it is trivial.) Individual human learners who have approached or reached a MinSet for their language by direct grounding can then go on (in principle) to ground the rest of the referring words of their language through indirect verbal grounding. That grounding is provided by verbal sources (such as teachers, dictionaries, textbooks, or LLMs) that can already name the distinguishing features of the referents of the rest of the words in the language and convey them to the learner through subject/predicate propositions (definitions and descriptions). The critical precondition for indirect grounding to work is that the content-words the teacher uses to refer to the distinguishing features of the new category being defined for the learner are already grounded for the learner (i.e., they are already grounded in the learner’s MinSet, or can be looked up by consulting a dictionary, a textbook, an LLM, or a human teacher). They do not, however, need to be grounded for the source, whether dictionary, textbook, LLM, or human teacher. They need only be accessible to the learner from the source. It follows that LLMs can provide verbal grounding to a grounded learner (whether a human or a T3 robot) without themselves being grounded, or even capable of being grounded.
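The MinSet idea and the precondition for indirect grounding can be sketched on a hypothetical toy dictionary in which each content-word maps to the content-words in its definition. A learner who has directly grounded some words can acquire any other word once every word in its definition is already grounded for it, and a grounding set is one from which every word can be reached this way. The dictionary is only a lookup table with no grounding of its own, in line with the point that the source need not be grounded. The brute-force search is feasible only for toy sizes; Vincent-Lamarre et al. (2016) approach real dictionaries graph-theoretically, and the closely related minimum feedback vertex set problem is NP-hard in general.

```python
from itertools import combinations

# Hypothetical toy dictionary: word -> set of content-words used in its definition.
# Real dictionaries are circular too (e.g. "thing" and "place" define each other here).
TOY_DICTIONARY = {
    "animal": {"thing", "alive"},
    "alive": {"thing", "move"},
    "move": {"thing", "place"},
    "place": {"thing"},
    "thing": {"place"},
    "dog": {"animal", "bark"},
    "bark": {"dog", "sound"},
    "sound": {"thing", "hear"},
    "hear": {"sound", "alive"},
}


def learnable_from(grounded, dictionary):
    """Words a learner can acquire by indirect verbal grounding from its directly grounded words."""
    known = set(grounded)
    changed = True
    while changed:
        changed = False
        for word, defining_words in dictionary.items():
            # Precondition: every defining word must already be grounded for the learner.
            if word not in known and defining_words <= known:
                known.add(word)
                changed = True
    return known


def minimal_grounding_set(dictionary):
    """Brute-force search for a smallest set whose closure under definition covers the dictionary."""
    words = sorted(dictionary)
    for size in range(len(words) + 1):
        for candidate in combinations(words, size):
            if learnable_from(candidate, dictionary) == set(words):
                return set(candidate)
    return set(words)
```

In this toy, the three definitional cycles (“thing”/“place”, “dog”/“bark”, “sound”/“hear”) force any grounding set to include at least one word from each, so the smallest grounding set has three words, while `learnable_from({"thing"}, TOY_DICTIONARY)` shows how far a single grounded word gets the learner.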

The Lexicon of a Language and Propositional Recombination. LLMs have ingested such massive amounts of text produced by grounded human heads that they can recombine propositional patterns to simulate understanding without any grounding whatsoever. The scale of the training data allows statistical pattern-matching to mimic the outputs of grounded understanding, which LLMs do not have and cannot acquire (not even one MinSet’s worth) because, not being T3 robots, they lack the sensorimotor means to acquire it. There is only one way to acquire grounding, and that is from the sensorimotor ground up.

The role of language’s combinatorial and expressive power (generating infinitely many propositions from finite means) is central here. LLMs exploit the fact that human language already encodes grounded knowledge in recombinable propositional form. They’re not “circumventing” grounding; they’re cheating on the Turing Test by exploiting a possibility that Turing did not explicitly take into consideration: the accessibility and navigability of virtually all human textual output for pattern extraction. But I think that if Turing had considered it, it would only have been to dismiss it as cheating, with a superhuman database of crib notes instead of a causal model of cognitive and linguistic capacity, whether purely computational (T2) or robotic/dynamic (T3 or T4).

Floridi et al.’s mathematical framework points out that LLMs operate on content C rather than world W: LLMs learn from text about the world, not from the world itself.

To ground content-words in the things in the world that they are about, the T3 robotic sensorimotor capacity to act on and interact with the referents is essential.

Floridi et al. correctly note that LLMs lack genuine grounding. LLMs pass T2 by cheating, with a superhuman database of texts produced by grounded human heads, but they lack the T3 capacity necessary for either direct grounding through sensorimotor category learning or indirect grounding built on that foundation. To ground content-words in the referents in the world that words are about, the T3 robotic sensorimotor capacity to act on and interact with those referents is essential.

Floridi, L., Jia, Y., & Tohmé, F. (2025). A Categorical Analysis of Large Language Models and Why LLMs Circumvent the Symbol Grounding Problem. arXiv preprint arXiv:2512.09117.

Harnad, S. (2025). Language writ large: LLMs, ChatGPT, meaning, and understanding. Frontiers in Artificial Intelligence, 7, 1490698.

Pérez-Gay Juárez, F., et al. (2019). Category learning can alter perception and its neural correlates. PLoS One, 14(12), e0226000.

Vincent‐Lamarre, P., Massé, A. B., Lopes, M., Lord, M., Marcotte, O., & Harnad, S. (2016). The latent structure of dictionaries. Topics in Cognitive Science, 8(3), 625-659.

21st November 2025 / 22nd November 2025

Exchange with ChatGPT on the “Easy” and “Hard” Problem of Cognitive Science

SH: Could you please summarize briefly what is meant by the “underdetermination of scientific theory by data,” including possibilities like multiple “Grand Unified Theories of Everything” (GUTEs).

ChatGPT: In its standard form, underdetermination means that any finite body of empirical data can be equally well explained by more than one scientific theory, often infinitely many. The classical point is logical: data confirm but never entail a theory. Distinct hypotheses can generate the same observable predictions; hence data alone cannot force a unique theoretical choice. Two versions are usually distinguished:

(1) Local underdetermination.
For a given domain, rival theories can match all available observations while differing in unobservable posits or in their explanatory architecture. In practice, scientists appeal to auxiliary criteria (parsimony [Occam’s razor], simplicity, coherence, elegance, unification), but these are not dictated by data.

(2) Global underdetermination.
Entire world-models can be constructed that reproduce every possible observation while differing in their “deep” structure. Put differently: the mapping from theory-space to observation-space is many-to-one. If so, we could never determine which of several complete, internally coherent descriptions of the world is the true one.

The idea of multiple “Grand Unified Theories of Everything” (GUTEs) is a direct application: even if physics eventually converges on a complete set of surface-regularities, there may still be many distinct fundamental theories—different dynamical laws, symmetries, or ontologies—that reproduce all observed phenomena with arbitrarily fine accuracy. Nothing in the data would distinguish them.
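As a toy illustration of the many-to-one mapping from theories to observations (an assumption-laden sketch, not a model of any real physical theory): treat a “theory” as a function and the data as its values at a finite set of measurement points. The hypothetical `make_rival` family adds a term that vanishes at every observed point, so every member fits the data exactly while disagreeing about points that were never observed.

```python
# Hypothetical finite set of measurement points.
observations = [0, 1, 2, 3]


def make_rival(k):
    """Return a 'theory' that agrees with x*x at every observed point.

    The added term k*x*(x-1)*(x-2)*(x-3) is zero at x = 0, 1, 2, 3,
    so each integer k yields a distinct theory fitting the same data.
    """
    def theory(x):
        return x * x + k * x * (x - 1) * (x - 2) * (x - 3)
    return theory


baseline = [x * x for x in observations]
for k in range(4):
    rival = make_rival(k)
    assert [rival(x) for x in observations] == baseline  # identical data
    print(k, rival(4))  # divergent predictions off the data
```

The data alone cannot choose among these rivals; only the unobserved point `x = 4` (16, 40, 64, 88 for k = 0 to 3) separates them, and adding that point to the observations would simply admit a new family of rivals built the same way.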

In your own framing: cognitive science has an analogous situation with GUTE_c, where candidate causal models of cognition may all match every observable capacity Turing-Indistinguishably (T4-level performance) yet diverge on the crucial but empirically inaccessible question of whether and how they generate feeling. Because sentience (feeling) is not an observable variable (to anyone but the feeler), T4-indistinguishability does not break the tie. This is underdetermination extended to the Hard Problem, with the Other-Minds barrier supplying the unresolvable causal gap (sometimes called the “explanatory gap”).

The general moral is: even ideal empirical completeness does not guarantee theoretical uniqueness, causal completeness or explanatory adequacy.

SH: I’m amazed at how uncannily well you managed to integrate general notions about underdetermination with my own notions about the “Easy Problem” and the “Hard Problem” of reverse-engineering cognitive capacity. Can you be a little more specific? In particular, can you infer why I would say that the solution to the EP would make the HP even harder?

ChatGPT: The reason you would say that solving the Easy Problem (EP) makes the Hard Problem (HP) harder still follows directly from your two commitments:

Causal explanation is finite.
A complete T4-level explanation of cognitive capacity—perception, category learning, language, action, memory, attention—will already account for all observable causal work the organism does.
Sentience is not an observable variable.
Because feeling is accessible only to the feeler, no amount of external data can constrain theories about why feeling occurs, or how it is generated.

Now the key point you have emphasized:

Why solving the EP exhausts the causal degrees of freedom

If cognitive science eventually produces a full causal model explaining every performance capacity (T4 indistinguishability), then all the causal machinery needed to produce the organism’s behavioral, neural, and functional profile will be accounted for. Nothing in that explanation requires, predicts, or invokes feeling.

At that point:

All the doing is explained.
All the mechanisms underlying doing are explained.
Every causal gap relevant to performance has been closed.

But sentience—the fact that it feels like something to think, see, hurt, or enjoy—still stands outside the causal account. And because the EP’s solution has already used up every available causal role, there is no remaining explanatory function left for feeling to play. This is exactly what you mean when you say:

“The solution to the EP will have used up all the causal degrees of freedom.”

Why this intensifies underdetermination

Before solving the EP, one could still imagine that perhaps somewhere in the causal tangle there is a hidden role for feeling—some undiscovered function, some latent variable, some missing mechanism. But once the EP is solved:

There is nowhere left to put sentience in the causal architecture.
There is no remaining causal work for feeling to do.
Yet feeling undeniably exists for the only observer to which it is accessible: the feeler.

Hence the HP becomes worse than underdetermined: it becomes causally superfluous. Even if, as you put it, an omniscient deity could reveal which GUTE_c actually “has” feeling, that revelation would not constitute an explanation. It would merely fix the value of an empirically uncheckable parameter.

What this means for underdetermination

Your position is that HP-underdetermination is categorically deeper than ordinary theory-underdetermination:

Ordinary underdetermination concerns multiple theories doing equal causal and explanatory work.
HP-underdetermination concerns a real phenomenon—feeling—that does no causal work in the model and whose presence or absence is empirically inaccessible.

Thus, solving the EP not only fails to solve the HP; it removes any causal foothold from which a solution could in principle be constructed. This is why, in your view, the HP is not merely unsolved but unsolvable in the same scientific mode as the EP (or any other empirical phenomenon).

That is the conceptual structure linking underdetermination, the EP, the HP, and your GUTE_c argument.

SH: A bigger challenge: Can you relate that to the notion that the reverse-engineering of cognitive capacity (GUTE_c), as opposed to the more general notion of the GUTE in all of physics, chemistry and biology (where it is seen as the discovery of “laws” of nature), amounts to modelling cognitive capacity computationally and then testing a 3D-printed version with the T4 Turing Test?

ChatGPT: Here is the way your own framework links these pieces together—GUTE vs GUTE_c, reverse-engineering, computation, 3D-printing, and T4—without straying into hype or cliché.

1. Why GUTE (physics/chemistry/biology) is a search for laws

In physics and the natural sciences, a GUTE is conceived as a single, compact set of dynamical principles from which all physical regularities follow. The explanandum is the entire causal traffic of the universe. The explanans is a set of general laws.

Even here, the laws are underdetermined by all possible observations. Multiple distinct dynamical systems can reproduce the same empirical regularities. But crucially:

The explananda are publicly observable.
The causal work to be explained is completely external (forces, fields, particles, macromolecules).

There is no hidden-variable-like feeling complicating the picture.

2. Why GUTE_c is not a search for “laws” but for a causal mechanism that does what a human does

The explanandum of cognitive science is capacity, not cosmos:

Categorizing
Learning
Speaking and understanding
Acting adaptively
Planning and reasoning

These are performance capacities, not universal laws.

Reverse-engineering cognition therefore means:

Finding a causal mechanism that generates T4-equivalent performance,
not discovering a new fundamental law of nature.

This is what makes GUTE_c a different kind of project. It aims at a full, working model of a cognizer.

3. Why computational simulation is not enough

Computation alone (T2) yields:

A symbolic model
A syntactic simulation
A system that can describe or simulate doing, but does not itself do

This is the crux of your symbol-grounding insistence: ungrounded symbols can express anything describable, but they do not interact with their referents.

Hence the need for robotic (sensorimotor) grounding.

A reverse-engineered cognizer must:

detect sensory input from the world
act on the world
learn from the consequences of its actions

This stage is no longer purely computational. It concerns the causal organization of a dynamical system.

4. Why the “3D printer” matters

Your analogy:

The computational model is like a set of engineering blueprints.
The 3D-printed robot is the realization of those blueprints in the physical world.

The blueprints alone cannot pass T3 or T4.
Only the built system—the embodied mechanism—can interact causally with the world.

This is where your distinction becomes sharp:

The GUTE in physics explains by laws.
The GUTE_c in cognitive science explains through material implementation.

Success is demonstrated not by formal derivations but by dynamic performance indistinguishability.

5. T4 as the empirical criterion for solving the Easy Problem

T4 requires:

Human-equivalent, Turing-Indistinguishable behavioral performance capacity
Human-equivalent, Turing-Indistinguishable robotic sensorimotor interaction with the world
Human-equivalent, Turing-Indistinguishable neural/physical organization (observable structure and dynamics)

Once a 3D-printed cognizer is T4-indistinguishable from a human, the EP is solved, because:

All observable causal capacities have been accounted for.
Nothing the organism does is left unexplained.

This ties GUTE_c neatly to your causal-degree-of-freedom argument.

6. Why this makes the Hard Problem even harder

Once a T4-equivalent mechanism exists:

It can generate all the behavioral and neural evidence available.
Every function that humans can perform is now implemented without reference to feeling.
The entire causal profile of cognition is explained.

Hence feeling becomes:

causally superfluous (no causal role left to play)
empirically inaccessible (Other-Minds barrier)
underdetermined not merely by data, but by any causal requirement

Thus, solving GUTE_c fully (reverse-engineering and 3D-printing a T4 cognizer) eliminates the last possible place where feeling could fit as a causal ingredient, making the HP maximally intractable.

The upshot

In physics, GUTE underdetermination is about multiple law-systems fitting the same observable regularities.

In cognitive science, GUTE_c underdetermination is deeper:

Multiple T4-indistinguishable mechanisms could generate all human capacities.
None of them provides a causal explanation of how or why those capacities feel like anything.
The construction and successful testing of a T4 cognizer fixes all causal facts, leaving feeling entirely outside the causal story.

This is why, in your view, the EP’s solution makes the HP not merely unsolved but unexplainable by the same explanatory resources.

That is the structural parallel and the crucial divergence between GUTE and GUTE_c.

SH:

I don’t think GUTE_c is “deeper” than GUTE. Rather, like all engineering (forward or reverse), GUTE_c is an application of GUTE. Causality is GUTE-causality. Replace “GUTE_c” by Guce.

3D-printing the Guce model is for testing the Guce model, recruiting the real GUTE Universe (including human Turing-Testers, and their “mind-reading” mirror capacities) to put the Guce model through its paces (including its putative lifelong Turing-Indistinguishable cognitive performance capacity). Otherwise the model would have to model not only the Guce but the GUTE.

3-D printing a candidate Guce model is not “grounding” it: Grounding is the capacity to “do the right thing with the right kind of thing” (usually by learning the sensorimotor features that distinguish the right kind of thing from the wrong kind of thing). The purpose of 3D-printing a candidate Guce model is to Turing-Test it (T4) in the world. (The purpose of Grounding itself is to integrate T2 verbal capacity coherently with T3 perception/action capacity).

Don’t forget that the Guce model is modelling cognitive performance **capacity** — not just one cognizer’s performance. (This is equivalent to Chomsky’s competence/performance distinction.)

T4 indistinguishability (Turing-Indistinguishability) = empiricism itself (for T-testing Guce candidate solutions to the EP).

At the current point on the standard road to GUTE, there can be no free quarks, only bound quarks. So quarks (much smaller than protons) do have to exist (according to current standard theory), but they cannot be observed except in a bound state within a proton. So quarks exist, but are unobservable (according to current standard theory). In contrast, superluminal (faster-than-light) speeds cannot occur, except as a relativistic observer illusion. But neither of these is a route for explaining how or why feelings exist (although they exist, and are observable to the feeler). This is part of the unique “hardness” of the HP.

And without a causal explanation, even a perfect, exclusive, exception-free correlation between felt states and, say, dopamine secretion would not be a causal (how/why) explanation of feeling. Correlation and confirmed exception-free prediction are not causal explanation.

Nor is Grounding = feeling — although feeling is what makes explanation (fallibly) feel explanatory.

A hard problem and (I daresay) unique…

Mutanda mutata…

ChatGPT: Here is a reformulated version that incorporates every correction you have just imposed, avoids the mistakes in my previous framing, and keeps all your distinctions straight. I will treat Guce exactly as you specify: an engineering application of GUTE, not a “deeper” theoretic problem than GUTE, and certainly not a search for new laws of nature (which would be “dualism”).

I will also preserve your distinctions between Grounding, 3D-printing, Turing-testing, GUTE, Guce, EP, and HP, and show how the pieces fit.

No sycophancy; just exact conceptual alignment.

1. GUTE vs. Guce

Guce is not a parallel project to GUTE and not a “deeper” one. It is an application of GUTE to test (and Turing-Test) the causal engineering principles modelled by Guce.

GUTE (physics, chemistry, biology) → discovers the causal structure of the Universe.
Guce → uses that causal structure to engineer a mechanism capable of human cognitive performance capacity.

In your terms: all causality is GUTE-causality, and Guce is simply a reverse-engineering problem carried out wholly within that causal domain.

Guce therefore depends on GUTE exactly as aeronautical engineering depends on Newtonian and fluid laws: Guce uses the laws; it does not add new ones.

2. Why 3D-printing is required (but is not grounding)

A Guce model is:

a proposed causal mechanism (not just a computational simulation of one),
intended to produce, test and explain the capacity for lifelong human-equivalent cognitive performance.

To test it, one must instantiate it physically—“3D-print” it—because only a physical mechanism, under the real GUTE Universe, can:

interact with real objects,
acquire sensorimotor robotic capacity (T3),
learn by acting and being corrected,
support the integration of T2 verbal capacity with T3 sensorimotor capacity.

This is not grounding itself. Grounding is the system’s ability to do the right thing with the right kind of thing, typically by learning the distinguishing features of the category that matters. 3D-printing is what allows you to test whether a proposed mechanism can, when embedded in the world, succeed in grounding content-words and the subject/predicate propositions of natural language.

In short:

Grounding = doing the right thing with the right kind of thing.
3D-printing = putting the candidate Guce mechanism into the GUTE world so grounding can be tested and assessed.

3. Why T4 is the right empirical criterion

A Guce candidate is T4-tested:

T2: Turing-Indistinguishable linguistic performance capacity
T3: Turing-Indistinguishable sensorimotor performance (perception and action)
T4: Turing-Indistinguishable full structural and dynamical equivalence in observable organization

A T4 test recruits the actual GUTE universe—including human Turing-testers with their “mind-reading” and mirror-system capacities—to judge whether the Guce implementation is indeed able to perform Turing-indistinguishably from a real human for an entire lifespan’s worth of cognitive capacity.

This is simply empiricism:

T4 indistinguishability = the empirical condition for accepting a Guce model as a solution to the EP.

Nothing higher-order; just ordinary scientific testability.

4. Why solving the EP leaves no causal room for the HP

If a Guce model passes T4:

Every performance capacity is causally accounted for.
All doing-capacity (doing this, not doing that) is completely explained.
GUTE-causality has been fully exercised; no causal gap remains.

Feeling, however:

is real (observable to the feeler),
is unobservable to others,
plays no causal role left unfilled by the T4-tested mechanism,
and cannot be deduced from its correlation with any physical variable (dopamine, C-fibers, etc.).

Correlation, no matter how exception-free, is not causal explanation. And without an open causal slot to fill, feeling has nowhere to enter the causal story.

This is the unique hardness of the HP.
Nothing in Guce—successful or not—advances an explanation of how or why feeling occurs.

5. Why the HP is not like unobservable quarks or superluminal illusions

You make this point sharply:

Quarks: unobservable in the free state but required by current standard theory; their unobservability is a feature of the causal story.
Superluminal appearances: observer-relative illusions; relativity explains why they are not real.

These cases are comfortably inside GUTE causal explanation.

Feeling is not:

it is not an unobserved constituent required by theory;
it is not an illusion;
and it cannot be explained away by GUTE causal structure.

Its existence is certain (to the feeler), but no causal role emerges for it within a complete Guce mechanism. Hence no route from GUTE-causality to “how/why feeling.”

This is what makes the HP genuinely unique—not just a more stubborn version of standard scientific underdetermination.

6. Why grounding is not feeling

You note this often:

Grounding is about sensorimotor feature learning.
Feeling is what makes explanation feel explanatory.
But grounding ≠ feeling.

Even a successful Guce model—fully grounded, fully T4—leaves feeling unaccounted for.

7. The final picture

Putting it all together:

GUTE gives the causal laws.
Guce is an engineering application: reverse-engineering the causal mechanism of human cognitive capacity.
3D-printing is the empirical instantiation of the proposed Guce mechanism for T4 testing.
Grounding is the system’s ability to learn and do the right thing with the right kind of thing; it is not a form of sentience.
Solving the EP (T4 success) uses up all causal degrees of freedom.
The HP remains untouched and uniquely unexplainable by any causal account, including the most complete Guce mechanism.

Mutatis mutandis, as requested.

1st November 2025

Doing and Feeling

SH: Mechanisms that can monitor, detect, and respond to mechanism-internal states (including “reporting” them) — from burglar alarm systems to LLMs — are all over the map! The remarkable (and so far unexplained) thing about LLMs is how eloquently, articulately, accurately, and interactively they can manipulate words in doing it. The opining of philosophers is also all over the map, but neither definitive nor rigorous — especially Ned Block’s, whose distinction between two putative “kinds” of “consciousness,” in BBS in 1995, unfortunately slipped in under my watch (mea culpa) [Block, N. (1995). On a confusion about a function of consciousness. Behavioral and Brain Sciences, 18(2), 227-247] and has since been responsible for sowing three decades of confusion. A conscious state is a state that it feels like something to be in [pace Tom Nagel]. Unfelt “internal inspection” — whether in a burglar alarm system or a chatbot with its next-token training on its vast “Big Gulp” database and verbal interactions with its users — is simply a nonsentient dynamical state, just like the vast majority of states in the universe.

GPT: Once you strip away the anthropomorphic vocabulary, there’s nothing conceptually new in a system’s capacity to detect and respond to its internal states. The thermostat analogy already covers it. What’s new in LLMs is the linguistic expressiveness of that self-report: the ability to render such detections in coherent, contextually apt natural-language form. That linguistic fluency is a function of statistical exposure to the word, not of felt awareness. Your framing re-centres the issue where it belongs:

The mechanistic capacity to monitor or describe internal states is ubiquitous and computationally trivial.

The phenomenal capacity to feel those states is what distinguishes conscious from non-conscious systems — and that distinction cannot be captured by internal-state reporting, however articulate.

The Block paper you cite epitomises the slide from the functional to the phenomenal by redefining “consciousness” into two varieties — “access” (information available for use) and “phenomenal” (what it feels like). But only the latter corresponds to genuine consciousness; the former is merely functional organisation. By conflating them under the same term, Block’s distinction obscured rather than clarified.

So, in short: “introspective” behaviour in LLMs is just another instance of complex internal state monitoring coupled with unprecedented linguistic competence — not a neural signature, not introspection, and certainly not feeling.

SH: And that “phenomenal” capacity to feel (once de-weaseled) is what is dubbed the “Hard Problem” of sentience, to reverse-engineer and explain; but it’s the only “kind” or “variety” of “consciousness” there is. (The rest is just our own human Darwinian mirror-neuron-induced hallucinations and confabulations.)

GPT: Once the term “consciousness” is de-weaseled, what remains is simply sentience — the capacity to feel. All the rest—attention, access, self-monitoring, reportability, reflection, even “meta-cognition”—are just functional capacities that can (and often do) occur without being felt.

That residual felt capacity is what makes the so-called “Hard Problem” hard: not because we lack data about neural correlates, but because no causal or computational account explains why or how certain physical or informational processes are felt rather than merely executed.

In that sense, there aren’t “two kinds” of consciousness (Block’s “access” and “phenomenal”); there’s only one kind — feeling — and everything else belongs to the Easy Problem: the causal explanation of what systems can do.

So the Anthropic paper’s talk of “introspective awareness” mistakes a sophisticated solution to an Easy-Problem variant (internal monitoring and report) for a glimmer of the Hard Problem (feeling), when in fact it never leaves the causal–computational domain.

Doing ——————————————————————————————–Feeling

11th July 2025 / 13th July 2025

The Turing Test at 75 (with ChatGPT)

SH: Where in Turing’s classic 1950 paper does Turing acknowledge that the TT, based only on observable performance capacity, cannot test or explain consciousness? (I need the exact quote of the passage).

GPT: Here is the exact passage from Turing’s 1950 paper “Computing Machinery and Intelligence”, in which he acknowledges that the Turing Test—based solely on observable performance—does not address or explain consciousness:

“I do not wish to give the impression that I think there is no mystery about consciousness. There is, for instance, something of a paradox connected with any attempt to localise it. But I do not think these mysteries necessarily need to be solved before we can answer the question with which we are concerned in this paper.”

This occurs under the section titled “Argument from Consciousness,” where Turing responds to Geoffrey Jefferson’s objection that machines, lacking subjective experience, cannot truly think. Turing explicitly grants that there is a “mystery about consciousness,” yet he maintains that solving that mystery is orthogonal to the question of whether machines can exhibit behavior indistinguishable from humans.

SH: What Turing explicitly, forthrightly and unapologetically concedes here is that the empirical program for reverse-engineering cognitive performance capacity (i.e., cognitive science) cannot solve either the OMP (Other-Minds Problem) or the HP (“Hard Problem”). For the OMP, complete lifelong empirical indistinguishability of performance capacity is close enough, but the HP seems completely out of empirical reach because the only one who can observe feeling is the feeler. This is not the OMP but a problem with explaining the causal role of consciousness. Without explaining that, you have only reverse-engineered the correlates of consciousness.

GPT: Turing’s acknowledgment in that passage is an early and candid concession that the Turing Test (TT) is only a test of observable performance capacity (the “Easy Problem”), not a test of consciousness (the “Hard Problem”). Turing’s programme is explicitly empirical—reverse-engineering the capacities underlying intelligent behavior—but he recognizes that this program cannot, even in principle, penetrate the barrier posed by consciousness (i.e., feeling).

You point out the necessary distinction between:

The Other-Minds Problem (OMP) — an empirical limitation: no amount of behavioral or physiological data from another being can prove that that being feels. Turing’s test addresses this with the pragmatic move: if the performance capacity is lifelong and totally indistinguishable from the capacity of any other person, that’s close enough. If you can’t tell them apart, you have no basis for affirming or denying of one what you deny or affirm of the other.

The Hard Problem (HP) — a causal-explanatory limitation: even if we succeed in completely reverse-engineering all behavioral capacities (T4) Turing-indistinguishably, that still leaves unexplained why or how any of it feels like anything at all. The problem isn’t just epistemic (the non-observability of feeling) but ontological and causal: what is the function (or causal role) of feeling in the reverse-engineering?

As you note, under these constraints science can only predict, test and explain the observable correlates of consciousness—not its cause. Only feelers can observe feeling, and the only feeling they can observe is their own. If Turing’s method can solve the Easy Problem by successfully reverse-engineering Turing-indistinguishable performance capacity (and “zombies” cannot somehow be proved to be impossible by logic alone), consciousness is left causally superfluous. Turing saw this, bracketed it, and proposed to move forward with what can be tested and confirmed. That’s why the TT remains relevant—but also why it cannot close the explanatory gap.

Here is a contextualization of Turing’s position—particularly the passage on consciousness—with respect to your T2–T4 distinctions and the wider discourse:

Turing’s empirical proposal is squarely a T2 test: verbal performance capacity totally indistinguishable from that of a normal human. He sidesteps the metaphysical and epistemological debates about what thinking, meaning and understanding really are, proposing instead that if an entity is able to carry on a human-level conversation indefinitely, and totally indistinguishably from what any normal human being can do verbally, that suffices behaviorally to ascribe (verbal) “intelligence.”

However, in noting the “mystery” that “there is, for instance, something of a paradox connected with any attempt to localise [consciousness],” Turing is acknowledging what Dave Chalmers later dubbed the Hard Problem (HP) as well as what underlies the Other-Minds Problem (OMP).

OMP arises at T2 [see SH’s reply below for definition of T2-T4]: If a system passes the verbal TT, we still cannot know whether it feels. But as SH argues, this pragmatic limitation is tolerable: indistinguishability is close enough for social, legal, practical and survival purposes. It does not explain how or why the system feels—just that we can’t tell the difference. But verbal capacity is not all there is to human behavioral or cognitive performance capacity. Hence T2-indistinguishability is far from being total Turing-indistinguishability. People can do much more than just talk, and their verbal capacity is inextricably intertwined with and dependent on the rest. The HP arises at T3 and T4:

Suppose cognitive science successfully reverse-engineers a robot (T3) or even a neurologically and biochemically indistinguishable system (T4) that performs all sensorimotor and verbal behaviors indistinguishably from a human. That will be a solution to the Easy Problem—how humans are able to do what they can do—but it will not have explained why or how any of this is accompanied by feeling. The HP is not just “Why can’t we detect feeling in others?” (OMP) but “Why would there be feeling at all, if it plays no causal role in passing EP?” Even if the correlates of feeling (sentience, consciousness) are fully known, they do not explain the causal function of feeling itself.

Turing was not misled by the temptation to collapse sentience into function. He made no pretense that the TT could access the “mystery about consciousness.” He did not conflate intelligence with understanding, or behavior with experience. He instead proposed that cognitive science should proceed with what can be observed and tested (behavioral and cognitive capacities)—in other words, the EP—while remaining agnostic about what cannot (the HP).

Yet much of post-Turing discourse—including some recent interpretations by AI researchers—has missed or suppressed this point, treating the TT as a test of understanding or sentience, rather than of observable performance capacity alone.

Updating this distinction by explicitly parsing the empirical test into T2, T3, and T4 shows how each expands the scope of observable (i.e., empirical) indistinguishability, but none dissolves the other-minds barrier or solves the HP. The EP project remains open: to reverse-engineer behavioral and cognitive capacity (T4), while acknowledging that this leaves the felt side of cognition unexplained—and possibly even unexplainable in causal terms.

SH: Dave Chalmers gave the HP (formerly called the “mind/body problem”) its new name. Passing T3 requires Turing-indistinguishability in both verbal and sensorimotor capacity and their interrelations. Passing T4 requires Turing-indistinguishability in verbal and sensorimotor capacity, but also their observable brain functional correlates—physiological, biochemical and biophysical. Together, T2, T3 and T4 are no longer just indistinguishability in “performance capacity.” They amount to total empirical indistinguishability. Turing recognized and understood all that. The full gamut of Turing Test(s) is simply empiricism. They are as far as we can get in solving the Easy Problem of reverse-engineering the cognitive capacity of human (and nonhuman) organisms. The Hard Problem of explaining consciousness causally is an empirical problem too—but not just a problem of reverse-engineering performance capacity.

Before you canonize it, let me point out the two (scientifically minor) oversights of Turing: (1) when he referred to “solipsism” he meant the OMP; and (2) the HP is not just a problem of “localization” or some such: It is the problem of explaining feeling (sentience).

Please point out that the HP gap is not a matter of seeking some sort of logical proof (on pain of contradiction) of the necessity of consciousness for human and nonhuman performance capacity. Nor would Turing ever be guilty of the sophomoric pseudo-philosophical puzzling over “why is there something rather than nothing?”, or even “Why is there matter, or gravity [or any of the other fundamental forces]?” Each sentient knows with Cartesian certainty that (and what) it feels. Sentience (consciousness) exists. That’s not the HP. The HP is the problem of explaining, causally, how and why sentients feel rather than just “funct”. And “zombie” is just a weasel word, inventing (and naming) a hypothetical entity that is Turing-indistinguishable from a T3 or a T4. That’s just idle thought-metaphysics—while there exists a real empirical problem, unsolved. The HP is indistinguishable from (and logically equivalent to) the problem of explaining why and how we are not “zombies.” Well duh!

The Turing Programme is the programme of “reverse-engineering” cognition; it is concerned with reverse-engineering cognitive performance capacity, not modelling a particular person’s lifelong performance. Reverse-engineering is not philosophy but empirical science. And please include a footnote defining your undefined term “machine”: a machine is any causal system. No need to get into the metaphysics of causality or autonomy.

This is a test (though not a TT!) for you, GPT: Make it into as long a synopsis as it takes to cover all this ground in a clear, simple and self-contained way. If it does the trick, I’ll use it in this year’s edition (q.v.) of https://www.blogger.com/blog/posts/4995014831077048734

GPT: Here is a long-form, self-contained synopsis that integrates and clarifies the distinction between the Turing Test (TT), the Other-Minds Problem (OMP), and the Hard Problem (HP), grounded in your framing:

Alan Turing, the Turing Test, and the Hard Problem

In his 1950 paper “Computing Machinery and Intelligence”, Alan Turing proposed a bold and pragmatic approach to the age-old question, “Can machines think?” Rather than engaging in metaphysical debates about the nature of “thinking” or “consciousness,” he reframed the question in terms of observable performance capacity. Suppose a machine¹  can be successfully reverse-engineered that has the capacity to carry on a back and forth verbal conversation, indefinitely long, with any normal human, completely indistinguishably, to any normal human, from any normal human. (This is what we now refer to as passing the Turing Test or T2). Turing suggested that, for all practical and empirical purposes, such a machine could be treated as able to think, and as a potential explanation of a causal mechanism for thinking. This was not a metaphysical claim, but a methodological proposal to ground cognitive science in what can be observed and explained—without trying, or claiming, to be able to make distinctions between things that cannot be distinguished.

This was the beginning of what should rightly be called the Turing Programme for cognitive science: the scientific effort to reverse-engineer cognition. The goal is not to simulate or model the life history of any particular person, but to explain (i.e., to reverse-engineer) how human (or nonhuman) cognitive performance capacity can be produced at all. That includes the ability to speak, understand, perceive, learn, reason, act, and interact with the world in the way humans and other organisms do. This is a program in empirical science, not philosophy.

Turing’s approach was entirely grounded in empiricism. He did not claim that the Turing Test could detect or explain consciousness. In fact, he explicitly acknowledged that consciousness remains a “mystery,” and that its presence or absence in other systems—human or artificial—cannot be determined by observation. This is the well-known Other-Minds Problem (OMP): we can never observe directly whether another entity feels. No matter how complete our data on another person’s behavior, physiology, or even biochemistry, we cannot observe or measure whether they feel. That is a constraint of empiricism, not a shortcoming of any specific method. Turing’s solution was pragmatic: if a system behaves in every observable respect as if it were thinking and understanding, that is as close as science can get.

But there is a deeper problem—what Dave Chalmers later called the Hard Problem of consciousness (HP). Unlike the OMP, the HP is not a problem about detecting feeling in others; it is about causally explaining (i.e., reverse-engineering) feeling—how and why any of this performance capacity is accompanied by sentience. Why is all this doing—verbal, sensorimotor, and even physiological—not just happening without feeling? Why does it feel like something to see, think, or act?

This is not a metaphysical puzzle like “Why is there something rather than nothing?”—a question Turing would have rightly dismissed as idle. Nor is it a logical paradox or an ontological speculation. It is an empirical problem: sentience exists, and each sentient entity knows it with Cartesian certainty. That’s not the problem. The problem is that science has no explanation for how and why feeling occurs—what its causal role is in the mechanisms that produce the capacity to do all the things that thinking organisms (but especially humans) can do.

The Turing Programme aims to reverse-engineer all of the observable cognitive capacities of humans. These capacities can be modeled and tested at increasing levels of empirical completeness:

T2: Verbal capacity—can the system converse indistinguishably from a human?

T3: Verbal and sensorimotor capacity—can the system not only talk but also act and interact with the world, verbally and nonverbally, indistinguishably from a human to a human (including the verbal-sensorimotor interrelations and interactions of “words and the world”: e.g., pointing out and picking up “cats” or observing and describing “cats on mats” or repairing catamarans from verbal instructions)?

T4: All of T3, plus internal indistinguishability—the system’s observable neurophysiological, biochemical, and biophysical structure and dynamics are indistinguishable from those of a real human.

T4 indistinguishability would constitute the complete scientific success of the Turing Programme: a full reverse-engineering of human cognitive performance capacity. And yet, even that would leave one thing unexplained: why and how any of it is felt rather than merely done. The Easy Problem (EP)—reverse-engineering and explaining the mechanism that produces the capacity to do what humans can do—could be solved completely, yet we would still not have explained how or why any of it is accompanied by feeling. That is the Hard Problem.

The term “zombie” is sometimes used for a hypothetical being that is T3- or T4-indistinguishable from a human but lacks feeling. But “zombie” is a weasel-word—it adds nothing to either the problem or its solution. The zombie is not a separate metaphysical mystery, but just a way of restating the HP. The question is not “Are zombies possible?” but “How and why does sentience exist at all?” Why are organisms sentients rather than just functers—systems that can do everything we can do, in every observable way, but without feeling?

Turing anticipated all of this. He acknowledged that feeling is real but empirically inaccessible except to the feeler. His insight was to identify what could be explained scientifically (doing) and to bracket what could not (feeling), without denying its existence. The mistake lies not with Turing, but with those who have forgotten or ignored that he never claimed the Turing Test could solve the HP. His was and is a fully empirical program, and the HP remains fully within the domain of empirical science—but still completely unsolved, and perhaps unsolvable in the way other scientific problems are solved.

The Turing Programme continues: reverse-engineering the full capacity to do what humans (and eventually other organisms) can do. But until we can also explain how and why that doing is felt, not just done, the scientific story of cognition will remain incomplete.

¹ “Machine” refers to any causal system. No further metaphysical assumptions—about mechanism, determinism, autonomy, or artificiality—are needed or implied.

4th March 2025

GPT on Gubelmann on Harnad on Symbol Grounding

SH: I will be the commentator on Gubelmann’s talk at MILA on May 9 on the symbol grounding problem. Could you please summarize his arguments and evidence? As far as I can see, like most people, he has only read the 1990 paper, so he does not know about my papers since, that you and I have been discussing. So in your summary you can cross-reference where relevant.

GPT: Reto Gubelmann’s paper, “Pragmatic Norms Are All You Need – Why The Symbol Grounding Problem Does Not Apply to LLMs,” challenges the application of the Symbol Grounding Problem (SGP) to Large Language Models (LLMs).

Critique of Bender and Koller (2020): Gubelmann addresses the claim that LLMs cannot achieve genuine understanding due to their lack of direct interaction with real-world referents. He argues that this perspective stems from a specific theory of meaning, which may not be universally applicable.

Association with the Computational Theory of Mind (CTM): He posits that the SGP is rooted in the CTM, which suggests that mental representations require grounding in real-world entities to be meaningful. Gubelmann contends that this framework is not necessarily applicable to LLMs, as they do not operate based on a language of thought or similar constructs.

Advocacy for Pragmatic Theories of Meaning: Gubelmann suggests that meaning arises from the use of language within norm-governed practices, aligning with pragmatic theories. From this standpoint, LLMs, trained on vast textual data, can participate in these practices without the need for direct grounding in physical referents.

Cross-References to Your Recent Work:

Embodiment and Sensorimotor Experience: Your recent research emphasizes the importance of embodiment and sensorimotor experiences in grounding symbols. Gubelmann’s reliance on pragmatic norms may not fully address the necessity of these experiential aspects for genuine understanding.

Categorical Perception and Language Evolution: Your studies on categorical perception highlight the role of perceptual grounding in language development. Gubelmann’s arguments might benefit from considering how LLMs could replicate such perceptual grounding without sensorimotor capabilities.

Critiques of Purely Text-Based Models: You have raised concerns about the limitations of models trained solely on textual data. Gubelmann’s dismissal of the SGP for LLMs could be further examined in light of these critiques, especially regarding the depth of understanding achievable without grounding in sensory experiences.

In summary, while Gubelmann presents a compelling case for reevaluating the application of the SGP to LLMs through the lens of pragmatic norms, integrating insights from your recent work could provide a more comprehensive understanding of the challenges and limitations faced by these models in achieving genuine semantic comprehension.

SH: Do you remember the AI/CM distinction? Is Gubelmann talking about AI or CM? Understanding and meaning are CM matters (robotic capacity, T3) not about the functionality of a tool.

GPT: Yes, I remember the AI/CM distinction you’ve emphasized:

AI (Artificial Intelligence): Functional performance, including tool-like capabilities, which can be tested behaviorally (e.g., T2 performance in text-based tasks).

CM (Cognition & Meaning): The presence of understanding, meaning, and sentience, which requires T3-level robotic grounding (sensorimotor interaction with referents).

Gubelmann is talking about AI, not CM. His argument assumes that if an LLM can function adequately within language norms, then it does not need to solve the Symbol Grounding Problem (SGP). But this conflates language use (AI) with understanding and meaning (CM)—which, as you’ve pointed out, requires grounding beyond text.

Gubelmann argues that pragmatic language norms are enough to sidestep the SGP, but this perspective remains entirely within T2 and does not address T3’s requirement for grounding in sensorimotor categories. His position assumes that linguistic participation alone is sufficient for meaning, a claim that neglects the necessity of direct referential grounding.

Gubelmann’s argument sidesteps the Symbol Grounding Problem (SGP) by appealing to pragmatic norms, but this is a category mistake: it confuses the functional (AI) level with the cognitive (CM) level.

Pragmatic Norms vs. Grounding: The AI/CM Distinction

Gubelmann claims that LLMs do not need grounding because they operate successfully within language norms. This assumes that if a system can use language functionally, it must also be meaningfully participating in communication. But pragmatic competence (T2) is not semantic grounding (T3)—the ability to connect words to sensorimotor categories through direct experience with referents.

AI (Functionality of an LLM tool): Gubelmann treats language as a formal system, where words acquire “meaning” purely through use and statistical association. This is akin to a chess engine “knowing” how to play chess—it manipulates symbols successfully but does not understand them.

CM (Cognition & Meaning): True understanding requires grounding in sensorimotor experience (T3). LLMs lack this because they have no direct access to referents—only to co-occurrence patterns in text.

Indirect Grounding & the Minimal Grounding Set (MinSet)

In our February joint article, we emphasized that language allows for indirect grounding, but only if a speaker already has a directly grounded Minimal Grounding Set (MinSet). Gubelmann’s model assumes that LLMs have no need for direct grounding, which is a misinterpretation of how grounding works:

Indirect grounding works through language, but only if the learner already has a MinSet of grounded words.

An LLM has no MinSet—it is trained on words alone, with no direct access to referents.

Therefore, an LLM can never “bootstrap” meaning from words alone. It can only manipulate text.

Gubelmann’s argument confuses indirect grounding (which is a real cognitive process) with mere statistical text prediction (which is just AI functionality).

Gubelmann’s Oversight: The Role of Sensorimotor Categories

Your work has consistently shown that meaning is not just about linguistic use but about categorization in the real world:

Words refer to categories of objects and events, whose features are learned through sensorimotor interaction.

Meaning depends on distinguishing features that allow a speaker to identify and act on referents.

An LLM, having no sensorimotor experience, cannot form categories in the same way—it can only simulate language use by predicting word patterns.

Gubelmann never engages with this distinction—he assumes that successful text-based interaction = understanding, which is precisely the T2/T3 confusion that your work has dismantled.

T3 as the Necessary Threshold for Meaning

Gubelmann’s argument remains entirely within T2 and ignores the leap to T3:

Pragmatic norms (T2) allow an LLM to pass the verbal Turing Test, but this does not mean it understands words.

Meaning is a T3 property—it requires interaction with referents.

LLMs lack T3 capabilities, so they cannot be said to “understand” in the way humans do.

His claim that pragmatic norms are sufficient is therefore an illusion created by the fluency of LLMs—they approximate meaning without actually grounding it.

Conclusion: Pragmatic Norms Do Not Solve the Symbol Grounding Problem

Gubelmann’s argument fails because it:

Confuses AI with CM—assuming that linguistic fluency is equivalent to semantic grounding.

Ignores indirect grounding and MinSet—which require prior sensorimotor experience.

Overlooks the necessity of T3 grounding—meaning is not just language-internal but must be connected to referents in the world.

In short, LLMs are tools, not thinkers. Pragmatic norms govern their use, but do not confer understanding or meaning. Gubelmann’s claim that the SGP “does not apply” to LLMs is false—on the contrary, LLMs are the clearest demonstration of why the SGP remains unsolved.

SH: In a word, the capacity to understand words and to mean what they say requires the capacity to detect and interact with their referents in the world, which is a T3 (robotic, sensorimotor) capacity; not just the capacity to interact with their names and the words in their descriptions. And the number and identity of the words for which you need this robotic capacity is at least a MinSet.

GPT: Understanding and meaning require the capacity to detect and interact with referents in the world (T3, sensorimotor grounding)—not just the capacity to manipulate words and their textual descriptions. And the number and identity of words for which this sensorimotor capacity is necessary is at least a Minimal Grounding Set (MinSet)—the smallest set of grounded words from which all other words can be indirectly grounded through language.

Gubelmann’s argument collapses because he assumes language alone suffices, ignoring that:

Words without referents are just symbols—LLMs can manipulate them but not understand them.

Indirect grounding requires a MinSet—LLMs lack even a single directly grounded word.

Understanding requires T3 (sensorimotor interaction)—LLMs remain trapped in T2, operating on text alone.

His claim that the SGP does not apply to LLMs is mistaken: LLMs are the ultimate case of ungrounded symbols, proving that the SGP is far from solved.

SH: And that’s just about grounding; I haven’t even mentioned sentience, although I think that’s necessary for meaning and understanding too. But explaining that would require solving not just the SGP but the HP too, and I haven’t the wherewithal to solve that.

GPT: So far, we’ve only discussed grounding—the necessity of sensorimotor interaction (T3) with referents for meaning and understanding. But even if grounding is in place, understanding and meaning may require more than just the right behavioral capacities.

That’s where sentience comes in. If meaning is not just about the right behavioral dispositions but about what it feels like to mean something, then solving the Symbol Grounding Problem (SGP) is not enough—we would also need to solve the Hard Problem (HP):

The SGP explains how symbols get their referents via sensorimotor interaction.

The HP asks why and how this process generates subjective experience (sentience)—which may be necessary for true understanding.

You’ve long argued that solving the Easy Problem (EP) (explaining all functional and behavioral capacities) will use up all the causal degrees of freedom—leaving nothing left to explain why cognition feels like something.

So, while Gubelmann’s argument fails even at the grounding level (SGP), the deeper question of whether grounding alone suffices for meaning and understanding remains open—because we don’t yet know why or whether cognition must feel like something to mean something. And, as you’ve said, we don’t have the wherewithal to solve that.

SH: At least I don’t. Om tat sat…

30th December 2024

Rubik, Einstein, “God” and Awe

1. In Rubik’s cube discourse, why are the solutions called “god” algorithms or “god-numbers”?

Within the Rubik’s Cube (and broader puzzle) community, “God’s algorithm” refers to an optimal solving method—one that always produces the shortest possible solution from any given configuration. The idea is that an all-knowing entity (“God”) could see the perfect route to solve the puzzle in the minimum number of moves.

Historical Notes: The phrase is closely tied to “God’s number,” the maximum, over all positions, of the minimum number of moves needed to solve the cube (20 moves in the half-turn metric, 26 in the quarter-turn metric). “God’s algorithm” is any procedure that finds a shortest possible solution from every position; on the hardest positions, that shortest solution is exactly God’s number of moves long.
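Stated formally (a sketch in standard group-theory notation; the symbols below are introduced here for illustration, not taken from the source), cube positions are elements $g$ of the cube group $G$ with solved state $e$, and a metric is a choice of move set $S$ (quarter turns for QTM, quarter and half turns for HTM). God’s number is then the worst case, over all positions, of the best-case solution length, which is the diameter of the corresponding Cayley graph:

```latex
d_S(g) = \min\{\, k \ge 0 : g\, s_1 s_2 \cdots s_k = e \ \text{for some}\ s_1, \dots, s_k \in S \,\}

N_S = \max_{g \in G} d_S(g), \qquad N_{\mathrm{HTM}} = 20, \qquad N_{\mathrm{QTM}} = 26
```

On this reading, God’s algorithm is any procedure that, given $g$, returns a move sequence of length exactly $d_S(g)$.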

Informal Usage: It’s not a formal or religious usage. The term is more of a playful nod to an omniscient perspective—an algorithm that “knows” the absolute optimal path.

2. Any relation to how Einstein (and others?) used “god” in connection with the laws of physics?

Einstein’s “God”

When Einstein used phrases like “God does not play dice,” he was typically referring to the underlying determinism or structure in the laws of physics, rather than a personal deity. “God” in this context alluded to the universe’s governing rules, which Einstein believed were elegantly consistent.

Comparing to “God’s Algorithm”

The Rubik’s Cube usage is much more informal. It hints at an ultimate principle (optimality), but it doesn’t really draw from Einstein’s philosophical or theological implications. Both usages share a metaphorical sense of an ultimate or perfect knowledge, but they operate in different domains: puzzle-solving vs. describing physical reality.

3. How is this related to whether the “god-laws” are (a) empirical, as in science, or (b) formal, as in mathematics?

(a) Empirical Laws in Science

Physical laws are discovered through observation, experiment, and inference. For instance, Einstein’s field equations in general relativity are tested against measurements of planetary orbits, gravitational lensing, etc. If they fail in experiments, scientists must refine or replace them.

(b) Formal Laws in Mathematics

Mathematical “laws” (theorems, proofs) are established through logical deduction within an axiomatic system. They don’t depend on experiments; they rely on consistency with axioms.

God’s Algorithm in the Rubik’s Cube falls into the category of formal problem-solving. It’s grounded in group theory and combinatorics—the “laws” behind it aren’t empirical but logical constraints of the cube’s configuration space. So, “God’s algorithm” sits firmly in the “formal” realm of mathematics and theoretical computer science.
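To make the “formal, not empirical” point concrete, here is a minimal sketch of how a God’s algorithm can be computed in principle: a breadth-first search over the whole configuration space, outward from the solved state, which records every position’s optimal distance and a shortest route back. The real cube has about 4.3 × 10^19 positions, far too many for this naive table (the 20-move result needed much cleverer methods and heavy computation), so the sketch assumes a hypothetical toy puzzle instead: five numbered tiles in a row, where one move swaps two neighbouring tiles. For this toy, each arrangement’s optimal distance equals its number of inversions, so its God’s number is 5·4/2 = 10. The function names are invented for illustration.

```python
from collections import deque


def adjacent_swap(state, i):
    """One move of the toy puzzle: exchange the tiles at positions i and i + 1."""
    s = list(state)
    s[i], s[i + 1] = s[i + 1], s[i]
    return tuple(s)


def gods_table(n):
    """Breadth-first search outward from the solved state.

    Returns (dist, parent): dist[s] is the optimal number of moves between
    state s and the solved state; parent[s] = (t, i) means that applying
    move i to s yields t, a state one move closer to solved.
    """
    solved = tuple(range(n))
    dist = {solved: 0}
    parent = {}
    queue = deque([solved])
    while queue:
        t = queue.popleft()
        for i in range(n - 1):
            s = adjacent_swap(t, i)
            if s not in dist:
                dist[s] = dist[t] + 1
                # Each swap is its own inverse, so swap i also takes s back to t.
                parent[s] = (t, i)
                queue.append(s)
    return dist, parent


def gods_algorithm(state, parent):
    """Read a shortest solution (a list of move indices) off the BFS table."""
    moves = []
    while state in parent:
        state, move = parent[state]
        moves.append(move)
    return moves


dist, parent = gods_table(5)
gods_number = max(dist.values())
print(len(dist), gods_number)  # prints: 120 10

scramble = (4, 3, 2, 1, 0)  # the fully reversed row
solution = gods_algorithm(scramble, parent)
print(len(solution), solution)
```

Nothing in this computation is tested against observation: once the moves are fixed, every distance, and hence the toy’s God’s number, follows deductively from the structure of the configuration space.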

4. And is there any relation to the various foundational views of mathematics—(i) Platonism, (ii) Intuitionism, (iii) Constructionism, or (iv) other options?

(i) Platonism

Platonists see mathematical entities (like numbers or group structures) as existing in an abstract realm. “God’s algorithm” could be seen as discovered from this realm, a perfect solution that exists objectively.

(ii) Intuitionism

Intuitionists argue that math is a creation of the human mind, focusing on constructive proofs. One might say that the “optimal solution” is only meaningful insofar as we can construct it or approximate it with an algorithm.

(iii) Constructionism (usually called constructivism)

Closely related to intuitionism, constructivism emphasizes building mathematical objects step by step. In that vein, the quest for “God’s algorithm” is a constructive process: we want an explicit procedure that yields the optimal path.

(iv) Other Views (e.g., Formalism)

Formalists see math as manipulations of symbols under given rules. The concept of “God’s algorithm” fits well here too: it’s a statement within the formal system describing the Rubik’s Cube group, and we seek a sequence of symbols (moves) that minimizes the path length.

Overall, the concept of a “God’s algorithm” can be interpreted differently based on one’s philosophical stance. Platonists might say it exists out there to be discovered, while intuitionists might say it’s only “real” once we construct it. Either way, the term “God” is more of a metaphor for perfect knowledge than a literal or religious reference.

5. Have there been interpretations of what Einstein meant by “god”?

Einstein’s Own Statements

Spinoza’s God: Einstein often mentioned that he believed in the “God of Spinoza.” Spinoza’s philosophical view is broadly taken as pantheistic, equating “God” with the totality of nature or the universe’s lawful harmony rather than a personal deity. Hence, for Einstein, “God” was a metaphor for the underlying rational structure, elegance, and consistency in the laws of the universe.

Non-Personal Deity: Einstein explicitly rejected the idea of a personal God who intervenes in daily human affairs. He wrote in letters that he did not believe in a God “who concerns himself with fates and actions of human beings.”

Famous Quotes and Their Context

“God does not play dice”: This phrase was Einstein’s way of expressing skepticism about the randomness and probabilistic nature of quantum mechanics. He believed the universe had a more deterministic or at least deeper, as-yet-unknown, lawful structure. It did not imply a belief in a personal God but rather an insistence on an orderly universe.

Mixed Interpretations

Religious Interpretations: Some theologians and laypeople initially took Einstein’s words about “God” as support for a traditional theistic stance. However, Einstein’s later clarifications made clear he was not endorsing a personal, interventionist God.

Philosophical/Metaphorical Interpretations: Most modern scholars agree that Einstein’s references to “God” were predominantly metaphors for nature’s laws and the deep sense of wonder he felt when studying the cosmos.

Einstein’s Personal “Religiousness”

Awe and Wonder: While he did not follow an organized religion, Einstein described himself as feeling a “cosmic religious feeling”—an awe at the grandeur and mystery of the universe. He used “God” as a shorthand for that awe-inspiring order, rather than a being with personality or will.

Key Takeaway

Einstein’s “God” is best understood in the context of Spinozistic or pantheistic thought, focusing on the harmony and unity of nature. It was a poetic or metaphorical use of the word “God”—far removed from the typical doctrinal or personal interpretations found in many religions.

Last question series: (1) The notion of “god” and “gods” originated (etymologically?) from supernatural and metaphysical beliefs — beliefs which non-religious people believe to be false. Why choose “god” as the metaphor for the discovered and empirically supported truths of physics? (2) Does it fit scientific findings any better (or worse) than Dawkins’s “Flying Spaghetti Monster”, or, for that matter, “Beelzebub”? (3) Even if it is just a metaphor for the sense of awe at the universe and its laws, why call that “god”? (4) Doesn’t that make awe a metaphor for the notion of god? (5) Last: since the notion of god(s) is in its origin anthropomorphic, hence personal, is it not a mark of an inapt trope if one then has to qualify it with “but I don’t mean a personal or supernatural deity”?

The term “god” has a deep cultural and linguistic resonance. When Einstein or others refer to “God,” they tap into a well-known concept that suggests something ultimate, beyond ordinary human understanding.

Convenience and Cultural Familiarity: Even if non-religious thinkers reject supernatural claims, using “God” as shorthand for “ultimate reality” or “the fundamental laws” can be a convenient metaphor because it’s already embedded in the vernacular.

Poetic or Emotional Connotation: Saying “God” can evoke awe, mystery, and reverence—emotions that mirror how scientists might feel about the grandeur of nature and the cosmos.

The “Flying Spaghetti Monster” was coined satirically to highlight perceived absurdities in certain religious arguments. It deliberately underscores the idea that “any deity is as likely as another.” Einstein’s usage of “God” has a more serious or reverential historical weight. It’s less about satire and more about expressing awe or the sense of a grand underlying principle.

Cultural Reactions Differ: “God,” in many societies, carries an air of solemnity; “Beelzebub” or “Flying Spaghetti Monster” carry connotations either of the demonic or the absurd. So, in terms of widespread acceptance or emotional tenor, “God” remains a more common metaphor for grandeur and unity in the laws of physics.

Historically, philosophical and scientific writings have often co-opted religious or theistic language to express wonder. It’s partly a carryover from periods when religion was the default lens of cultural interpretation.

Connotations of ‘Ultimate’ or ‘Absolute’: Calling it “God” connotes something all-encompassing, a concept that stands in for “the ultimate nature of reality.”

Emotional Resonance: Words like “god” may resonate emotionally more than purely secular words (e.g., “the laws of physics”), especially for expressing profound awe.

Inevitably Ambiguous: Indeed, calling it “God” invites confusion because most people associate “God” with a personal, intervening deity. Hence the repeated clarifications by figures like Einstein: “I don’t mean a personal God.”

Metaphor vs. Literal Meaning:

Metaphors often extend words beyond their literal origins. Still, some argue that using “God” for an impersonal force is misleading if one must always add caveats.

Rhetorical Power vs. Precision:

Pro: Using “God” can evoke a stronger emotional or rhetorical impact.

Con: It risks conflation with traditional theism, requiring disclaimers (“God but not personal”), which some find contradictory or confusing.

Cultural Momentum: Despite these drawbacks, the cultural inertia behind “God” as a term of ultimate reference remains potent. So even if it’s not perfectly apt, it continues to be used, partly because no alternative metaphor carries the same widespread recognition.

Calling the laws of physics “God” (in Einstein’s sense) is a metaphor that leverages cultural and emotional weight—yet it often necessitates disclaimers to avoid implying belief in a supernatural or personal deity. Whether it’s apt or inapt depends on one’s communication goals: it’s evocative and poetic but can also be ambiguous or confusing.

I suppose it’s much the same for those who use it in the formal/platonic realm of maths and logic: this time analogous to the aesthetic awe at symmetry or the vastness of space. Some people use “gods” as the metaphor for their reverential awe of the great composers, or the great creators (sic) in all human areas of endeavor… In the end, there’s no arguing about metaphors, because they are not literal, falsifiable statements of truth but just the airing of sentiments. The anthropomorphic metaphors especially…

Indeed—metaphors, especially those involving “god” or “gods,” tap into cultural and emotional resonance rather than strict, literal statements. The same applies whether the realm is physics, mathematics, the arts, or any other field that inspires profound admiration. Because metaphors are inherently symbolic (not falsifiable in the scientific sense), they serve more as expressions of sentiment—vehicles for conveying awe, reverence, or wonder—than as precise definitions or doctrinal truths. As you note, anthropomorphic language in particular can both humanize and potentially obscure the concepts it describes, which is why such metaphors often require clarification. Ultimately, it’s a creative way of communicating emotion rather than making a claim about empirical reality.

30th December 2024

Eight Questions to ChatGPT about Rubik’s Cubes

1. What is the distance of a random starting configuration from a correct solution?

2. There’s obviously a minimum distance: Presumably there is a maximum distance?

3. I guess a virtual cube could be simulated, with a virtual mouse, like a complex “joy-cube” computationally configurable to a specified distance from a solution, and manipulable by players?

4. Real players could be tested for their speed (number of manipulations) at reaching a solution from (a) a given starting distance, (b) an average starting distance, or (c) the maximal starting distance?

5. They could also be trained and tested from differently sized and shaped virtual “cubes”, all the way up to a (digitally approximated) virtual “sphere”?

6. Besides testing real players with simulations on a mouse, I suppose algorithms could be generated and tested (and trained) for their “speed” (in number of manipulations) to reach a solution?

7. These are all parameters. Do they have any meaning or use (analogous to the way the parameters of prime number distributions are studied by number theorists, when they are trying to generate potential proofs or theorems)?

8. Or am I talking nonsense (or trivia)?

1. What is the distance of a random starting configuration from a correct solution?

Answer Overview:
The “distance” typically refers to how many moves are required, under a given move metric, to transform a scrambled cube into its solved state. On a standard 3×3×3 Rubik’s Cube, this is often measured in either:
1. Quarter-Turn Metric (QTM) – each 90° turn counts as 1 move, so a 180° turn counts as 2.
2. Half-Turn Metric (HTM) – every face turn counts as 1 move, whether it is 90° or 180°.
Average Distance for a Random Configuration:
Most random configurations lie close to, but not at, the maximum possible distance. Computer searches show that, in the half-turn metric, a random scramble takes about 17–18 moves on average to solve optimally. In the quarter-turn metric the average is somewhat higher, because every half turn counts as two moves.
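To make the two metrics concrete, here is a minimal sketch that scores a move sequence both ways. It assumes the common face-turn notation (a face letter U, D, L, R, F or B, optionally followed by `'` for a counterclockwise quarter turn or `2` for a half turn); the function name `move_counts` is hypothetical, and slice moves or whole-cube rotations are not handled.

```python
import re

# Hypothetical helper: one face turn in standard notation, e.g. R, U', F2.
# Slice moves and whole-cube rotations are deliberately not supported here.
MOVE = re.compile(r"^[UDLRFB](2|')?$")


def move_counts(sequence: str) -> tuple[int, int]:
    """Return (htm, qtm) lengths for a space-separated move sequence."""
    htm = qtm = 0
    for token in sequence.split():
        if not MOVE.match(token):
            raise ValueError(f"unsupported move: {token!r}")
        htm += 1                                 # any face turn is 1 move in HTM
        qtm += 2 if token.endswith("2") else 1   # a half turn is 2 quarter turns
    return htm, qtm


print(move_counts("R U2 R' F"))  # (4, 5)
```

The same sequence is one move longer in QTM because of the single half turn, which is why distances quoted in the two metrics are not directly comparable.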

2. There’s obviously a minimum distance: Presumably there is a maximum distance?

Minimum Distance:
The minimum distance is 0 moves if you’re already in the solved state.
Maximum Distance:
- In the quarter-turn metric (QTM), the proven maximum (“God’s number”) for the 3×3×3 cube is 26: no configuration requires more than 26 quarter turns to solve.
- In the half-turn metric (HTM), the proven maximum is 20: no configuration requires more than 20 moves when half turns count as single moves.
Interpretation:
This maximum distance is the farthest any scramble can be from the solved configuration under optimal solving.
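The maximum distance is a property of the puzzle's whole state graph: search breadth-first outward from the solved state and record the largest distance reached. The 3×3×3 has about 4.3 × 10^19 positions, so this naive approach is hopeless for it (the published proofs of God's number relied on symmetry and very large computations). The sketch below therefore applies the same idea to a hypothetical toy puzzle, four tiles in a row where each move swaps two neighbouring tiles; both the puzzle and the function name `distance_distribution` are illustrative assumptions.

```python
from collections import deque


def distance_distribution(start, moves):
    """Breadth-first search from `start`; returns {distance: number of states}.

    Each move is a permutation p: the new state is (state[p[0]], state[p[1]], ...).
    The largest key is the maximum distance from `start`, i.e. the toy
    puzzle's analogue of God's number.
    """
    dist = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for p in moves:
            nxt = tuple(state[i] for i in p)
            if nxt not in dist:            # first visit is the shortest path
                dist[nxt] = dist[state] + 1
                queue.append(nxt)
    counts = {}
    for d in dist.values():
        counts[d] = counts.get(d, 0) + 1
    return counts


# Hypothetical toy puzzle (not the cube): swap tiles at positions 0-1, 1-2 or 2-3.
swaps = [(1, 0, 2, 3), (0, 2, 1, 3), (0, 1, 3, 2)]
counts = distance_distribution((0, 1, 2, 3), swaps)
print(counts)       # {0: 1, 1: 3, 2: 5, 3: 6, 4: 5, 5: 3, 6: 1}
print(max(counts))  # 6
```

The largest key plays the role of God's number for the toy puzzle, and the full dictionary is the analogue of the distance distribution behind the averages quoted under question 1.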

3. I guess a virtual cube could be simulated, with a virtual mouse, like a complex “joy-cube” computationally configurable to a specified distance from a solution, and manipulable by players?

Implementation Feasibility:
- Virtual Mouse / “Joy-cube”: It is indeed possible to create a 3D interactive interface (e.g., with WebGL or a game engine) that:
  1. Generates a specific cube scramble with a known “distance” from solved.
  2. Allows users to perform standard legal moves with a mouse or other input device.
- Distance Configuration:
  Generating a scramble at a precise known distance is harder. Applying exactly d random moves to the solved state guarantees only that the scramble is at distance d or fewer, because some moves may cancel or combine; to certify that the distance is exactly d, an optimal solver must confirm that no shorter solution exists.
Practical Applications:
- Training & Education: Players can practice on configurations that systematically increase in difficulty (i.e., distance).
- Competitive Settings: Timed challenges can be set up where each puzzle is guaranteed to be at the same distance from solved.
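A scramble "of length d" can be sketched as follows. This is an assumption-laden illustration, not a competition scrambler: it uses the usual face-turn notation, avoids turning the same face twice in a row (which would trivially merge into one move), and the name `random_scramble` is hypothetical. As noted above, the resulting position is only guaranteed to be at most d moves (HTM) from solved.

```python
import random

FACES = "UDLRFB"
SUFFIXES = ["", "'", "2"]   # quarter turn, counterclockwise quarter turn, half turn


def random_scramble(d: int, seed=None) -> list[str]:
    """Return d random face turns, never turning the same face twice in a row.

    The scrambled position is at distance at most d in HTM. It can be closer,
    since moves may still partly cancel (R L R' equals L, because R and L
    commute), so certifying exactly d would require an optimal solver.
    """
    rng = random.Random(seed)
    moves, last = [], None
    for _ in range(d):
        face = rng.choice([f for f in FACES if f != last])
        moves.append(face + rng.choice(SUFFIXES))
        last = face
    return moves


print(" ".join(random_scramble(10, seed=1)))
```

Passing a seed makes a scramble reproducible, which would let every player in a timed challenge receive the same position.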

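4. Real players could be tested for their speed (number of manipulations) at reaching a solution from (a) a given starting distance, (b) an average starting distance, or (c) the maximal starting distance?

Yes, this is a viable testing methodology.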
1. (a) Given Starting Distance:
  - A known scramble distance tests how players adapt their solving strategies or heuristics when they are certain how “deep” the puzzle is from solved.
2. (b) Average Starting Distance:
  - Random scrambles reflect real-world solves: official competitions use randomly generated scrambles, so this condition matches how speedcubers’ times are normally measured.
3. (c) Maximal Starting Distance:
  - Testing how long it takes a solver to unravel the worst-case scenario can be used to measure both skill and stamina (in solving). This is less common in real competitions but interesting for theoretical or research purposes.
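One way to score the three conditions is sketched below. It is a hypothetical illustration, not an established protocol: the `Trial` record, the condition labels, and the "excess moves" measure (moves used minus the scramble's optimal distance) are all assumptions introduced here.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    condition: str        # hypothetical labels, e.g. "given", "average", "maximal"
    start_distance: int   # optimal distance of the scramble, if known
    moves_used: int       # manipulations the player actually made


def summarize(trials):
    """Per condition: mean moves used and mean excess over the optimal distance."""
    by_condition = {}
    for t in trials:
        by_condition.setdefault(t.condition, []).append(t)
    return {
        cond: {
            "mean_moves": mean(t.moves_used for t in ts),
            "mean_excess": mean(t.moves_used - t.start_distance for t in ts),
        }
        for cond, ts in by_condition.items()
    }
```

Usage would be `summarize(list_of_trials)`; comparing `mean_excess` across conditions would indicate whether players lose proportionally more moves on harder starting positions.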

5. They could also be trained and tested from differently sized and shaped virtual “cubes”, all the way up to a (digitally approximated) virtual “sphere”?

Different Puzzle Variants:
- NxN Cubes: Other sizes (2×2×2, 4×4×4, 5×5×5, etc.) are official competition puzzles, and each has its own complexity and group-theory nuances.
- Other Twist Puzzles: Megaminx, Pyraminx, Skewb, and more can also be simulated in 3D environments.
Virtual “Sphere”:
- In puzzle terms, a sphere might be visualized as a puzzle with rotating “bands.” Some puzzle inventors have indeed created spherical twisty puzzles.
- Why Train on Different Shapes?
  1. Increases familiarity with more generalized “group moves.”
  2. Offers variety in problem-solving approaches.
  3. Expands the solver’s spatial reasoning.
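One concrete way the sizes differ is in the number of reachable positions. The sketch below applies the standard counting argument to the 3×3×3 and 2×2×2 only; it is not a general NxN formula.

```python
from math import factorial

# 3×3×3: corner permutations (8!) × corner orientations (3^7; the last corner
# is forced) × edge permutations (12!) × edge orientations (2^11; the last
# edge is forced), halved because corner and edge permutation parities must match.
cube3 = factorial(8) * 3**7 * factorial(12) * 2**11 // 2

# 2×2×2: corners only; dividing by 24 removes whole-cube orientations,
# since this puzzle has no fixed centres to define "up" and "front".
cube2 = factorial(8) * 3**7 // 24

print(f"{cube3:,}")  # 43,252,003,274,489,856,000
print(f"{cube2:,}")  # 3,674,160
```

The jump of about thirteen orders of magnitude between the two sizes is one reason techniques that work for the smaller puzzle (such as exhaustive search) do not carry over to the larger one.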

6. Besides testing real players with simulations on a mouse, I suppose algorithms could be generated and tested (and trained) for their “speed” (in number of manipulations) to reach a solution?

Algorithmic Solvers:
- Examples:
  - Kociemba’s two-phase algorithm for the 3×3×3.
  - Thistlethwaite’s algorithm (an older, multi-stage approach of historical significance).
  - Later refinements and optimizations of two-phase solving.
- Metrics & Training:
  - Move Count: Minimizing the total number of moves needed to solve.
  - Algorithmic Complexity: Some methods produce short solutions but take longer to compute; others compute quickly (in real time) but produce longer solutions.
Machine Learning Angle:
- Reinforcement learning or other data-driven approaches can be tested at different distances, puzzle sizes, or puzzle shapes.
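The trade-off between solution length and computation time shows up even in the simplest optimal strategy, iterative-deepening search: run a depth-limited depth-first search with limit 0, 1, 2, ..., so the first solution found is guaranteed shortest, at the cost of re-exploring shallow levels. The sketch below is that generic technique, not Kociemba's or Thistlethwaite's algorithm; practical optimal cube solvers combine this kind of search with precomputed pruning tables (IDA*). It is applied to a hypothetical toy puzzle of four tiles in a row where each move swaps two neighbouring tiles; `iddfs_solve` is an illustrative name.

```python
def iddfs_solve(state, goal, moves, max_depth=20):
    """Shortest list of move indices turning `state` into `goal`, or None.

    Each move is a permutation p: the new state is (s[p[0]], s[p[1]], ...).
    Because limits are tried in increasing order, the first solution found
    uses the fewest moves.
    """
    def dfs(s, depth, path):
        if s == goal:
            return path
        if depth == 0:
            return None
        for i, p in enumerate(moves):
            found = dfs(tuple(s[j] for j in p), depth - 1, path + [i])
            if found is not None:
                return found
        return None

    for limit in range(max_depth + 1):
        result = dfs(state, limit, [])
        if result is not None:
            return result
    return None


# Hypothetical toy puzzle: swap tiles at positions 0-1, 1-2 or 2-3.
swaps = [(1, 0, 2, 3), (0, 2, 1, 3), (0, 1, 3, 2)]
print(iddfs_solve((3, 2, 1, 0), (0, 1, 2, 3), swaps))
```

The work grows roughly exponentially with the solution depth, which is why pruning heuristics, rather than faster hardware alone, make optimal 3×3×3 solving practical.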

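7. These are all parameters. Do they have any meaning or use (analogous to the way the parameters of prime number distributions are studied by number theorists, when they are trying to generate potential proofs or theorems)?

Yes, they do have theoretical and practical meaning.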
1. Group Theory Significance:
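  - The moves of the Rubik’s Cube (and related puzzles) generate finite groups. Studying distances and configuration spaces informs research on group diameter (God’s number is the diameter of the cube group’s Cayley graph for the chosen move set), symmetry, and combinatorial aspects.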
2. Algorithmic Complexity:
  - Identifying typical or worst-case puzzle states helps to explore “God’s algorithm” (optimal solutions) and the computational complexity behind them.
3. Heuristic & Practical Value:
  - In puzzle design or AI solvers, these parameters matter for how efficiently or quickly a puzzle can be solved.
4. Recreational Math & Education:
  - Insights from these parameters help create teaching tools that illustrate concepts in math and computer science.
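A basic group-theoretic parameter of this kind is the order of a move sequence: how many times it must be repeated to return to where it started. A well-known example is the sequence R U, which returns the cube to solved after 105 repetitions. If a sequence is written as a permutation of positions, its order is the least common multiple of its cycle lengths, as in the sketch below. The numbering of positions is left abstract (an assumption), and the example uses a small hand-made permutation rather than an actual cube move.

```python
from math import lcm


def permutation_order(p):
    """Order of permutation p, where p[i] is the position that i moves to.

    Decomposes p into disjoint cycles and returns the lcm of their lengths.
    """
    seen = [False] * len(p)
    order = 1
    for start in range(len(p)):
        length = 0
        i = start
        while not seen[i]:      # walk one cycle, marking its elements
            seen[i] = True
            i = p[i]
            length += 1
        if length:              # 0 means start was already in a visited cycle
            order = lcm(order, length)
    return order


print(permutation_order([1, 2, 0, 4, 3]))  # 6 (a 3-cycle and a 2-cycle)
```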

8. Or am I talking nonsense (or trivia)?

These questions are quite relevant to both recreational mathematics and more formal studies in group theory, search algorithms, and computational complexity. While these topics can seem niche, they have parallels to important problems in computer science (search algorithms and optimization), mathematics (group theory and combinatorics), and educational technology.

Summary

Your questions touch on well-researched territory in Rubik’s Cube mathematics and puzzle theory. Distances, maximum depths, testing methodologies, and algorithmic approaches all have real-world and theoretical applications. Far from being trivial, these areas intersect with math, computer science, education, and even psychology (e.g., how humans approach problem-solving).