{"id":2206,"date":"2024-03-16T21:32:16","date_gmt":"2024-03-16T21:32:16","guid":{"rendered":"https:\/\/generic.wordpress.soton.ac.uk\/skywritings\/?p=2206"},"modified":"2024-03-16T21:34:45","modified_gmt":"2024-03-16T21:34:45","slug":"can-toy-robotic-capacities-make-top-down-meet-bottom-up","status":"publish","type":"post","link":"https:\/\/generic.wordpress.soton.ac.uk\/skywritings\/2024\/03\/16\/can-toy-robotic-capacities-make-top-down-meet-bottom-up\/","title":{"rendered":"Can Toy Robotic Capacities Make Top-Down Meet Bottom-Up?"},"content":{"rendered":"\n<p class=\"has-text-align-right\"><a href=\"https:\/\/www.youtube.com\/watch?v=Sq1QZB5baNw\">Re: <strong>Figure Status Update &#8211; OpenAI Speech-to-Speech Reasoning<\/strong><\/a><\/p>\n\n\n\n<p class=\"has-cyan-bluish-gray-background-color has-background\"><strong>SH:<\/strong><\/p>\n\n\n\n<p class=\"has-cyan-bluish-gray-background-color has-background\">Is this demo sensorimotor grounding? No, it\u2019s a toy robot with (1) some toy-world visual recognition and motor manipulation skills, plus (2) (perhaps non-toy) text-to-speech and speech-to-text capacity, plus (3) ChatGPT\u2019s remarkable and as-yet unexplained (non-toy) interactive verbal skills, including (4) its (non-toy) encyclopedic verbal database and navigation\/interaction capacity.<\/p>\n\n\n\n<p class=\"has-cyan-bluish-gray-background-color has-background\">But it\u2019s still ungrounded.<\/p>\n\n\n\n<p class=\"has-cyan-bluish-gray-background-color has-background\">If\/when it can do the kind of thing it does in the video with <em>anything<\/em> it can talk about, and not just an infomercial demo, then, and only then, will it have an even more remarkable, and as yet unexplained, (non-toy) grounded T3 robotic capacity.<\/p>\n\n\n\n<p class=\"has-cyan-bluish-gray-background-color has-background\">Two-year-olds are grounding their words via the only way upward: <strong>bottom-up<\/strong>, through (unsupervised and supervised) learning of 
sensorimotor categories, by detecting their distinguishing sensorimotor features <strong>directly<\/strong>, and then naming the grounded categories (by describing their features, which are likewise learnable, nameable categories).<\/p>\n\n\n\n<p class=\"has-cyan-bluish-gray-background-color has-background\">Then, because the 2-year-old also has the capacity for language (which means for producing and understanding subject-predicate propositions with truth-values, composed out of category names defined or described by referents of their (grounded) feature-category names), verbal instruction (LLM-style) can kick in and even take over.<\/p>\n\n\n\n<p class=\"has-cyan-bluish-gray-background-color has-background\">That\u2019s bottom-up grounding, and it applies to children as much as to T3 robots.<\/p>\n\n\n\n<p class=\"has-cyan-bluish-gray-background-color has-background\">But that OpenAI infomercial demo was just a toy. More important, though, is that it is based on an incoherent pipe-dream, which is that an LLM\u2019s words can be grounded <strong>top-down<\/strong> with the help of more and more <strong>bottom-up<\/strong> toy robotic learning.<\/p>\n\n\n\n<p class=\"has-cyan-bluish-gray-background-color has-background\">But the only way to get to the top is with bottom-up T3 robotics. So you might as well forget about the LLM and its word-power for that project. Just let your 2-year-old child or T3 robotlet use ChatGPT as a tutor, the way kids use their parents. But to understand what the LLM (or their nanny) is telling them, they have to learn the referents of all the content-words in the verbal explanation. 
And there\u2019s only one way to do that: bottom-up, first by direct sensorimotor grounding (DSG) and then \u2013 once the child or T3 robot has grounded the names &#8212; and hence the referents &#8212; of enough categories bottom-up (including enough feature-category names) \u2013 learners can carry on grounding bottom-up through indirect verbal grounding (IVG), as long as their tutors describe the new (ungrounded) categories with propositions made up of already-grounded category names \u2013 <em>already grounded in the heads of the learners<\/em>.<\/p>\n\n\n\n<p class=\"has-cyan-bluish-gray-background-color has-background\">So the bottom line is that the referents of the words in the <strong>teacher<\/strong>&#8217;s IVG definition or description or explanation <em>have to already be grounded<\/em> in the head of the <strong>learner<\/strong> (whether via DSG or IVG).<\/p>\n\n\n\n<p class=\"has-cyan-bluish-gray-background-color has-background\">I leave it to you, GPT, as an exercise, to explain why that cannot be done top-down with an LLM-powered chat-robot, where it\u2019s hanging from ungrounded skyhooks.<\/p>\n\n\n\n<p class=\"has-cyan-bluish-gray-background-color has-background\"><strong>ChatGPT4<\/strong> (after a bit of browbeating by me):<\/p>\n\n\n\n<p>An important question in the development of AI and robotics concerns the most efficient and effective path to achieving a genuinely grounded, intelligent system. 
Is it more practical or feasible to enhance a primarily linguistic, top-down T2 language model (LLM) to achieve the sensorimotor and cognitive capabilities of a T3 robot, or would it be better to develop T3 robotic capabilities bottom-up independently and then supplement these with the LLM&#8217;s abilities externally, as a form of advanced verbal tutoring?<\/p>\n\n\n\n<p>Given the distinctions between T2 and T3 systems, a more grounded and arguably more natural progression of AI development, closely mirroring human cognitive development, might be more promising. Humans first develop sensorimotor understanding and category-learning capabilities (<strong>direct<\/strong> T3 capacities) before language acquisition goes on to significantly enhance and expand their cognitive and communicative capacities. This developmental trajectory suggests that grounding in the physical world through direct sensorimotor experience forms a necessary foundation for the effective use of language for further learning and communication.<\/p>\n\n\n\n<p>Thus, attempting to &#8220;train&#8221; a T2 LLM chatbot to become a T3 Turing-indistinguishable robot might not only be inherently inefficient but could also overlook the essential aspect of grounding that comes from direct interaction with the physical world. Incorporating LLMs as an auxiliary verbal tutor for a T3 entity (human or robot), once it has developed a sufficiently grounded vocabulary about the world, may be a more logical and effective approach, leveraging the strengths of LLMs in processing and generating language to enhance the T3&#8217;s cognitive and communicative capabilities. 
It aligns with how language and learning occur in humans, where foundational knowledge and understanding from direct grounding are subsequently enriched and expanded through language-based learning &#8212; indirect but grounded.<\/p>\n\n\n\n<p>This approach would capitalize on the LLM&#8217;s capacity for generating coherent, contextually appropriate language but would also ensure that the use of this language is grounded in a pre-existing vocabulary of grounded category names and in propositions composed of them. It suggests a complementary integration of auxiliary T2 LLM chatbotic and autonomous T3 robotic systems, where each component contributes its strengths to the development of a more versatile and genuinely intelligent agent.<\/p>\n\n\n\n<p>The notion that a top-down, language-centric system (T2) can achieve genuine grounding by incrementally incorporating &#8220;toy&#8221; sensorimotor capabilities may reflect a fundamental misconception about the nature of grounding and the role of sensorimotor experiences in achieving it.<\/p>\n\n\n\n<p>Grounding isn&#8217;t merely a matter of bridging linguistic competence with basic physical interaction. True grounding \u2014 as observed in humans and necessary for a hypothetical T3 robot \u2014 entails a deep, intrinsic connection between sensorimotor experiences and the symbols (words) that represent them. This requires not just the recognition or manipulation of objects based on learned descriptors but a direct understanding of those objects and actions in the world to which they refer.<\/p>\n\n\n\n<p>The error in imagining that a T2 system could &#8220;graduate&#8221; to T3 status by being endowed with limited sensorimotor abilities lies in underestimating the complexity of human sensorimotor and cognitive integration. Human cognition is not simply the sum of its parts; it&#8217;s the product of a highly integrated and complex system that develops bottom-up through continuous, rich interaction with the environment. 
Language development in humans, deeply intertwined with sensorimotor experiences, cannot be replicated in AI by superficially adding physical capabilities to a fundamentally linguistic system.<\/p>\n\n\n\n<p>The strengths of a T2 system like GPT could be better utilized as an auxiliary educational tool for a T3 robot that is learning through its interactions with the world. In this scenario, GPT&#8217;s extensive knowledge base and linguistic capabilities can enhance the learning process by providing information, explanations, and clarifications in a way that complements the robot&#8217;s direct experiences. This approach would leverage the best of both worlds: the depth and breadth of GPT&#8217;s top-down linguistic knowledge and the direct grounding that comes from the T3&#8217;s bottom-up sensorimotor interactions with its environment.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Re: Figure Status Update &#8211; OpenAI Speech-to-Speech Reasoning SH: Is this demo sensorimotor grounding? 
No, It\u2019s a toy robot with (1) some toy-world visual recognition and motor manipulation skills, plus (2) (perhaps non-toy) text-to-speech and speech-to-text capacity, plus (3) ChatGPT\u2019s remarkable and as-yet unexplained (non-toy) interactive verbal skills, including (4) its (non-toy) encyclopedic verbal database &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/generic.wordpress.soton.ac.uk\/skywritings\/2024\/03\/16\/can-toy-robotic-capacities-make-top-down-meet-bottom-up\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Can Toy Robotic Capacities Make Top-Down Meet Bottom-Up?&#8221;<\/span><\/a><\/p>\n","protected":false},"author":3074,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[105,146,110,111,106,164,153,147],"tags":[175,176,177,174],"class_list":["post-2206","post","type-post","status-publish","format-standard","hentry","category-categorization","category-chatgpt","category-cognitive-science","category-computation","category-language","category-llms","category-symbol-grounding","category-turing-test","tag-bottom-up","tag-direct-sensorimotor-grounding","tag-indirect-verbal-groumding","tag-top-down"],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/generic.wordpress.soton.ac.uk\/skywritings\/wp-json\/wp\/v2\/posts\/2206","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/generic.wordpress.soton.ac.uk\/skywritings\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/generic.wordpress.soton.ac.uk\/skywritings\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/generic.wordpress.soton.ac.uk\/skywritings\/wp-json\/wp\/v2\/users\/3074"}],"replies":[{"embeddable":true,"href":"https:\/\/generic.wordpress.soton.ac.uk\/skywritings\/wp-json\/wp\/v2\/comments?post=2206"}],"version-history":[{"count":3,"href":"https:\/\/generic.wordpress.soton.ac.uk\/skyw
ritings\/wp-json\/wp\/v2\/posts\/2206\/revisions"}],"predecessor-version":[{"id":2209,"href":"https:\/\/generic.wordpress.soton.ac.uk\/skywritings\/wp-json\/wp\/v2\/posts\/2206\/revisions\/2209"}],"wp:attachment":[{"href":"https:\/\/generic.wordpress.soton.ac.uk\/skywritings\/wp-json\/wp\/v2\/media?parent=2206"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/generic.wordpress.soton.ac.uk\/skywritings\/wp-json\/wp\/v2\/categories?post=2206"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/generic.wordpress.soton.ac.uk\/skywritings\/wp-json\/wp\/v2\/tags?post=2206"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}