To learn to categorize is to learn to do the correct thing with the correct kind of thing. In cognition, much (though not all) learning is learning to categorize.
We share two ways to learn categories with many other biological species: (1) unsupervised learning (which is learning from mere repeated exposure, without any feedback) and (2) supervised (or reinforcement) learning (learning through trial and error, guided by corrective feedback that signals whether we’ve done the correct or incorrect thing).
Our brains contain neural networks that can learn, through trial, error, and corrective feedback, to detect and abstract the features that distinguish the members of a category from its nonmembers. Once our brains have detected and abstracted those distinguishing features, we can do the correct thing with the correct kind of thing.
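A minimal sketch of the supervised case, under stated assumptions: the classic perceptron rule stands in for the far richer neural-network ("deep learning") models used for this today, and the three binary features (hypothetically "red," "round," and an irrelevant third one) are invented for illustration. On each trial the learner acts (treats the item as a member or not), receives corrective feedback, and adjusts its weights only when it was wrong; over trials, the weights and threshold come to separate members from nonmembers.

```python
import random


def make_item(rng):
    """Hypothetical toy item: three binary sensorimotor features.

    Only the first two ("red" and "round") distinguish members from
    nonmembers; the third feature is irrelevant noise.
    """
    features = [rng.randint(0, 1) for _ in range(3)]
    is_member = features[0] == 1 and features[1] == 1
    return features, is_member


def categorize(weights, bias, features):
    """The 'action': treat the item as a member if the weighted sum exceeds 0."""
    return sum(w * x for w, x in zip(weights, features)) + bias > 0


def train_perceptron(n_trials=2000, seed=0):
    """Supervised learning by trial, error, and corrective feedback."""
    rng = random.Random(seed)
    weights = [0, 0, 0]
    bias = 0
    for _ in range(n_trials):
        features, is_member = make_item(rng)
        guess = categorize(weights, bias, features)
        error = int(is_member) - int(guess)  # 0 if correct, +1 or -1 if incorrect
        # Weights change only after an incorrect response, shifting weight
        # onto (or off of) the features that were present on that trial.
        for i, x in enumerate(features):
            weights[i] += error * x
        bias += error
    return weights, bias


weights, bias = train_perceptron()
print("learned weights:", weights, "bias:", bias)
```

After training, `categorize` sorts all eight possible toy items correctly. Because this toy category is a simple conjunction of features, a single weighted threshold suffices; categories whose members cannot be separated that way would need a multi-layer network.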
Unsupervised and supervised learning can be time-consuming and risky, especially if you have to learn to distinguish what is edible from what is toxic, or who is friend from who is foe.
We are the only species that also has a third way of learning categories: (3) language.
Language probably first evolved at least 200,000 years ago from pointing, imitation, miming and other kinds of purposive gestural communication, none of which are language.
Once gesture turned into language — a transformation I will discuss in a moment — it migrated, behaviorally and neurologically, to the much more efficient auditory/vocal medium of speech and hearing.
Gesture is slow and also ineffective in the dark, at a distance, or when your hands are occupied doing other things. But before that migration to the vocal medium, language itself first had to begin, and there gesturing had a distinct advantage over vocalizing, one that semioticians call “iconicity”: the visual similarity between the gesture and the object or action that the gesture is imitating.
Language is much more than just naming categories (like “apple”). The shape of words in language is arbitrary; words do not resemble the things they refer to. Nor is language just isolated words naming categories: it is strings of words with further properties beyond naming categories.
Imitative gestures do resemble the objects and actions that they imitate, but imitation, even with the purpose of communicating something, is not language. The similarity between the gesture and the object or action that the gesture imitates does, however, establish the connection between them. This is the advantage of “iconicity.”
The scope of gestural imitation, which is visual iconicity, is much richer than the scope of acoustic iconicity. Just consider how many more of the objects, actions and events in daily life can be imitated gesturally than can be imitated vocally.
So gesture is a natural place to start. There are gestural theories of the origin of language and vocal theories of the origin of language. I think that gestural origins are far more likely, initially, mainly because of iconicity. But keep in mind the eventual advantages of the vocal medium over the gestural one. Although iconicity gives gesture its head start, it can quickly become a burden, slowing and complicating communication.
Consider gesturing, by pantomime, that you want something to eat. The gesture for eating might be an imitation of putting something into your mouth. But if that gesture becomes a habitual one in your community, used every day, there is no real need for the full-blown icon time after time. It could be abbreviated and simplified, say, by just pointing to your mouth, or just making an upward pointing movement.
These iconic abbreviations or compressions could be shared by all the members of the gesturally communicating community, from the extended family to the tribe, because it is to everyone’s advantage to economize on gestures used over and over to communicate. As the iconicity progressively faded and the gestures became increasingly arbitrary, this shared practice would keep inheriting the connection originally established through full-blown iconicity.
The important thing to note is that this form of communication would still be only pantomime, still just showing, not telling, even when the gestures have become arbitrary. Gestures that have shed their iconicity are still not language. They become language only when the names of categories can be combined into subject/predicate propositions that describe or define a named category’s distinguishing features. That is what provides the third way of learning categories, the one that is unique to our species. The names of categories (i.e., “open-class” or “content” words, like “apple”) and the names of apples’ features (which are also named categories, like “red” and “round”) can then be combined and recombined to define or describe further categories, so that someone who already knows the features of a category can tell someone who doesn’t: “A ‘zebra’ is a horse with stripes.” Mime is showing. Language is telling.
Propositions, unlike imitative gestures, or even arbitrary category-names, have truth values: True or False. This should remind you, though, of the category learning through sensorimotor trial and error we share with other species, under the guidance of positive and negative feedback from the consequences: Correct or Incorrect.
True and False is related to another essential feature of language, which is negation. A proposition can either affirm something or deny something: “It is true that an apple is red and round” or “It is not true that an apple is red and round.” P or not-P.
The trouble with being able to learn categories only directly, through unsupervised and supervised sensorimotor learning, is that it is time-consuming, risky, and not guaranteed to succeed (in time). It is also impoverished: most of the words in our vocabularies and dictionaries are category names (“content words”), but apart from the concrete categories that can be learned from direct sensorimotor trial-and-error experience (“apple,” “red,” “give,” “take”), most category names cannot be learned without language at all. (All category names, even proper names, refer to abstractions, because they are based on abstracting the features that distinguish them. But consider how we could ever have learned the even more abstract categories “democracy” or “objectivity” through unsupervised and supervised learning alone, without first learning words to define or describe their features.)
When categories are learned directly through unsupervised learning (from sensorimotor feature correlations) or supervised/reinforcement learning (from correlations between sensorimotor features and doing the right or wrong thing), the learning consists of detecting and abstracting the features that distinguish the members from the nonmembers of the category. To learn to do the correct thing with the correct kind of thing requires learning, implicitly or explicitly (i.e., unconsciously or consciously), to detect and abstract those distinguishing features.
Like nonhuman species, we can and do learn a lot of categories that way, and there are computational models of mechanisms that can accomplish such unsupervised and supervised learning: “deep learning” models. But, in general, nonhuman animals do not name the things they can categorize. Or, if you like, the “names” of those categories are not arbitrary words but the things the animals learn to do with the members, and not to do with the members of other categories. Only humans bother to name their categories. Why?
What is a name? It is a symbol (whether vocal, gestural, or written) whose shape is arbitrary (i.e., it does not resemble the thing it names). Its use is based on a shared convention among speakers: English speakers all agree to call cats “cats” and dogs “dogs.” Names of categories are “content words”: words that have referents, such as nouns, adjectives, verbs, and adverbs. Almost all words are content words. There is also a much smaller number of “function words,” which are purely syntactic or logical, such as the, if, and, is, when, who, not. They don’t have referents; they just have “uses,” usually defined or definable by a syntactic rule.
(Unlike nouns and adjectives, verbs do double duty: (1) they name a referent category, just as nouns and adjectives do (for example, “cry”), and (2) they mark predication, which is the essential function of propositions and distinguishes them from mere compound content words. “The baby is crying” separates the content word that has a referent, “crying,” from the predicative function of the copula, “is.” “The crying baby” is not a proposition; it is just a noun phrase, which, like a single word, has a referent, just as “the baby” does. But the proposition “The baby is crying” does not have a referent: it has a meaning, namely that the baby is crying, and a truth value (True or False).)
It is for content words that the gestural origin of language matters, because before a category name can become a shared, arbitrary convention of a community of users, it has to get connected to its referent. Iconic communication (mime) is based on the natural connection conveyed by similarity, like the connection between an object and its photo.
(In contrast, pointing, or “ostension,” is based on attention shared and directed by two viewers. Pointing alone cannot become category-naming, because it depends on a shared line of gaze and on what is immediately present at the time (the “context”). It survives in language only in “deictic” words like here, this, now, and me, which have no referent unless you are “there” too, to sense what’s being pointed at!)
A proposition can be true or false, but pointing and miming cannot, because they do not propose or predicate anything; they say only “look!” Whatever is being pointed at is what is pointed at, and whatever a gesture resembles, it resembles. Resemblance can be more or less exact, but it cannot be true or false; it cannot lie. (Flattering portraiture was still far off in these prelinguistic times, but it is an open question whether iconography began before or after language and speech, so fantasy, too, may have preceded falsity. Copying and depicting are certainly compatible with miming; both static and dynamic media are iconic.)
It is not that pointing and miming, when used for intentional communication, cannot mislead or deceive. There are some examples in nonhuman primate communication of actions done to intentionally deceive (such as de Waal’s famous case of a female chimpanzee who positions her body behind a barrier so the alpha male can only see her upper body while she invites a preferred lower-ranking male to mate with her below the alpha’s line of sight, knowing that the alpha male would attack them both if he saw that they were copulating).
But, in general, in the iconic world of gesture and pointing, seeing is believing and deliberate deception does not seem to have taken root within species. The broken-wing dance of birds, to lure predators away from their young, is deliberate deception, but it is deception between species, and the disposition also has a genetic component.
Unlike snatch-and-run pilfering, which does occur within species, deliberate within-species deceptive communication (not to be confused with unconscious, involuntary deception, such as concealed ovulation or traits generated by “cheater genes”) is rare. Perhaps this is because there is little opportunity or need for deceptive communication within species that are social enough to have extensive social communication at all. (Cheaters are detectable and punishable in tribal settings — and human prelinguistic settings were surely tribal.) Perhaps social communication itself is more likely to develop in a cooperative rather than a competitive or agonistic context. Moreover, the least likely setting for deceptive communication is perhaps also the most likely setting for the emergence of language: within the family or extended family, where cooperative and collaborative interests prevail, including both food-sharing and knowledge-sharing.
(Infants and juveniles of all species learn by observing and imitating their parents and other adults; there seems to be little danger that adults are deliberately trying to fool them into imitating something wrong or maladaptive. What would adults gain by doing that?)
But in the case of linguistic communication – which means propositional communication – it is hard to imagine how it could have gotten off the ground at all unless propositions were initially true, and assumed and intended to be true, by default.
It is not that our species did not soon discover the rewards to be gained by lying! But that only became possible after the capacity and motivation for propositional communication itself had emerged, prevailed and been encoded in the human brain as the strong and unique predisposition it is in our species. Until then there was only pointing and miming, which, being non-propositional, cannot be true or false, even though it can in principle be used to deceive.
So I think the default assumption of truth (veracity) was encoded in the brains of human speakers and hearers as an essential feature of the enormous adaptive advantage (indeed the unparalleled power) of language in transmitting categories instantly, via “hearsay,” without the need for unsupervised or supervised learning. There are only two prerequisites: (1) the speaker/teacher must already know the predicated features (and their names) that distinguish the members from the nonmembers of the new category, so they can be conveyed to the hearer in a proposition defining or describing that new category; and (2) the hearer/learner must already know the content words that refer to the distinguishing features the speaker uses to define or describe the new category. (This is the origin of the “symbol grounding problem” and its solution.)
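A toy sketch of the two prerequisites, under hypothetical assumptions: a grounded category is modeled as a detector function over a sensorimotor input (here reduced to a dictionary of boolean cues such as `horse_shape` and `stripes`), and a defining proposition is modeled as a plain conjunction of named features; real definitions and descriptions are of course richer. The names `grounded` and `learn_by_hearsay` are illustrative only. The hearer acquires “zebra” without ever encountering one, but the definition is usable only if every feature name in it is already grounded for the hearer.

```python
from typing import Callable, Dict, List

Item = Dict[str, bool]              # toy stand-in for a sensorimotor input
Detector = Callable[[Item], bool]

# Hypothetical detectors the hearer already has, learned directly from
# experience (the grounded feature categories).
grounded: Dict[str, Detector] = {
    "horse": lambda item: item.get("horse_shape", False),
    "striped": lambda item: item.get("stripes", False),
}


def learn_by_hearsay(lexicon: Dict[str, Detector], new_name: str,
                     feature_names: List[str]) -> Detector:
    """Add a category defined as the conjunction of already-named features."""
    missing = [name for name in feature_names if name not in lexicon]
    if missing:
        # Prerequisite (2): the hearer must already know every content word used.
        raise ValueError(f"cannot learn {new_name!r}: ungrounded feature names {missing}")
    detectors = [lexicon[name] for name in feature_names]
    lexicon[new_name] = lambda item: all(detect(item) for detect in detectors)
    return lexicon[new_name]


# The speaker's proposition: "A zebra is a horse with stripes."
zebra = learn_by_hearsay(grounded, "zebra", ["horse", "striped"])
print(zebra({"horse_shape": True, "stripes": True}))   # a zebra
print(zebra({"horse_shape": True, "stripes": False}))  # a horse, not a zebra
```

Calling `learn_by_hearsay(grounded, "okapi", ["giraffe", "striped"])` raises `ValueError`, because “giraffe” has no grounded detector in this toy lexicon. Each such gap must be filled either by a further definition or, ultimately, by direct learning.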
The reason it is much more likely that propositional language emerged first in the gestural modality rather than the vocal one is that gestures’ iconicity (i.e., their similarity to the objects they are imitating) first connected them to their objects (which would eventually become their referents) and thereafter the gestures were free to become less and less iconic as the gestural community – jointly and gradually – simplified them to make communication faster and easier.
How do the speakers or hearers already know the features (all of which are, of course, categories too)? Well, either directly, from having learned them, the old, time-consuming, risky, impoverished, and nonverbal way (through supervised and unsupervised learning from experience) or indirectly, from having learned them verbally (by “hearsay”), through spoken or written propositions from one who already knows the category to one who does not. Needless to say, the human brain, with its genetically encoded propositional capacity, has a default predilection for learning categories by hearsay (and a laziness about testing them out through direct experience).
The consequence is a powerful default tendency to believe what we are told: to assume hearsay to be true. The trait can take the form of credulousness, gullibility, susceptibility to cult indoctrination, or even hypnotic susceptibility. Some of its roots are already there in unsupervised and supervised learning itself, in the form of Pavlovian conditioning as well as operant expectancies based on prior experience.
Specific expectations and associations can, of course, be extinguished by subsequent contrary experience. A diabetic’s hypoglycemic attack can be suppressed merely by tasting sugar, well before the sugar could raise systemic glucose levels (or even by just tasting saccharin, which cannot raise blood sugar at all). But repeatedly “fooling the system” that way, without following up with enough sugar to restore homeostatic levels, will extinguish this anticipatory reaction.
And, by the same token, we can and do eventually learn to detect and disbelieve chronic liars. But the generic default assumption, expectation, and anticipatory physiological responses to verbal propositions remain strong for people and propositions in general, and in extreme cases they can even induce “hysterical” physiological responses, including hypnotic analgesia sufficient to allow surgical intervention without medication. The same default can induce placebo effects as surely as it can induce Trumpian conspiracy theories.
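The essay does not commit to a particular model of how the anticipatory sugar response is extinguished. One standard formalization of Pavlovian acquisition and extinction is the Rescorla-Wagner rule, used here as an assumption: the associative strength of a cue (the taste of sugar) moves a fixed fraction of the way toward 1 on trials where the outcome (a real rise in blood glucose) follows, and toward 0 on trials where it does not. The learning rate of 0.3 and the trial counts are arbitrary illustrative values.

```python
def rescorla_wagner(trials, learning_rate=0.3, initial_strength=0.0):
    """Return the associative strength of a cue after each trial.

    trials: iterable of booleans; True if the outcome followed the cue.
    Update rule (Rescorla-Wagner, single cue):
        V <- V + learning_rate * (target - V), with target 1.0 or 0.0.
    """
    strength = initial_strength
    history = []
    for outcome_followed in trials:
        target = 1.0 if outcome_followed else 0.0
        strength += learning_rate * (target - strength)  # prediction-error update
        history.append(strength)
    return history


# Acquisition (sugar tasted, glucose rises) followed by extinction (taste only,
# as when the system is repeatedly "fooled" without enough sugar following).
history = rescorla_wagner([True] * 20 + [False] * 20)
print(f"after acquisition: {history[19]:.3f}, after extinction: {history[-1]:.3f}")
```

In this sketch the strength climbs close to 1 during the reinforced trials and then decays steadily back toward 0 once the taste is no longer followed by the glucose rise, which is the extinction of the anticipatory reaction.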

