Re: Blondin Massé, A., G. Chicoisne, Y. Gargouri, S. Harnad, O. Picard & O. Marcotte (2008) How Is Meaning Grounded in Dictionary Definitions? TextGraphs-3 Workshop, 22nd International Conference on Computational Linguistics (Coling 2008), Manchester, 18-22 August 2008
Many thanks to Peter Turney for his close and thoughtful reading of our paper on extracting the grounding kernel of a dictionary.
Peter raises 3 questions. Let me answer them in order of complexity, from the simplest to the most complex:
— PT: “(1) Is this paper accepted for Coling08?”
Yes. Apparently there are different sectors of the program, and this paper was accepted for the TextGraphs workshop, listed on the workshop webpage.
— PT: “(2) How come we claim the grounding kernel (GK) words are more concrete, whereas in our example, they are more abstract?”
The example was just a contrived one, designed only to illustrate the algorithm. It was not actually taken from a dictionary.
When we do the MRC correlations using the two actual dictionaries (LDOCE and CIDE), reduced to their GK by the algorithm, GK words turn out to be acquired at a younger age, more imageable, and (less consistently) more concrete and more frequent.
However, these are separate pairwise correlations. We have since extended the analysis to a third dictionary, WordNet, and found the same pairwise correlations. But when we put the variables together in a stepwise hierarchical multiple regression analysis, looking at the independent contribution of each factor, the biggest effect turns out to be age of acquisition (GK words being acquired earlier), and the residual correlation with concreteness then reverses polarity: concreteness is positively correlated with earlier age of acquisition across all words in the MRC database, but once the GK correlation with age is partialled out, the remaining GK words tend to be more abstract!
This obviously needs more testing and confirmation, but if reliable, it has a plausible explanation: the GK words that are acquired earlier are more concrete, but the GK also contains a subset of abstract words, either learned later in life, or learned through early abstraction, and these early abstract words are also important for the compositional power of dictionary definitions in reaching other words through definition alone.
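(To see the partialling step in the regression above concretely, here is a minimal sketch in Python. The numbers are invented stand-ins, not the MRC data or our results; the script only shows how a positive pairwise correlation can reverse sign once age of acquisition is regressed out of both variables:)

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Invented illustrative data (not the MRC database or our findings):
# earlier-acquired words are more concrete, and kernel membership tracks
# early acquisition but, net of age, slightly favors abstract words.
rng = np.random.default_rng(0)
n = 1000
aoa = rng.normal(size=n)               # age of acquisition (standardized)
e = rng.normal(size=n)                 # concreteness, net of age
concreteness = -0.6 * aoa + e
in_gk = ((-0.8 * aoa - 0.3 * e + rng.normal(size=n)) > 0).astype(float)
df = pd.DataFrame({"in_gk": in_gk, "aoa": aoa, "concreteness": concreteness})

# Pairwise: kernel membership vs. concreteness (positive in this setup).
print(df["in_gk"].corr(df["concreteness"]))

# Partial: regress age of acquisition out of both variables, then
# correlate the residuals; here the sign flips to negative.
X = sm.add_constant(df["aoa"])
resid_gk = sm.OLS(df["in_gk"], X).fit().resid
resid_conc = sm.OLS(df["concreteness"], X).fit().resid
print(resid_gk.corr(resid_conc))
```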
The next step would be to begin looking at what the GK words, concrete and abstract, actually are, and the extent to which they tend to be unique and universal across dictionaries.
— PT: “(3) Does our analysis overlook the process of abstraction in its focus on acquiring meaning by composition (through dictionary definition)?”
Quite the contrary. We stress that word meanings must be grounded in prior sensorimotor learning, which is in fact the process of (sensorimotor) abstraction!
Peter writes: “we may understand ‘yellow’ as the abstraction of all of our experiences, verbal and perceptual, with yellow things (bananas, lemons, daffodils, etc.). When we are children, we build a vocabulary of increasingly abstract words through the process of abstraction.”
But we would agree with that completely! The crucial thing to note, however, is that abstraction, at least initially, is sensorimotor, not linguistic. We learn to categorize by abstracting, through trial-and-error experience and feedback, the invariant sensorimotor features (“affordances”) of the members of a category (e.g., bananas, lemons, daffodils, and eventually also yellow), learning to distinguish the members from the nonmembers, based on what they look and feel like, and what we can and cannot do with them. Once we have acquired the category in this instrumental, sensorimotor way, because our brains have abstracted its sensorimotor invariants, then we can attach an arbitrary label to that category — “yellow” — and use it not only to refer to the category, but to define further categories compositionally (including, importantly, the definition through description of their invariants, once those have been named).
This is in agreement with Peter’s further point that “As that abstract vocabulary grows, we then have the words that we need to form compositions.”
And all of this is compatible with finding that although the GK is both acquired earlier and more concrete, overall, than the rest of our vocabulary, it also contains abstract words (possibly early abstract words, or words that are acquired later yet important for the GK).
— PT: “The process of abstraction takes us from concrete (bananas and lemons) to abstract (yellow). The process of composition takes us from abstract (yellow and fruit) to concrete (banana).”
The process of abstraction certainly takes us from concrete to abstract. (That’s what “abstract” means: selecting out some invariant property shared by many variable things.)
The process of “composition” does many things; among them it can define words. But composition can also describe things (including their invariant properties); composition also generates every expression of natural language other than isolated words, as well as every expression of formal languages such as logic, mathematics and computer programming.
A dictionary defines every word, from the most concrete to the most abstract. Being a definition, it is composite. But it can describe the rule for abstracting an invariant too. An extensional definition defines something by listing all (or enough of) its instances; an intensional definition defines something by stating (abstracting) the invariant property shared by all its instances.
— PT: “Dictionary definitions are largely based on composition; only rarely do they use abstraction.”
All definitions are compositional, because they are sentences. We have not taken an inventory (though we eventually will), but I suspect there are many different kinds of definitions, some intensional, some extensional, some defining more concrete things, some defining more abstract things — but all compositional.
— PT: “If these claims are both correct, then it follows that your grounding kernel words will tend to be more abstract than your higher-level words, due to the design of your algorithm. That is, your simple example dictionary is not a rare exception.”
The example dictionary, as I said, was just arbitrarily constructed.
Your first claim, about the directionality of abstraction, is certainly correct. Your second claim that all definitions are compositional is also correct.
Whether the words out of which all other words can be defined are necessarily more abstract than the rest of the words is an empirical hypothesis. Our data do not, in fact, support the hypothesis, because, as I said, the strongest correlate of being in the grounding kernel is being acquired at an earlier age — and that in turn is correlated, in the MRC corpus, with being more concrete. It is only after we partial out the correlation of the grounding kernel with age of acquisition (along with all the covariance it shares with concreteness) that the correlation with concreteness reverses sign. We still have to do the count, but the obvious implication is that the part of the grounding kernel that is correlated with age of acquisition is more concrete, and the part that is independent of age of acquisition is more abstract.
None of this is derived from or inherent in our arbitrary, artificial example, constructed purely to illustrate the algorithm. Nor is any of it necessarily true. It remains to see what the words in the grounding kernel turn out to be, whether they are unique and universal, and which ones are more concrete and which ones are more abstract.
(Nor, by the way, was it necessarily true that the words in the grounding kernel would prove to have been acquired earlier; but if that proves reliable, then it implies that a good number of them are likely to be more concrete.)
— PT: “As I understand your reply, you are not disagreeing with my claims; instead, you are backing away from your own claim that the grounding kernel words will tend to be more concrete. But it seems to me that this is backing away from having a testable hypothesis.”
Actually, we are not backing away from anything. These results are fairly new. In the original text we reported the direct pairwise correlation between being in the grounding kernel and, respectively, age of acquisition, concreteness, imageability and frequency. All these pairwise correlations turned out to be positive. Since then we have extended the findings to WordNet (likewise all positive) and gone on to do stepwise hierarchical multiple regression analysis, which reveals that age of acquisition is the strongest correlate and that, when it is partialled out, the sign of the correlation with concreteness reverses for the residual variance.
The hypothesis was that all these correlations would be positive, but we did not anticipate that removing age of acquisition would reverse the sign of the residual correlation. That is a data-driven finding (and we think it is both interesting and compatible with the grounding hypothesis).
— PT: “There is an intuitive appeal to the idea that grounding words are concrete words. How do you justify calling your kernel words ‘grounding’ when they are a mix of concrete and abstract? What independent test of ‘groundingness’ do we have, aside from the output of your algorithm?”
The criterion is, and has always been, reachability of the rest of the lexicon from the grounding kernel alone. That was why we first chose to analyze the LDOCE and CIDE dictionaries: they each allegedly had a “control vocabulary” out of which all the rest of the words were defined. Unfortunately, neither dictionary proved consistent in ensuring that all the other words (including the words of the control vocabulary itself) were defined out of the control vocabulary, and that is why Alexandre Blondin-Massé designed our algorithm.
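(Operationally, the criterion can be checked directly. As a minimal sketch, purely for illustration and not the algorithm as published: represent the dictionary as a map from each word to the set of words used in its definition; a candidate kernel then “grounds” the lexicon if iteratively adding every word whose definition falls entirely within the words reached so far eventually covers everything:)

```python
def reaches_whole_lexicon(kernel: set[str], dictionary: dict[str, set[str]]) -> bool:
    """Grounding criterion: can every word in `dictionary` be reached from
    `kernel` by definition alone?  A word counts as reached once all the
    words in its definition have been reached."""
    reached = set(kernel)
    grew = True
    while grew:
        grew = False
        for word, definers in dictionary.items():
            if word not in reached and definers <= reached:
                reached.add(word)
                grew = True
    return set(dictionary) <= reached
```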
The definition of symbol grounding preceded these dictionary analyses, and it was not at all a certainty that the “grounding kernel” of the dictionary would turn out to be the words we learn earliest, nor that it would be more concrete or abstract than the rest of the words. That too was an empirical outcome (and much work remains to be done before we know how reliable and general it is, and what the blend of abstract and concrete turns out to be).
I would add that “abstract” is a matter of degree, and no word — not even a proper name — is “non-abstract,” just more or less abstract. In naming objects, events, actions, states and properties, we necessarily abstract from the particular instances — in time and space and properties and experience — that make (for example) all bananas “bananas” and all lemons “lemons.” The same is true of what makes all yellows “yellows,” except that (inasmuch as vocabulary is hierarchical — which it is not, entirely), “yellows” are more abstract than “bananas” (so are “fruit,” and so are “colors”).
(There are still unresolved methodological and conceptual issues about how to sort words for degree of abstractness. Like others, we rely on human judgments, but what are those judgments really based on?)
(Nor are all the (content) words of a language ranged along a strict hierarchy of abstractness. Indeed, our overall goal is to determine the actual graph structure of dictionary definition space, whatever it turns out to be, and to see whether some of its properties are reflected also in the mental lexicon, i.e., not only in our mental vocabulary, but in how word meanings are represented in our brains.)
— PT: “You suggest a variety of factors, including concreteness, imageability, and age of acquisition. You are now fitting a multilinear combination of these factors to the output of your algorithm. Of course, if you have enough factors, you can usually fit a multilinear model to your data. But this fitting is not the same as making a prediction and then seeing whether an experiment confirms the prediction.”
It was by no means a foregone conclusion that the grounding kernel extracted by our algorithm would be positively correlated, pairwise, with age of acquisition, concreteness, imageability and frequency; but we predicted that it would be. We did not predict the change in sign of the correlation in the multiple regression, but it seems an interesting, interpretable and promising result, worthy of further analysis.
— PT: “I am willing to make a testable prediction: If my claims (1) and (2) are true, then you should be able to modify your algorithm so that the kernel words are indeed more concrete. You just need to ‘turn around your operation’.”
I am not quite sure what you mean by “turn around your operation,” but we would be more than happy to test your prediction, once we understand it. Currently, the “operation” is just to systematically set aside words that can be reached (via definition) from other words, iteratively narrowing the rest down to the grounding kernel, which can only be reached from itself. This operation moves steadily inward. I am not sure what moving steadily outward would amount to: Would it be setting aside words that cannot be reached via definition? Would that not amount to a more awkward way of generating the same partition (grounding kernel vs. rest of dictionary)?
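(To make the inward-moving operation concrete, here is a minimal sketch along the same lines as the reachability check above, again an illustration of the idea rather than the exact published algorithm. It iteratively sets aside words that appear in no remaining definition; each word set aside is definable from words still present at the time it is removed, so whatever survives can only be reached from itself:)

```python
def grounding_kernel(dictionary: dict[str, set[str]]) -> set[str]:
    """Iteratively set aside words that appear in no remaining definition;
    the survivors form a kernel from which the rest of the lexicon can be
    reached by definition alone."""
    remaining = set(dictionary)
    while True:
        used = set()
        for word in remaining:
            used |= dictionary[word]      # words still doing defining work
        removable = remaining - used      # reachable, hence dispensable
        if not removable:
            return remaining
        remaining -= removable

# A contrived four-word example (like the paper's, purely illustrative):
toy = {
    "banana": {"yellow", "fruit"},
    "fruit":  {"good", "yellow"},
    "yellow": {"fruit", "good"},
    "good":   {"yellow"},
}
print(grounding_kernel(toy))  # {'fruit', 'good', 'yellow'}; 'banana' is set aside
```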
Please do correct me if I have misunderstood.