3.3 Lemma
What are all the grammatical forms of be, cut, tree, nice, beautiful?
- be, am, are, is, were, was, been, ’s, ’m, ’re, ?being
- cut, cuts, (cut, cut), ?cutting
- tree, trees, tree’s, trees’
- nice, nicer, nicest
- beautiful
A lemma comprises all the inflectional forms of a word. This includes forms with grammatical affixes (tree, trees) and suppletive forms (go, went). It does not include forms built with derivational suffixes, such as the adjectival -ly (friend, friendly). Of course, this requires a clear definition of inflection and derivation. Some researchers might argue that participial -ing is derivational rather than inflectional. There is also the question of whether the past participle of verbs like cut, which is identical to the base and past tense forms, should be counted as a separate form at all.
When it comes to the technical side of research, you have to be aware of the decisions made when lemmatizing corpus data about what counts as a form of a lemma and what doesn’t. A lemma in a corpus is therefore not necessarily equal to a lexeme as a linguistic concept.
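To make the point about lemmatization decisions concrete, here is a minimal Python sketch. It assumes a hand-made lookup table (the hypothetical `LEMMA_TABLE`) built from the form lists above instead of a real lemmatizer, and a flag `ing_is_inflection` (also hypothetical) standing in for the decision about the "?" -ing forms. Changing that one decision changes the lemma counts for the same text.

```python
from collections import Counter

# Hypothetical lookup table built from the form lists above; a real lemmatizer
# would rely on a full lexicon or a morphological analyser.
# Note: "'s" is mapped to "be" only; in real text it can also be "has" or possessive.
LEMMA_TABLE = {
    "be": "be", "am": "be", "are": "be", "is": "be", "were": "be",
    "was": "be", "been": "be", "'s": "be", "'m": "be", "'re": "be",
    "cut": "cut", "cuts": "cut",
    "tree": "tree", "trees": "tree", "tree's": "tree", "trees'": "tree",
    "nice": "nice", "nicer": "nice", "nicest": "nice",
    "beautiful": "beautiful",
}

# The forms marked "?" in the lists: whether they belong to the lemma is a decision.
ING_FORMS = {"being": "be", "cutting": "cut"}


def lemmatize(token: str, ing_is_inflection: bool = True) -> str:
    """Map a word form to its lemma; unknown forms are returned unchanged."""
    form = token.lower()
    if form in LEMMA_TABLE:
        return LEMMA_TABLE[form]
    if ing_is_inflection and form in ING_FORMS:
        return ING_FORMS[form]
    return form  # if -ing is excluded, "being" and "cutting" become lemmas of their own


def lemma_frequencies(tokens, ing_is_inflection: bool = True) -> Counter:
    """Count lemmas rather than word forms."""
    return Counter(lemmatize(t, ing_is_inflection) for t in tokens)


tokens = ["Being", "nice", "is", "nicer", "than", "cutting", "trees"]
print(lemma_frequencies(tokens))                           # -ing counted as inflection
print(lemma_frequencies(tokens, ing_is_inflection=False))  # -ing treated as derivation
```

With the first setting, be is counted twice (being, is); with the second, being and cutting show up as separate items and the counts for be and cut drop.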
3.3.1 Distribution
Frequency information about a word or its forms can already be very revealing. We can easily extend this idea and look at the larger units a word or lemma occurs in. Frequency information about such distributions is much more complex, but it is based on the same underlying concepts and measured with the same tools.
As we will see in the upcoming reading, Justeson & Katz (1991), the distribution of adjective pairs plays a crucial role in the formation of antonym pairs. There, the deciding factor is whether the adjectives occur together in the same context: different form, same context. We could flip this around and look at words with the same form that occur in wildly different contexts. A special case of this is homonymy.
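A simple way to look at the larger units a word occurs in is to count the words within a fixed window around it. The sketch below assumes whitespace-tokenized text and a window of two tokens on each side; the function name `context_profile` and the window size are illustrative choices, not a standard.

```python
from collections import Counter


def context_profile(tokens, target, window=2):
    """Count the words occurring within `window` tokens on either side of `target`."""
    profile = Counter()
    for i, token in enumerate(tokens):
        if token == target:
            left = tokens[max(0, i - window):i]      # up to `window` tokens before
            right = tokens[i + 1:i + 1 + window]     # up to `window` tokens after
            profile.update(left + right)
    return profile


tokens = "the bat flew out of the cave and the bat ate a moth".split()
print(context_profile(tokens, "bat").most_common(3))
```

Summed over many occurrences, such profiles are the raw material for the distributional comparisons discussed below.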
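One simple way to operationalize "occurring together in the same context" is to compare the number of sentences that contain both adjectives with the number we would expect if the two were distributed independently of each other. This is a sketch in the spirit of that kind of analysis, not a reproduction of Justeson & Katz's actual procedure; the function name and the toy sentences are invented for illustration.

```python
def sentence_cooccurrence(sentences, a, b):
    """Return (observed, expected) counts of sentences containing both a and b.

    The expected count assumes a and b are distributed over sentences
    independently: (sentences with a) * (sentences with b) / (all sentences).
    """
    word_sets = [set(s.lower().split()) for s in sentences]
    n = len(word_sets)
    n_a = sum(a in words for words in word_sets)
    n_b = sum(b in words for words in word_sets)
    observed = sum(a in words and b in words for words in word_sets)
    expected = n_a * n_b / n
    return observed, expected


# Invented toy sentences, lower-cased and without punctuation for simplicity.
sentences = [
    "good or bad it does not matter",
    "the good and the bad",
    "the food was good",
    "the weather was bad",
    "we walked home",
    "it rained",
    "the bus was late",
    "she laughed",
    "the cat slept",
    "he read a book",
]
print(sentence_cooccurrence(sentences, "good", "bad"))  # (2, 0.9)
```

Here the two adjectives share a sentence twice, against 0.9 sentences expected by chance; on real data, only large and consistent differences of this kind would be meaningful.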
How can we find out whether a word is a homonym if we do not know its meanings or want to keep intuition out of the picture?
Animal or sports utensil?
- Maybe I’m a fruitarian bat
- … with a straighter bat than some of the Englishmen
- The unfortunate starved bat was then returned
- And not simply a bat, but an autographed bat
(examples from The British National Corpus 2007)
BNC> [pos = "AJ.*"] []? [hw = "bat"]
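For readers without access to a CQP installation, the logic of this query can be approximated in plain Python over a list of tagged tokens. This is a rough sketch: the `Token` structure, the hand-assigned CLAWS5-style tags and the function `adj_before_headword` are assumptions for illustration, and CQP's own matching strategy may report overlapping hits differently.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    word: str
    pos: str  # CLAWS5-style tag, e.g. AJ0 = adjective, NN1 = singular noun
    hw: str   # headword (lemma)


def adj_before_headword(tokens, headword, max_gap=1):
    """Approximate [pos = "AJ.*"] []? [hw = "bat"]: an adjective, then up to
    `max_gap` arbitrary tokens, then a token with the given headword."""
    hits = []
    for i, tok in enumerate(tokens):
        if not re.fullmatch(r"AJ.*", tok.pos):  # CQP regexes match the whole value
            continue
        for gap in range(max_gap + 1):
            j = i + 1 + gap
            if j < len(tokens) and tokens[j].hw == headword:
                hits.append(" ".join(t.word for t in tokens[i:j + 1]))
    return hits


# Tags assigned by hand for this illustration.
sentence = [
    Token("The", "AT0", "the"), Token("unfortunate", "AJ0", "unfortunate"),
    Token("starved", "AJ0", "starved"), Token("bat", "NN1", "bat"),
    Token("was", "VBD", "be"), Token("then", "AV0", "then"),
    Token("returned", "VVN", "return"),
]
print(adj_before_headword(sentence, "bat"))
# ['unfortunate starved bat', 'starved bat']
```

Matching on the headword rather than the word form means that bats would be found as well, which is exactly where the lemmatization decisions discussed earlier come into play.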
In these examples, the grammatical structure is similar: attributive adjectives precede bat, which is typical of nouns. However, the meaning of the adjectives provides enough context to disambiguate the two uses of bat. The lexicon is structured by both grammar and meaning. If we extend this to more co-occurrence patterns, e.g. with verbs or even different text types, two clearly distinct patterns emerge. The animal bat eats, like other animals, whereas the utensil bat strikes, like other club-like devices. A giraffe rarely strikes, and a tennis racket doesn’t eat. Each belongs to a distinct lexical field. Distribution therefore plays a defining role in the structure of those fields and, by extension, of our lexicon.
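The idea that the two uses of bat pattern with different neighbours can be sketched with co-occurrence vectors. In the sketch below, all counts are invented for illustration, and the names `PROFILES` and `closest_field` are hypothetical. It compares the verbs found around one occurrence of bat with verb profiles for animals and for club-like devices using cosine similarity, one common similarity measure in distributional semantics, and picks the closer field.

```python
import math
from collections import Counter


def cosine(p: Counter, q: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


# Invented verb co-occurrence counts standing in for corpus data.
PROFILES = {
    "animal": Counter({"eat": 5, "sleep": 3, "fly": 1}),                # e.g. giraffe, owl
    "club-like device": Counter({"strike": 4, "swing": 3, "hold": 2}),  # e.g. racket, club
}


def closest_field(context: Counter) -> str:
    """Return the field whose verb profile is most similar to `context`."""
    return max(PROFILES, key=lambda field: cosine(context, PROFILES[field]))


print(closest_field(Counter({"eat": 2, "fly": 1})))     # verbs around an animal-like bat
print(closest_field(Counter({"swing": 1, "hold": 1})))  # verbs around a utensil-like bat
```

No meaning is consulted at any point: the assignment follows only from which verbs each occurrence shares with the two profiles.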
3.3.2 Association
A key component of human memory is association. The lexicon is organized in associative networks. What we perceive together frequently, we associate as belonging together. This is also referred to as spatial or temporal contiguity.
- law and …?
- order
- good or …?
- bad, evil
- the number of the …?
- ??beast
- spoils of …?
- ??war
Most readers will probably complete the first fragment as law and order and the second as good or bad. For the other two examples, we can expect more variation. A metal fan might readily come up with beast, since the song of the same name is part of their cultural experience and therefore very frequent for them. Spoils of war, on the other hand, may not be a phrase that everyone is familiar with. As a word, spoils is very rare, yet it is strongly associated with this phrase: when it is encountered, it occurs together with war more often than not.
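The observation about spoils can be expressed with two simple measures: the conditional probability P(war | spoils), i.e. the share of occurrences of spoils that are part of the phrase, and pointwise mutual information (PMI), which compares how often the pair occurs with how often it would occur by chance. PMI is only one of several association measures used in corpus linguistics, and the counts below are hypothetical, not taken from the BNC.

```python
import math


def association(pair_count: int, count_w1: int, count_w2: int, n_tokens: int):
    """Return P(w2 | w1) and pointwise mutual information (in bits).

    pair_count: how often w1 and w2 occur together (e.g. "spoils of war")
    count_w1, count_w2: overall frequencies of the two words
    n_tokens: corpus size, used to turn counts into probabilities
    """
    p_w2_given_w1 = pair_count / count_w1
    # PMI = log2(P(w1, w2) / (P(w1) * P(w2))), each probability estimated as count / n_tokens
    pmi = math.log2((pair_count * n_tokens) / (count_w1 * count_w2))
    return p_w2_given_w1, pmi


# Hypothetical counts for a corpus of one million tokens.
N = 1_000_000
print(association(30, 50, 10_000, N))         # spoils ... war: rare word, strong association
print(association(2_000, 60_000, 10_000, N))  # the ... war: frequent word, weak association
```

With these invented numbers, P(war | spoils) is 0.6, i.e. "more often than not", even though spoils occurs only 50 times; the frequent pair with the occurs far more often in absolute terms but is much less strongly associated.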