3 Form and meaning
3.1 Lemma
This week, we worked towards enriching our text data with additional linguistic information. Often, we are more interested in lexical units than in individual word forms. The concept of lemma is commonly used to approach this. Remember, a lemma usually has a technical definition: we use lemmas to gain information about lexemes in the same way that we look at tokens to gain information about words. But a lemma is not the same as a lexeme.
A lemma, similar to a lexeme, groups certain forms of a word. What are all the grammatical forms of be, cut, tree, nice, beautiful?
- be, am, are, is, were, was, been, ’s, ’m, ’re, ?being
- cut, cuts, (cut, cut), ?cutting
- tree, trees, tree’s, trees’
- nice, nicer, nicest
- beautiful
A lemma comprises all the inflectional forms of a word. This includes forms with grammatical affixes (tree, trees) and suppletive forms (go, went). Forms with derivational suffixes, like adjectival -ly (friend, friendly), are not included. Of course, this requires a clear definition of inflection and derivation. For example, researchers may disagree about whether participial -ing is derivational or inflectional. There is also the question of whether the past participle of verbs like cut should be treated as a separate form.
On the technical side of research, you have to be aware of the decisions made during the lemmatization of corpus data about what counts as a form of a lemma and what doesn't. The operational definition of lemma might not match the definition of lexeme that is relevant to your research question.
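You can make a lemmatizer's decisions visible with a toy lookup table. The sketch below does not reproduce how any particular corpus tool lemmatizes. The table `LEMMA_TABLE`, the switch `include_ing` and the example sentence are all hypothetical. The sketch shows how one operational choice, whether -ing forms belong to the verb lemma, changes the lemma counts.

```python
from collections import Counter

# Hypothetical lookup table mapping word forms to lemmas (illustration only).
# Clitics like 's are left out on purpose: 's can also be the possessive (tree's).
LEMMA_TABLE = {
    "be": "be", "am": "be", "are": "be", "is": "be",
    "was": "be", "were": "be", "been": "be",
    "cut": "cut", "cuts": "cut",
    "tree": "tree", "trees": "tree",
    "nice": "nice", "nicer": "nice", "nicest": "nice",
    "go": "go", "went": "go",
}

# One operational decision made explicit: do participial -ing forms join the verb lemma?
ING_FORMS = {"being": "be", "cutting": "cut"}


def lemmatize(tokens, include_ing=False):
    """Map each token to its lemma; forms missing from the table are kept (lowercased)."""
    table = dict(LEMMA_TABLE)
    if include_ing:
        table.update(ING_FORMS)
    return [table.get(t.lower(), t.lower()) for t in tokens]


tokens = "The trees were cut and the tree is being cut".split()
form_counts = Counter(tokens)
lemma_counts = Counter(lemmatize(tokens))
lemma_counts_ing = Counter(lemmatize(tokens, include_ing=True))
print(form_counts)
print(lemma_counts)
print(lemma_counts_ing)
```

With `include_ing=False`, being is counted as a lemma of its own. With `include_ing=True`, it is added to be. The same corpus yields different frequencies depending on this choice.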
3.1.1 Distribution
The frequency of a word or its forms can already be very revealing. We can easily extend this idea and look at the larger units a word or lemma occurs in. Frequency information about distribution is much more complex, but it is based on the same underlying concepts and measured with the same tools.
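One way to make distribution concrete is to count co-occurrences. The sketch below is minimal and assumes an already tokenized text and a window of two tokens on either side of the target. The window size, the function name `context_counts` and the sample sentence are arbitrary illustrations.

```python
from collections import Counter


def context_counts(tokens, target, window=2):
    """Count the words occurring within `window` tokens to the left and right of `target`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == target:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            counts.update(left + right)
    return counts


text = "the cat sat on the mat and the dog sat on the rug"
sat_contexts = context_counts(text.split(), "sat")
print(sat_contexts)
```

On real corpus data, you would usually lemmatize first and filter out very frequent function words like the, which otherwise dominate every profile.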
As we will see in the upcoming reading (Justeson & Katz 1991), the distribution of adjective pairs plays a crucial role in the formation of antonym pairs. There, the deciding factor is whether the adjectives occur together in the same context: different form, same context. We could flip this around and look at words with the same form that occur in wildly different contexts. A special case of this is homonymy.
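A simple test of "same context" compares two counts. The first is the observed number of sentences that contain both words. The second is the number expected if the words occurred independently (n_a × n_b / N sentences). This is a generic sketch, not necessarily the procedure used by Justeson & Katz. The sentences and the function name `cooccurrence` are made up for illustration.

```python
def cooccurrence(sentences, word_a, word_b):
    """Return (observed, expected) counts of sentences containing both words."""
    n = len(sentences)
    word_sets = [set(s.lower().split()) for s in sentences]
    n_a = sum(word_a in ws for ws in word_sets)
    n_b = sum(word_b in ws for ws in word_sets)
    n_ab = sum(word_a in ws and word_b in ws for ws in word_sets)
    expected = n_a * n_b / n  # expected joint count under independence
    return n_ab, expected


# Hypothetical toy sentences, not corpus data.
sentences = [
    "the big dog chased the little cat",
    "big or little it does not matter",
    "she lives in a big house",
    "the cat is asleep",
    "we went home early",
    "the weather was fine",
    "he read a book",
    "they sang a song",
]
observed, expected = cooccurrence(sentences, "big", "little")
print(observed, expected)
```

Here big and little share 2 sentences, where independence predicts 0.75. With a handful of invented sentences this says nothing statistically; it only shows the arithmetic behind "occurring together more often than chance".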
How can we find out whether something is a homonym if we do not know its meaning or want to keep intuition out of the picture?
Animal or sport utensil?
- Maybe I’m a fruitarian bat
- … with a straighter bat than some of the Englishmen
- The unfortunate starved bat was then returned
- And not simply a bat, but an autographed bat
(examples from The British National Corpus 2007)
In these examples, the grammatical structure is similar: we find attributive adjectives preceding bat, which is typical for nouns. However, the meaning of the adjectives provides enough context to disambiguate the two uses of bat. The lexicon is structured by both grammar and meaning. If you expanded this to more co-occurrence patterns, e.g. with verbs or even different text types, two clearly distinct patterns would emerge. The animal bat eats, like other animals, whereas the utensil bat strikes, like other club-like devices. A giraffe rarely strikes and a tennis racket doesn't eat. Each use belongs to a distinct lexical field. Distribution plays a defining role in the structure of those fields and, therefore, of our lexicon.
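You can sketch the idea of telling the two uses of bat apart by their context with bag-of-words vectors and cosine similarity. This is a toy. The two reference profiles are hand-made stand-ins for what you would collect from many corpus occurrences of animal words and club-like words. The stopword list is a hypothetical minimum. The test sentences are the BNC examples above.

```python
import math
from collections import Counter

# Hypothetical minimal stopword list.
STOPWORDS = {"the", "a", "an", "and", "but", "not", "was", "then",
             "with", "than", "of", "some", "i'm", "maybe"}

# Hand-made stand-ins for distributional profiles (NOT derived from a corpus).
PROFILES = {
    "animal": Counter("eats hungry starved fruitarian flies wild sleeps".split()),
    "club": Counter("strikes swings hits straighter wooden autographed".split()),
}


def context_vector(sentence, target="bat"):
    """Bag of content words around the target, ignoring stopwords and the target itself."""
    words = [w.strip(".,").lower() for w in sentence.split()]
    return Counter(w for w in words if w != target and w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def closest_profile(sentence):
    """Name of the profile whose context vector is most similar to the sentence's."""
    vec = context_vector(sentence)
    return max(PROFILES, key=lambda name: cosine(vec, PROFILES[name]))


for s in ["Maybe I'm a fruitarian bat",
          "with a straighter bat than some of the Englishmen",
          "The unfortunate starved bat was then returned",
          "And not simply a bat, but an autographed bat"]:
    print(closest_profile(s), "<-", s)
```

The classification is only as good as the profiles. Here they were chosen so that the adjectives in the examples occur in them, and that is exactly the information a real distributional profile would have to supply from corpus data.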
3.1.2 Association
A key component of human memory is association. The lexicon is organized in associative networks. What we perceive together frequently, we associate as belonging together. This is also referred to as spatial and temporal contiguity.
- law and …?
- order
- good or …?
- bad, evil
- the number of the …?
- ??beast
- spoils of …?
- ??war
When you read the first two fragments, the phrases that come to mind are most likely law and order and good or bad. For the other two examples, we would expect more variation. A metal fan might readily come up with beast, since the song of the same name is part of their cultural experience and is therefore very frequent for them. Spoils of war might not be a phrase that everyone is familiar with. Spoils as a word is very rare, yet it is strongly associated with this phrase: when it is encountered, it occurs together with war more often than not.
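You can express the intuition that spoils is rare yet strongly associated with war as conditional probabilities. "More often than not" means P(war | spoils) > 0.5. The counts below are invented for illustration only; they are not BNC figures. The function name `conditional_association` is also hypothetical.

```python
def conditional_association(pair_count, count_a, count_b):
    """Return P(b | a) and P(a | b), estimated from a pair count and the two word counts."""
    return pair_count / count_a, pair_count / count_b


# Invented counts (illustration only): occurrences of "spoils", of "war",
# and of the phrase "spoils of war".
spoils, war, spoils_of_war = 40, 5000, 25

p_war_given_spoils, p_spoils_given_war = conditional_association(spoils_of_war, spoils, war)
print(f"P(war | spoils) = {p_war_given_spoils:.3f}")
print(f"P(spoils | war) = {p_spoils_given_war:.3f}")
```

Under these assumed counts, the association is asymmetric. Spoils strongly predicts war. War, a frequent word that occurs in many phrases, hardly predicts spoils.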
3.2 Homework
At the institute, we have a host of corpora that are readily available to our students. We interact with these data sets via a program called Corpus Workbench (CWB) (Evert & Hardie 2011). To access our Corpus Lab at the institute, you need to do some setup. The previous blog posts here are full of examples, and there are plenty more in the tutorials in the links section. Go through the steps below and the links on the Wiki and try to run a few queries.
- Step 1: Set up login
- Step 2: Set up CWB
- Step 3: Recap Corpus Structure
- Step 4: Download and make yourself familiar with the tutorials
- Step 5: Run your first query
There are also some older YouTube tutorials that might be helpful. However, do not follow any setup tutorials on YouTube.
3.3 Tip of the Day
Today: Multitasking
Learning an academic discipline takes a lot of time and focus. However, some aspects of it are like learning a language or a motor skill. It might sound weird, but knowledge, especially theoretical knowledge, is like a muscle you can train. So here is my suggestion for getting better at Linguistics, Literary Studies or whatever subject you are interested in: listen to lectures, talks, podcasts and other content in the background.
Great topics to passively consume are:
- Repeating or recapping theory, e.g. Cognitive Linguistics
- Philosophy of Science: highly interesting, vastly important, but often neglected
- Sciences that are not your major
Here are some activities I frequently use to bombard myself with knowledge.
- weight or endurance training
- practicing an instrument (especially repetitive technical exercises)
- cooking
- cleaning, tidying, building Ikea tables ;)
None of these activities requires your full mental focus or involves long pauses, so your thoughts are free to meander through the depths of science. Nowadays, many talks and even full lectures can be found online, and with online teaching taking off, there will be ever more.
Linguistics
Luckily, we are not the only university to have taught linguistics online. Here are some nice channels to binge-watch, both actively and passively.
- Martin Hilpert: Has a variety of lectures and full courses on all things linguistics.
- The Virtual Linguistics Campus: Old but gold.
- People without YouTube channels who are great lecturers: Adele Goldberg, Joan Bybee, George Lakoff, Geoffrey Pullum. I have found many of their lectures and interviews online on various channels and platforms.
- NativLang: Probably my favorite language channel. Animated videos on a variety of language-related topics, with a focus on cross-linguistic comparison.
Other sciences
If you are a curious person, and if you appreciate the academic endeavor, chances are you are interested in other sciences, too. Knowing subjects outside the social sciences may help you in unexpected ways. Here are my go-to channels to listen to in the background.
- mailab: Focus on (bio-)chemistry, but mostly deals with current debates in the media. You can learn a lot about how news outlets interpret and sometimes misrepresent scientific studies.
- Closer to the Truth: Philosophy. Dealing with the big questions. How do we know facts? Why should we trust science? What are hypotheses and theories, and why bother?
- StatQuest: Pleasantly cringey statistics videos.
- zedstatistics: More in depth. (Less cringe. :( )
- PBS Space Time: Astrophysics. Popular science without the usual dumbing down. Great stuff to listen to even if you understand nothing. :D
- 3Blue1Brown: Mathematical concepts with animations instead of formulae. I was horrible at maths in school, but I always had a sense that it is actually a very beautiful subject. Wish I had had visualizations like these back then.
- Computerphile: Various computer science topics
I have not yet explored the world of audio books and audio podcasts, but I’m sure there is a lot of great stuff out there.
If you discover anything, let me know! :)