10 Productivity

When it comes to derivational affixes, there are 3 concepts that become very important when you work with data.

  1. Analyzability
  2. Semantic transparency
  3. Productivity

The first aspect, is related to the question of whether a word is complex or not. There are monomorphemic (simple) and polymorphemic (complex) words based on whether you can break them up into component parts. This is straightforward in most cases:

  1. {Analyz}-{abil}-{ity}
  2. {Productiv}-{ity}

These two words are clearly composed of smaller meaningful units. Note that there is a lot of spelling variation at morpheme boundaries and there is also allomporphy. Don’t confuse the two. productive is missing its final 〈e〉 when it occurs with a suffix, but the phonological form changes. This is a purely orthographic convention. The change in stress from productive to productivity, however, does cause a change in the phonetic form: /pɹədˈʌktɪv/ → /pɹədəktˈɪv-/. Likewise, the change of the suffix {able} from /-ɪbɫ/ in analyzable to /-ɪˈbɪl-/ can be considered allomorphy.

Another dimension of complexity comes into play when encounter words in your data for which it isn’t even clear whether they are complex or not. Some might seem complex, but require meta-linguistic knowledge, i.e. explicit, learned knowledge about the language. Common cases are loanwords like: poltergeist, schizophrenia, statistics, To a German speaker poltergeist clearly seems polymorphemic because it is a compound in German. This knowledge, however, needs to be learned explicitly by a native speaker of English, because neither component is used on its own in English. schizophrenia is a similar case. You can analyze the component parts academically by looking up the etymological origin. Within the English language without explicit knowledge of Latin and Greek. statistics is trickier. There might be a degree of analyzability here because the suffix -ics is common enough in English, especially for academic disciplines. It is still different from regular derived words in that the root alone doesn’t really carry a conventional meaning, at least one conventional in English. The root in statistics can be considered semantically opaque, which is the opposite of semantically transparent.

Now let’s consider the following three examples:

  1. ceiling
  2. building (N)
  3. interesting

All three words have a very recognizable English -ing suffix. In that sense, they are clearly analyzable as polymorphemic. That means in a native speaker of English the different components might trigger multiple associations. They are still problematic examples if you are interested in derivations with -ing. ceiling has a similar problem to statistics because the root is not really transparent in meaning. /siːl/ does exist in a handful of other English words, such as conceal. So you could make the argument that it is slightly more transparent than statistics. This illustrates that all concepts discussed here are matters of degree. building and interesting are both more transparent in meaning and analyzable. You might still want to treat them separately if you look into -ing as derivational suffix because they are strongly lexicalized. That means they are unlikely to be perceived compositionally, i.e. we have them stored as a whole in our lexicon and there is little to no analogical derivation happening.

Finally, morphemes vary in terms of and how widely they are spread across the lexicon and how commonly they are used to derive new words. A highly productive derivational affix is used with a large variety of roots. This is true of the regular inflectional affixes in English. -s is used as plural suffix on most nouns and it is also the most likely candidate for a neologism. An irregular plural like -en in oxen is more like a living fossil. It is limited to certain words and never used with new vocabulary.

In derivation, productivity is not as clear cut. Let’s consider the adjectival suffix -ish as in the examples below, taken from the Corpus of Contemporary American English (COCA) with the query [word = ".+-ish"].

  1. I’m still feeling flu-ish.
  2. … for a nation that was a loyal-ish member of NATO.
  3. That was selfish rather than other-ish.

In modern spoken language these types of derivations feel quite common. They may not be considered standard or formal language, which is why you will often find them hivenated. Still, intuitively, -ish should be considered productive.

Let’s try to catch more -ish adjectives with a more general query. It is a good idea to exclude capital letters with [^A-Z] since a homonym of -ish frequently occurs with nationalities/language names: [word = "[^A-Z].+ish" & class = "ADJ"]. Let’s also look at some other suffixes used to derive adjectives and compare some numbers from the BNC with some minimal cleanup:

tokens suffix cqp query (BNC)
8196 -ish [word = "[^A-Z].+ish" & class = "ADJ"]
168294 -ous [word = ".{2,}ous" & class = "ADJ"]
1025618 -al [word = ".{2,}al" & class = "ADJ" & word != "real"%c]
barplot(c(8196, 168294, 1025618),
        ylab = "token frequency",
        names.arg = c("-ish", "-ous", "-al"))

On a first glance, -ish is much less frequent in the BNC. This could have something to do with the composition of the corpus (try to restrict to match.text_mode = “spoken”. Is there a difference?). Normalizing relative to sample size does not help in this case because our sample is the same across. We could also look for disproportional amounts of false positives due to our queries, but maybe this is not going to be necessary. Frequency alone is not a sufficient measure for productivity. We need to check how much of the vocabulary is covered as well. One measure that can be used is the type-token ratio. How many occurrences are there relative to how many different words they occur across? We can get the type frequency by creating a frequency list with count and counting how many lines it has. A quick way to do this is to use an external command line tool called wc with the -l option for line count: count by word %c > "| wc -l". Note: the following graphs were made with the statistical software called R. I provided the code just in case anyone is interested.

types tokens suffix
557 8196 -ish
1410 168294 -ous
6315 1025618 -al
barplot(c(557, 1410, 6315),
        ylab = "type frequency",
        names.arg = c("-ish", "-ous", "-al"))

The discrepancy in type counts is already much lower and when we take the ratio, we can see that all of a sudden -ish is far ahead. It might not be as frequent as the other two suffixes, but it is also relatively more evenly spread over different roots. You could also imagine the other suffixes being bound to fewer roots that are very high in frequency individually.

barplot(c(557, 1410, 6315) / c(8196, 168294, 1025618),
        ylab = "type-token ratio",
        names.arg = c("-ish", "-ous", "-al"))

An even better measure is the hapax-token ratio. Instead of measuring the spread over different roots, we can measure the amounts of roots that only occur once. This may serve as an indicator for the formation of neologism. To get hapax frequency, we can add the following to our queries: [... & f(word) = 1]. We can observe the following result:

hapaxes: types: tokens: suffix
271 557 8196 -ish
584 168294 -ous
3438 1025618 -al
barplot(c(271, 584, 3438) / c(8196, 168294, 1025618),
        ylab = "hapax-token ratio",
        names.arg = c("-ish", "-ous", "-al"))

Even though -ish is comparatively infrequent, it occurs with a lot of types and seems to be used to form neologisms more often than the other two. This might indicate that it is more productive, and also that it is a more recent phenomenon because none of the uses have had the chance yet to become frequent individually. Another observation we can make is that while -al’s type-token ratio is higher than that of -ous, the hapax-token ratios are almost even. The data suggests that their productivity is similar, but that -al is more wide-spread in the lexicon, which makes intuitive sense.

10.1 Tiwilbemba

Today I am sharing with you my biggest regret looking back on uni days, thus, my biggest tiwilbemba: Not learning LaTeX/Markdown early enough.

Many of you are no fans of sitting in front of the computer all day. If you are a student, you will use a significant amount of time writing essays, term papers, and theses. The biggest time sinks with these are formatting, tables of contents, bibliographies, lists of abbreviations, etc. What if I told you that you don’t have to spend any time with this? If you know just enough LaTeX/Markdown, you can skip over all these steps, which equals less time tinkering at the computer. If you watched my first term paper stream, you literally saw me set up a document from scratch in under 5 minutes, including cover sheet, table of contents and bibliography, everything formatted perfectly and updated dynamically as I fill it.

It might feel counter-intuitive to spend even more time learning an entirely new computer skill. But bear with me. The time you spend on learning how to write documents in LaTeX or Markdown is ridiculously small compared to the days if not weeks of formatting frustration you can save yourself. I have always been rather tech savvy, and I know Microsoft Word much better than, I guess, the average user. Still, in hindsight, I feel like I was wasting my time. I wrote all my seminar papers, essays and theses in Microsoft Word. And I regret it.

This section is not a tutorial, rather an encouragement for you to expand your horizon (even though I will upload a simple set up for a term paper in the appendix soon). First, a profile of people that should, in my opinion, learn writing in plain text (LaTeX or Markdown).

Group 1: You have to write…

  1. Academic papers
  2. Reports
  3. Articles
  4. Books

Anything that requires a simple style and that doesn’t require a crazy amount of design greatly profits from LaTeX/markdown. Any repetitive work that requires consistent formating, too. If you write larger works like books, you’d be crazy not to use LaTeX. Students definitely belong in this group. I’d say, if you force yourself to learn it now, by the time you write your bachelor thesis, it will have been worth it already.

Of course, there are people who might be happy with graphical programs. To be fair, let’s profile these people, too.

Group 2: You have to write

  1. not much at all, only the occasional document
  2. Documents with constantly changing formatting
  3. Design-heavy documents (e.g. Ad material)

If you belong to this group, you might not profit from learning LaTeX too much, and you probably don’t care for Markdown either. Creative design is difficult, unless you are very experienced already.

Here are some reasons people have against learning LaTeX that are not valid in my opinion.

  1. “It’s difficult.”
    As soon as you’ve set it up and learned the basics it is actually sooo much easier. There are also platforms with great communities like Stackoverflow, where almost any problem you encounter has been solved by users with full examples. You just have to search for it.
  2. “I am not a programmer”
    Neither am I. Don’t let the syntax scare you.
  3. “I’ll learn it eventually, but for now I have to get this paper done quickly.”
    Nope… That’s what I told myself up until a year or so ago.
  4. “I’ll need to work with people that use .docx.”
    A good .tex file can be easily transformed into .docx or .odt thanks to tools like Pandoc.

Finally, some reasons people might not consider normally.

  • Professionals love it: If you were to write a program, you’d ask a programmer how to do it best. If you were to build a door, you’d ask a carpenter. For some reason, if people do typesetting, they do not use the tools of professionals. Most publishers use LaTeX, and also accept Latex files. It’s definitely not a bad thing to put on your résumé either.
  • Focus: not seeing the output immediately is actually a great thing. You might have just hopped onto the train of thought and the words just spill onto the screen when,… Hark! The table you placed so carefully a moment ago moved unexpectedly to the wrong page… Moment over, distraction has won. This is not gonna happen with LaTeX/markdown. I personally find myself micromanaging all the time in word.
  • Light weight: if you have an old computer or laptop that is old or cheap (or pretty, expensive but still weak,…you know) Windows and Microsoft Word/MacOS and pages might actually run rather slowly. If you think they are fast, you haven’t experienced the alternative. Especially large documents might take some time to load. If you have everything in plain text files, you’re document loads in a split second. That might take away some subconscious blocks that prevent you even from even opening your project. Just pop it open and quickly add a thought to your paper. Sooo comfy. :) As a matter of fact, I’m currently writing this very article from my phone using an editor called Markor with my source file synced in my cloud.
  • Gateway drug: Writing your term paper in plain text might just be the beginning. If you understand LaTeX, you basically get html for free. The principle is the same, just with slightly different syntax. If you use something like Rmarkdown, you can essentially export your project seamlessly into any format with little adjustment needed. You might be tempted to write your own website. Maybe you get into extensible text editors, terminals, scripting, maybe even Linux, maybe even… Vim? The rabbit hole goes deep. ;)