3.4 Frequency and memory

3.4.1 Common and uncommon vowels

To illustrate some basic frequency effects (frequency in the sense of counts, not pitch), we ran a little experiment in class today with German vowel sounds.

Let’s consider a subset of the German monophthongs with relatively consistent phonetic spellings. We’re taking orthography as an approximation for pronunciation here.

Experimental task:
Find as many words as you can that begin with either one of the vowels of a given pair.
Vowel pair      Token count (LCC¹)   Type count   Mean “experimental” count
/aː a/          376,588              22,915       20.6
/iː ɪ/          268,792               6,614       12.0
/uː ʊ/          191,164               8,122       12.0
/oː ɔ/           47,160               5,062       11.8
/yː ʏ/ (ü)       22,095                 872       11.2
/øː œ/ (ö)        2,209                 100        6.0

# CQP query example: words beginning with "a", ignoring case (%c)
A = "a.+" %c;

# token count
size A;

# (case-insensitive) type count, using the external program "wc" to count lines
count A by word %c > "| wc -l";
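
For readers without a CQP installation, the same two counts can be sketched in Python. This is only an illustration of the logic, not the procedure used for the table: the function name `count_initial` and the toy token list are assumptions, and lower-casing is used here as a stand-in for CQP's `%c` flag.

```python
import re
from collections import Counter


def count_initial(tokens, letter):
    """Return (token_count, type_count) for words starting with `letter`.

    Mirrors the pattern "<letter>.+" with case ignored: at least one
    further character must follow the initial letter.
    """
    pattern = re.compile(re.escape(letter) + r".+", re.IGNORECASE)
    # Lower-case each match so that "Apfel" and "apfel" count as one type.
    matches = Counter(t.lower() for t in tokens if pattern.fullmatch(t))
    return sum(matches.values()), len(matches)


# Hypothetical toy input, not taken from the LCC corpus.
tokens = ["Apfel", "apfel", "Abend", "a", "Igel", "immer", "Öl", "Apfel"]

print(count_initial(tokens, "a"))  # (4, 2): the bare "a" is too short to match
print(count_initial(tokens, "ö"))  # (1, 1)
```

The token count sums all matching occurrences, while the type count only looks at how many distinct (lower-cased) word forms matched.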

The expected outcome: people find the most words with a, then i and u, and far fewer with ü and ö. The groups brainstorming for the more common vowels had many more distinct words to choose from, and they are also likely to have encountered those words more often. If you asked learners of German how difficult they find the pronunciation of each of these vowels, you would probably see a correlation with the frequency with which those vowels appear in corpus data.
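
How closely the class results follow the corpus counts can be checked with a rank correlation. The sketch below computes Spearman's rho in plain Python for the six vowel pairs in the table; the helper names are assumptions made for this example, and with only six data points the result should be read as a rough indication rather than as a statistical test.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Values from the table above, in the order a, i, u, o, ü, ö.
token_counts = [376588, 268792, 191164, 47160, 22095, 2209]
type_counts = [22915, 6614, 8122, 5062, 872, 100]
class_counts = [20.6, 12.0, 12.0, 11.8, 11.2, 6.0]

print(round(spearman(token_counts, class_counts), 2))
print(round(spearman(type_counts, class_counts), 2))
```

Ranks are used instead of raw values because the corpus counts span several orders of magnitude, whereas the class counts lie within a narrow range.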

Furthermore, we can observe that front rounded vowels are rare across languages (Maddieson 2013). But why are those vowels so much rarer in the first place?

There are three possibilities:

  • There is a mistake in the approach to counting.
  • It is a coincidence.
  • There is something categorically different about ü and ö.

Let’s assume the last of these is the case. What ü and ö have in common is that they are front rounded vowels. In fact, we have a pretty good idea about why they are special. In a nutshell: the frequency make-up (this time in the sense of pitch) of front rounded vowels is not as distinctive as that of other vowels. [a, i, u] are maximally distinct from each other, so (almost) all languages make a distinction between them. [i] and [e] are more similar in sound, yet still much more distinct than [i] and [y]. It is much more common for a language to distinguish the former pair than the latter. The exact cross-linguistic patterns and the interesting biophysical reasons are unfortunately far outside the scope of this course. The important conclusion is that we found an interesting correlation with the help of corpus data, that we could corroborate it with other pieces of data, and that it ultimately leads us to a fundamental property of language.

3.4.2 Confounding variables

We measured vowel counts with orthographic characters.
What could skew our data systematically?

  • there are common prefixes (an-, un-, über-) that give rise to many different types
  • i, a, and u occur in diphthongs
  • i, a, and u might represent different monophthongs (especially in loan words)
  • ö and ü are sometimes transliterated as oe and ue

There are always many factors that could skew your data in one direction or another. In this case, the observed pattern is probably amplified by the variables above. Ideally, you would control for those confounding variables, and if you can’t, judge the potential implications. One of the mottos in science is: always try to prove yourself wrong.
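
As a rough sketch of what controlling for some of these variables could look like, the Python snippet below recounts types per initial vowel letter after undoing word-initial oe/ue transliterations and skipping words that begin with a diphthong spelling or one of the prefixes. All lists and names in it are assumptions made for illustration and are deliberately incomplete; in particular, plain string matching cannot tell a real prefix from a word that merely starts with the same letters.

```python
from collections import Counter

# Assumed, deliberately incomplete lists for illustration only.
PREFIXES = ("an", "un", "über")
DIPHTHONG_SPELLINGS = ("ai", "au", "ei", "eu", "äu")
TRANSLITERATIONS = {"oe": "ö", "ue": "ü"}
VOWELS = "aiuoüö"


def normalise(word):
    """Lower-case a word and undo a word-initial oe/ue transliteration."""
    word = word.lower()
    for digraph, umlaut in TRANSLITERATIONS.items():
        if word.startswith(digraph):
            return umlaut + word[len(digraph):]
    return word


def controlled_type_counts(words):
    """Count distinct types per initial vowel letter, skipping confounded words."""
    kept = set()
    for word in words:
        w = normalise(word)
        if not w or w[0] not in VOWELS:
            continue
        if w.startswith(DIPHTHONG_SPELLINGS):
            continue  # the initial letter is part of a diphthong spelling
        if w.startswith(PREFIXES):
            continue  # crude: also drops unprefixed words such as "Angst"
        kept.add(w)
    return Counter(w[0] for w in kept)


# Hypothetical toy word list, not corpus data.
words = ["Auto", "Apfel", "anfangen", "Uebung", "Übung",
         "Igel", "unklar", "Öl", "Oel", "Uhr"]
print(controlled_type_counts(words))
```

In the toy list, "Uebung" and "Übung" collapse into one type, while "Auto", "anfangen" and "unklar" are excluded. Comparing such controlled counts with the raw counts gives a first impression of how much the confounding variables contribute to the observed pattern.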

References

Maddieson, Ian. 2013. Front rounded vowels. In Matthew S. Dryer & Martin Haspelmath (eds.), The world atlas of language structures online. Leipzig: Max Planck Institute for Evolutionary Anthropology. https://wals.info/chapter/11.
Quasthoff, Uwe, Matthias Richter & Christian Biemann. 2006. Corpus portal for search in monolingual corpora. LREC, 1799–1802.

  1. Leipzig Corpora Collection, German news 2010 (Quasthoff, Richter & Biemann 2006).