3.4 Frequency and memory

3.4.1 Common and uncommon vowels

To illustrate some basic frequency effects (frequency as in count, not pitch), we did a little experiment in class today with German vowel sounds.

Let’s consider a subset of the German monophthongs with relatively consistent phonetic spellings. We’re taking orthography as an approximation for pronunciation here.

Experimental task: Find as many words as you can that begin with either of the vowels in a given pair.

Vowel pair	Token count (LCC⁷)	Type count (LCC)	“Experimental” count (class average)
/aː a/	376,588	22,915	20.6
/iː ɪ/	268,792	6,614	12.0
/uː ʊ/	191,164	8,122	12.0
/oː ɔ/	47,160	5,062	11.8
/yː ʏ/ (ü)	22,095	872	11.2
/øː œ/ (ö)	2,209	100	6.0

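# CQP query example: words beginning with "a", ignoring case (%c).
# CQP regexes must match the whole token, so "a.+" excludes the one-letter word "a".
a="a.+" %c

# for token counts
size a

# for (case-insensitive) type counts, pipe the frequency list to the external program "wc", which counts its lines
count a by word %c > "| wc -l"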
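Without access to a CQP installation, the two counts can be approximated in Python on any tokenized text. The sketch below is a stand-in under stated assumptions, not the method used for the table: the function name `token_and_type_counts` is hypothetical and the example sentence is made up. Like CQP, it requires the regular expression to match the whole token, and it ignores case both when matching (as `%c` does) and when grouping tokens into types.

```python
import re

def token_and_type_counts(tokens, pattern="a.+"):
    """Hypothetical helper: count matching tokens and case-folded types.

    Approximates the CQP query `"a.+" %c` followed by `size` (tokens)
    and `count ... by word %c` (types). re.fullmatch mirrors CQP's
    whole-token matching; re.IGNORECASE mirrors the %c flag.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    matches = [t for t in tokens if regex.fullmatch(t)]
    n_tokens = len(matches)
    n_types = len({t.lower() for t in matches})  # case-insensitive types
    return n_tokens, n_types

# Made-up example sentence, already split into tokens
tokens = "Am Abend aß Anna am Abend einen Apfel a".split()
print(token_and_type_counts(tokens))  # (7, 5)
```

"Am" and "am" count as two tokens but one type, "Abend" occurs twice, and the one-letter token "a" does not match "a.+". Note that "ä" is a separate letter here, so a word like "Äpfel" would not be counted.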

The expected outcome: people find the most words beginning with a, then i and u, and far fewer with ü and ö. The groups brainstorming for the more common vowels had many more distinct words to choose from, and they are also likely to have encountered those words more often. If you were to ask learners of German how difficult they find the pronunciation of each of these vowels, you would probably see a correlation with how frequently those vowels appear in corpus data: the rarer the vowel, the more difficult it tends to seem.
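One way to put a number on how well the class results track the corpus counts is a rank correlation. The sketch below computes Spearman's rho from the values in the table, using only the Python standard library (tied values receive their average rank, and rho is the Pearson correlation of the ranks). The helper names are illustrative, and with only six data points this is a descriptive summary, not a significance test.

```python
def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Values from the table, in the order a, i, u, o, ü, ö
corpus_types = [22915, 6614, 8122, 5062, 872, 100]
class_average = [20.6, 12.0, 12.0, 11.8, 11.2, 6.0]

rho = spearman(corpus_types, class_average)
print(f"Spearman rho = {rho:.3f}")  # Spearman rho = 0.986
```

Using the token counts instead of the type counts gives the same value here: the two orderings differ only in whether i or u comes second, and those two vowels are tied in the class averages.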

Furthermore, we can observe that front rounded vowels are rare across languages (Maddieson 2013). But why are these vowels so much rarer in the first place?

There are three possibilities:

There is a mistake in the approach to counting.
It is a coincidence.
There is something categorically different about ü and ö.

Let’s assume the last possibility is the case. What ü and ö have in common is that they are front rounded vowels. In fact, we have a pretty good idea of why they are special. In a nutshell: the acoustic make-up of front rounded vowels, i.e. the frequencies (in the sense of pitch) that characterize them, is not as distinctive as that of other vowels. [a, i, u] are maximally distinct from each other, so (almost) all languages distinguish between them. [i] and [e] sound more similar to each other, yet they are still much more distinct than [i] and [y], and it is much more common for a language to distinguish the former pair than the latter. The exact cross-linguistic patterns and the interesting biophysical reasons are, unfortunately, far outside the scope of this course. The important conclusion is that, with the help of corpus data, we found an interesting correlation that we could corroborate with other pieces of data, and that ultimately leads us to a fundamental property of language.

3.4.2 Confounding variables

We measured vowel counts with orthographic characters.
What could skew our data systematically?

there are common prefixes (an, un, über) that produce many different types
i, a, and u occur in diphthong spellings (e.g. au, ai, ei)
i, a, and u might represent other monophthongs (especially in loan words)
ö and ü are sometimes transliterated as oe and ue
…
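Some of these confounds can be partly controlled for in code. The sketch below is a hypothetical Python helper, not part of the class exercise: it maps an initial ue/oe to ü/ö before counting, drops words beginning with the diphthong spellings au and ai, and can optionally drop words starting with the prefixes listed above. All three rules are crude assumptions; for example, the transliteration rule would wrongly turn "Oeuvre" into "öuvre", and the prefix rule also drops words that merely start with those letters.

```python
from collections import Counter

# Assumed heuristics for illustration only, not a validated method:
TRANSLITERATIONS = {"ue": "ü", "oe": "ö"}  # treat initial ue/oe as ü/ö
DIPHTHONG_ONSETS = ("au", "ai")            # a-initial diphthong spellings
PREFIXES = ("an", "un", "über")            # prefixes named in the list above
VOWEL_LETTERS = set("aiuoüö")              # one letter per vowel pair in the table

def normalize(word):
    """Lower-case a word and map an initial ue/oe to ü/ö."""
    w = word.lower()
    for digraph, umlaut in TRANSLITERATIONS.items():
        if w.startswith(digraph):
            return umlaut + w[len(digraph):]
    return w

def initial_vowel(word, exclude_prefixes=False):
    """Return the initial vowel letter of a normalized word, or None if excluded."""
    if word.startswith(DIPHTHONG_ONSETS):
        return None
    if exclude_prefixes and word.startswith(PREFIXES):
        return None
    first = word[:1]
    return first if first in VOWEL_LETTERS else None

def count_types_by_vowel(tokens, exclude_prefixes=False):
    """Count distinct normalized word types per initial vowel letter."""
    types = {normalize(t) for t in tokens}  # "Oel" and "Öl" become one type
    counts = Counter()
    for w in types:
        vowel = initial_vowel(w, exclude_prefixes)
        if vowel is not None:
            counts[vowel] += 1
    return counts

# Made-up example tokens
tokens = ["Auto", "Apfel", "Uebung", "Übung", "über", "Öl", "Oel", "Insel", "Angst"]
print(count_types_by_vowel(tokens))
print(count_types_by_vowel(tokens, exclude_prefixes=True))
```

With the prefix filter switched on, "über" disappears, but so does "Angst", which has no prefix at all; a serious analysis would need proper morphological segmentation. Comparing the counts with and without each adjustment shows how much a given confound moves the numbers.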

There are always many factors that could skew your data in one direction or another. In this case, the observed pattern is probably amplified by the variables above. Ideally, you would control for those confounding variables; if you can’t, assess their potential impact on your results. One of the mottos in science is: always try to prove yourself wrong.

References

Maddieson, Ian. 2013. Front rounded vowels. In Matthew S. Dryer & Martin Haspelmath (eds.), The world atlas of language structures online. Leipzig: Max Planck Institute for Evolutionary Anthropology. https://wals.info/chapter/11.

Quasthoff, Uwe, Matthias Richter & Christian Biemann. 2006. Corpus portal for search in monolingual corpora. LREC, 1799–1802.

⁷ Leipzig Corpora Collection, German news 2010 (Quasthoff, Richter & Biemann 2006).