4.2 Frequency and memory
4.2.1 Common and uncommon vowels
To illustrate some basic frequency effects (frequency as in count, not pitch), we ran a little experiment in class today with German vowel sounds.
Let’s take a subset of the German monophthongs with relatively consistent phonetic spellings. We’re taking orthography as an approximation for pronunciation here.
Vowel | Frequency counts (DWDS⁷) | Perceived difficulty | Experimental counts |
---|---|---|---|
/iː ɪ/ | 267,353 | easy | 56 |
/aː a/ | 162,873 | easy | 37 |
/uː ʊ/ | 113,065 | easy | 53 |
(ü) /yː ʏ/ | 30,568 | difficult | 22 |
(ö) /øː œ/ | 24,562 | difficult | 30 |
- Experimental task:
- Find as many adjectives as you can that contain the given vowel (long or short) within 3 minutes.
The expected outcome: people find the most words with i, then a and u, and far fewer with ü and ö. The number of words found correlates with how frequently those vowels appear in corpus data and with how difficult learners find their pronunciation. The extraordinary performance of our u-group can partly be explained by the participants discovering adjectives with the very productive prefix un-. This in itself is an interesting association pattern.
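To make the correlation concrete, here is a minimal sketch that computes Spearman's rank correlation between the corpus counts and the experimental counts from the table above. The helper names (`ranks`, `spearman`) are ours, and the simple ranking assumes there are no tied values, which holds for this table. With only five data points this is an illustration, not a statistical test.

```python
# Values copied from the table, in the order i, a, u, ü, ö.
corpus_counts = [267353, 162873, 113065, 30568, 24562]
experimental_counts = [56, 37, 53, 22, 30]


def ranks(values):
    """Return ascending ranks (1 = smallest). Assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman's rho via the rank-difference formula (valid without ties)."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


rho = spearman(corpus_counts, experimental_counts)
print(round(rho, 2))  # 0.8
```

The only disagreement between the two rankings is that u and a swap places, and so do ö and ü. The first swap reflects the un- effect described above.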
It makes sense to hypothesize that it is easier to come up with examples if there is more to choose from. Furthermore, we can observe that front rounded vowels are rare across languages (Maddieson 2013). But why are those vowels so much rarer in the first place?
There are three obvious possibilities:
- We made a mistake
- It is coincidence
- There is something categorically different about ü and ö
Let’s assume the last one is the case. What ü and ö have in common is that they are front rounded vowels. In fact, we have a pretty good idea about why they are special. In a nutshell: the acoustic frequency make-up of front rounded vowels is not as distinctive as that of other vowels. [a, i, u] are extremely distinct from each other, so (almost) all languages make a distinction between them. [i] and [e] are more similar in sound, yet still much more distinct than [i] and [y]. It is much more common to see a language make a distinction between the former pair than the latter. The exact cross-linguistic patterns and the interesting biophysical reasons are far outside the scope of this course. The important conclusion is that we found an interesting correlation with the help of corpus data that we could corroborate with other pieces of data, and that ultimately leads us to a fundamental property of language.
4.2.2 Confounding variables
We measured vowel counts with orthographic characters.
What could skew our data systematically?
- i, a, and u occur in diphthongs
- i, a, and u might represent different monophthongs (especially in loan words)
- ö and ü are sometimes transliterated with oe and ue
- …
There are always many factors that could skew your data in one direction or another. In this case, the observed pattern is probably amplified by the variables above. Ideally, you would control for those confounding variables, and if you can’t, judge the potential implications.
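As a rough illustration of the first confound, the sketch below counts how many words contain a vowel letter, once naively and once after removing orthographic diphthongs. The word list is a made-up toy sample, not DWDS data, and the digraph list (ei, ai, au, eu, äu) is our own simplifying assumption. It does not address loan words or the oe/ue spellings.

```python
import re

# Toy sample of German adjectives (hypothetical, for illustration only).
adjectives = ["klein", "laut", "blau", "wild", "kalt",
              "gut", "teuer", "frei", "neu", "dick"]

# Assumed set of orthographic diphthongs; real data would need more care.
DIPHTHONGS = re.compile(r"ei|ai|au|eu|äu")


def count_words_with(letter, words, strip_diphthongs=False):
    """Count the words that contain `letter`, optionally ignoring diphthongs."""
    total = 0
    for word in words:
        if strip_diphthongs:
            word = DIPHTHONGS.sub("", word)
        if letter in word:
            total += 1
    return total


naive = {v: count_words_with(v, adjectives) for v in "iau"}
controlled = {v: count_words_with(v, adjectives, strip_diphthongs=True)
              for v in "iau"}

print(naive)       # {'i': 4, 'a': 3, 'u': 5}
print(controlled)  # {'i': 2, 'a': 1, 'u': 1}
```

Even in this tiny sample, the letter counts for i, a, and u shrink once diphthongs are excluded, which is the direction of skew described above.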
DWDS Kernkorpus 21 (2000–2010); example query:
*ö* WITH $p=ADJ*