3.1 Parts of speech

3.1.1 Recap: Open and Closed Word Classes

The distinction between open and closed word classes is the first idea we can quantify quite easily with the help of corpus data. An open word class should have many more members than a closed one. Let’s first recap which word classes we know.

Open word classes
  • Nouns: time, book, love, kind
  • Verbs: find, try, look, consider
  • Adjectives: green, high, nice, considerate
  • Adverbs: really, nicely, well
Closed word classes
  • Pronouns: I, you, she, they, mine, …
  • Determiners: the, a(n), this, that, some, any, no, …
  • Prepositions: to, in, at, behind, after, …
  • Conjunctions: and, or, so, that, because, …

Closed word classes rarely accept new members. Singular they might be considered one rather recent addition to the class of pronouns. Closed word classes are also mostly invariant in that they do not take inflection. Neither of these properties is logically necessary. One could imagine more pronouns: some languages have a dual in addition to singular and plural (e.g. Classical Arabic), or a distinction between inclusive and exclusive we (several Polynesian languages). Yet the class of pronouns is rather fixed.

Lexical vs. function words
  • Auxiliary verbs: be, have, (get, keep)
  • Lexical verbs: eat, sleep, repeat, …

These first observations about word classes lead us to our core hypothesis for this week: closed word classes have fewer members than open word classes.
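To make the hypothesis concrete, here is a minimal sketch of how one might count the members of each word class. It assumes the corpus is available as (word, word class) pairs; the sample sentence, the class labels (NOUN, DET, ...) and the function name are made up for illustration and do not come from any particular corpus or tagset.

```python
from collections import defaultdict

# Hypothetical hand-tagged sample: (word, word_class) pairs.
# A real study would read such pairs from a PoS-tagged corpus instead.
TAGGED_TOKENS = [
    ("the", "DET"), ("book", "NOUN"), ("is", "VERB"), ("on", "PREP"),
    ("the", "DET"), ("table", "NOUN"), ("and", "CONJ"), ("she", "PRON"),
    ("reads", "VERB"), ("a", "DET"), ("green", "ADJ"), ("book", "NOUN"),
    ("in", "PREP"), ("the", "DET"), ("garden", "NOUN"),
]


def types_per_class(tagged_tokens):
    """Map each word class to the set of distinct word forms (types) seen with it."""
    members = defaultdict(set)
    for word, word_class in tagged_tokens:
        # Lowercase so that "The" and "the" count as one member.
        members[word_class].add(word.lower())
    return dict(members)


if __name__ == "__main__":
    for word_class, words in sorted(types_per_class(TAGGED_TOKENS).items()):
        print(word_class, len(words), sorted(words))
```

Note that the sketch counts types (distinct word forms), not tokens (running words): "the" occurs three times in the sample but is only one member of its class. On a sample this small the counts say very little; the hypothesis predicts that the open classes keep gaining members as the corpus grows, while the closed classes level off quickly.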

3.1.2 PoS-Tags

The word class of each word is determined with part-of-speech (PoS) taggers. Tools like the TreeTagger (Schmid 2013) can determine word classes with an accuracy of around 95% (Horsmann, Erbs & Zesch 2015). Even though this is good enough for most purposes, you have to bear in mind that automatic annotation is error-prone and can produce spurious patterns that have to be accounted for. We will encounter such cases in later sections.

PoS-tags
  • annotation for word class available in most corpora
  • assigned automatically
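To get a feeling for what an accuracy of around 95% means in practice, the following sketch computes tagging accuracy against a manually checked sample and the number of errors to expect in a corpus of a given size. It assumes that accuracy is measured per token; the function names, the example tags and the corpus size of one million tokens are illustrative assumptions, not figures from the cited studies.

```python
def tagging_accuracy(gold_tags, predicted_tags):
    """Share of tokens whose predicted tag matches the manually checked (gold) tag."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("gold and predicted tag sequences must have the same length")
    matches = sum(1 for gold, pred in zip(gold_tags, predicted_tags) if gold == pred)
    return matches / len(gold_tags)


def expected_tagging_errors(n_tokens, accuracy=0.95):
    """Expected number of mis-tagged tokens at a given per-token accuracy."""
    return round(n_tokens * (1 - accuracy))


if __name__ == "__main__":
    # Hypothetical four-token sample in which the tagger gets one tag wrong.
    gold = ["DET", "NOUN", "VERB", "DET"]
    predicted = ["DET", "NOUN", "VERB", "NOUN"]
    print(tagging_accuracy(gold, predicted))

    # Hypothetical corpus of one million tokens tagged at 95% accuracy.
    print(expected_tagging_errors(1_000_000))
```

Under these assumptions, a corpus of one million tokens would contain roughly 50,000 mis-tagged tokens. If the errors cluster on particular words or constructions rather than being spread evenly, they can show up as patterns that are not really in the data.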

References

Horsmann, Tobias, Nicolai Erbs & Torsten Zesch. 2015. Fast or accurate? – a comparative evaluation of PoS tagging models. Proceedings of the Second Italian Conference on Computational Linguistics, 166–17. Trento, Italy: Accademia University Press.
Schmid, Helmut. 2013. Probabilistic part-of-speech tagging using decision trees. New methods in language processing, 154.