3.1 Parts of speech

3.1.1 Recap: Open and Closed Word Classes

The distinction between open and closed word classes is the first idea that we can quantify easily with the help of corpus data. Unlike a closed word class, an open word class should have many more members. Let’s first recap which word classes we know.

Open word classes

Nouns: time, book, love, kind

Verbs: find, try, look, consider

Adjectives: green, high, nice, considerate

Adverbs: really, nicely, well

Closed word classes

Pronouns: I, you, she, they, mine, …

Determiners: the, a(n), this, that, some, any, no, …

Prepositions: to, in, at, behind, after, …

Conjunctions: and, or, so, that, because, …

Closed word classes rarely accept new members. Singular they might be considered a rather recent addition to the class of pronouns. Closed word classes are also mostly invariant in that they do not take inflection. Neither of these properties is logically necessary. One could imagine more pronouns: some languages have a dual in addition to singular and plural (e.g. Classical Arabic), or distinguish between inclusive and exclusive we (several Polynesian languages). Yet the class of pronouns in English is rather fixed.

Lexical vs. function words

Auxiliary verbs: be, have, (get, keep)

Lexical verbs: eat, sleep, repeat, …

These first observations about word classes lead us to our core hypothesis for this week: closed word classes have fewer members than open word classes.
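To test this hypothesis on a PoS-tagged corpus, one option is to count how many distinct word types occur with each word class. The minimal sketch below assumes that the corpus is available as a list of (word, tag) pairs with Penn Treebank-style tags; the mapping `WORD_CLASS` and the functions `types_per_class` and `open_vs_closed` are hypothetical helpers written for illustration, and the toy sentence is invented. It counts types rather than tokens, because the hypothesis is about the number of members: closed-class words such as the or and are very frequent as tokens but form only a small set of types. A toy sample this small cannot confirm or refute the hypothesis; that requires a large corpus.

```python
from collections import defaultdict

# Hypothetical mapping from Penn Treebank-style tags to the word classes above.
# It is an illustrative assumption and does not cover a full tagset.
WORD_CLASS = {
    "NN": "noun", "NNS": "noun",
    "VB": "verb", "VBD": "verb", "VBZ": "verb",
    "JJ": "adjective",
    "RB": "adverb",
    "PRP": "pronoun",
    "DT": "determiner",
    "IN": "preposition",
    "CC": "conjunction",
}
OPEN_CLASSES = {"noun", "verb", "adjective", "adverb"}


def types_per_class(tagged_tokens):
    """Count distinct lower-cased word types for each word class."""
    types = defaultdict(set)
    for word, tag in tagged_tokens:
        word_class = WORD_CLASS.get(tag)
        if word_class is not None:  # tags outside the mapping are skipped
            types[word_class].add(word.lower())
    return {wc: len(words) for wc, words in types.items()}


def open_vs_closed(counts):
    """Return (types in open classes, types in closed classes)."""
    open_total = sum(n for wc, n in counts.items() if wc in OPEN_CLASSES)
    closed_total = sum(n for wc, n in counts.items() if wc not in OPEN_CLASSES)
    return open_total, closed_total


# Invented toy input, for illustration only.
toy = [("She", "PRP"), ("found", "VBD"), ("a", "DT"), ("green", "JJ"),
       ("book", "NN"), ("and", "CC"), ("she", "PRP"), ("really", "RB"),
       ("liked", "VBD"), ("the", "DT"), ("book", "NN")]
counts = types_per_class(toy)
print(counts)
print(open_vs_closed(counts))
```

Lower-casing merges She and she into one type; whether to do this, or to count lemmas instead of word forms, is a design decision that affects the counts.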

3.1.2 PoS-Tags

The word class of each word is determined with part-of-speech (PoS) taggers. Tools like the TreeTagger (Schmid 2013) can assign word classes with an accuracy of around 95% (Horsmann, Erbs & Zesch 2015). Even though this is good enough for most purposes, bear in mind that automatic annotation is error-prone and can produce spurious patterns that have to be accounted for. We will encounter such cases in later sections.
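Tagging accuracy is usually measured against a manually annotated gold standard: it is the share of tokens whose automatic tag matches the gold tag. The sketch below computes this and lists the mismatches; the function names and the four-token example are hypothetical and chosen only for illustration. The arithmetic also shows why errors matter in corpus work: at 95% accuracy, a corpus of 1,000,000 tokens still contains about 1,000,000 × 0.05 = 50,000 mis-tagged tokens.

```python
def tagging_accuracy(predicted, gold):
    """Share of tokens whose predicted tag equals the gold-standard tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


def tagging_errors(tokens, predicted, gold):
    """List (token, predicted tag, gold tag) for every mis-tagged token."""
    return [(t, p, g) for t, p, g in zip(tokens, predicted, gold) if p != g]


# Invented example: the tagger wrongly treats the verb "like" as a preposition.
tokens = ["I", "like", "green", "tea"]
gold = ["PRP", "VBP", "JJ", "NN"]
predicted = ["PRP", "IN", "JJ", "NN"]

print(tagging_accuracy(predicted, gold))
print(tagging_errors(tokens, predicted, gold))
```

Looking at the list of errors, not just the accuracy figure, helps to spot systematic confusions (such as like as verb vs. preposition) that could create spurious patterns in later counts.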

PoS-tags

annotation for word class available in most corpora

automatic

around 95% accuracy (Horsmann, Erbs & Zesch 2015)

e.g. TreeTagger (Schmid 2013)
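Tagger output usually has to be read back into a program before word classes can be counted. The sketch below assumes a one-token-per-line format with three tab-separated columns (token, tag, lemma), which is the layout commonly produced by TreeTagger's wrapper scripts; treat this format as an assumption and check it against your own tagger's output. `TaggedToken` and `parse_tagger_output` are hypothetical names, and the sample text is invented.

```python
from collections import Counter
from typing import NamedTuple


class TaggedToken(NamedTuple):
    token: str
    tag: str
    lemma: str


def parse_tagger_output(text):
    """Parse 'token<TAB>tag<TAB>lemma' lines; blank or malformed lines are skipped."""
    parsed = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue
        parsed.append(TaggedToken(*fields))
    return parsed


# Invented sample in the assumed three-column format.
sample = "The\tDT\tthe\nbook\tNN\tbook\nis\tVBZ\tbe\ngreen\tJJ\tgreen\n"
parsed = parse_tagger_output(sample)
tag_counts = Counter(t.tag for t in parsed)
print(tag_counts.most_common())
```

Skipping malformed lines silently keeps the sketch short; in real work it may be safer to log or count them, since they can point to tokenization problems.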

References

Horsmann, Tobias, Nicolai Erbs & Torsten Zesch. 2015. Fast or accurate? – a comparative evaluation of PoS tagging models. In Proceedings of the second Italian conference on computational linguistics, 166–17. Trento, Italy: Accademia University Press.

Schmid, Helmut. 2013. Probabilistic part-of-speech tagging using decision trees. In New methods in language processing, 154.