2.2 Counting words

2.2.1 Recap: Open and Closed Word Classes

The distinction between open and closed word classes is one of the first ideas we can quantify easily with the help of corpus data. An open word class readily takes on new members, so it should contain far more distinct words than a closed word class. Let’s first recap the word classes we know.

Open word classes
  • Nouns: time, book, love, kind
  • Verbs: find, try, look, consider
  • Adjectives: green, high, nice, considerate
  • Adverbs: really, nicely, well
Closed word classes
  • Pronouns: I, you, she, they, mine, …
  • Determiners: the, a(n), this, that, some, any, no, …
  • Prepositions: to, in, at, behind, after, …
  • Conjunctions: and, or, so, that, because, …
Lexical vs. function words
  • Auxiliary verbs: be, have, (get, keep)
  • Lexical verbs: eat, sleep, repeat, …
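The expectation that open classes have more members than closed ones can be checked by counting the distinct words (types) per word class. The following is only a minimal sketch: the mini corpus, its word-class labels and the function name `types_per_class` are made up for illustration, and a real study would read the tags from an annotated corpus instead.

```python
from collections import defaultdict

# Hypothetical hand-tagged mini corpus of (word, word class) pairs.
# In practice these pairs would come from a PoS-tagged corpus.
tagged_tokens = [
    ("the", "determiner"), ("book", "noun"), ("in", "preposition"),
    ("the", "determiner"), ("green", "adjective"), ("house", "noun"),
    ("and", "conjunction"), ("the", "determiner"), ("time", "noun"),
    ("in", "preposition"), ("the", "determiner"), ("high", "adjective"),
    ("tower", "noun"),
]


def types_per_class(tokens):
    """Count the distinct words (types) observed for each word class."""
    types = defaultdict(set)
    for word, word_class in tokens:
        # Lowercase so that "The" and "the" count as the same type.
        types[word_class].add(word.lower())
    return {word_class: len(words) for word_class, words in types.items()}


counts = types_per_class(tagged_tokens)

# List the classes from most to fewest types.
for word_class, n_types in sorted(counts.items(), key=lambda item: -item[1]):
    print(f"{word_class}: {n_types}")
```

Even in this toy sample the determiner "the" occurs four times but counts as a single type, while every noun is a new type. Token frequency and class size are therefore different measures.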

2.2.2 PoS-Tags

PoS-tags
  • annotation for word class available in most corpora
  • assigned automatically
  • around 95% accuracy (Horsmann, Erbs & Zesch 2015)
  • e.g. Tree Tagger (Schmid 2013)

Figuring out the word class of each word is done with PoS-taggers (part-of-speech taggers). Tools like the Tree Tagger (Schmid 2013) can determine word classes with an accuracy of around 95% (Horsmann, Erbs & Zesch 2015).
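To make the accuracy figure concrete: tagging accuracy is usually understood as the share of tokens whose automatically assigned tag matches a manually checked ("gold") tag. The sketch below assumes that reading. The tag labels and the function name `tagging_accuracy` are hypothetical and not taken from the Tree Tagger or from the cited evaluation.

```python
def tagging_accuracy(gold_tags, predicted_tags):
    """Return the share of tokens whose predicted tag equals the gold tag."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("Both tag sequences must cover the same tokens.")
    if not gold_tags:
        raise ValueError("Cannot compute accuracy for an empty sequence.")
    correct = sum(gold == pred for gold, pred in zip(gold_tags, predicted_tags))
    return correct / len(gold_tags)


# Hypothetical tags for the five tokens of one short sentence.
gold = ["DET", "NOUN", "VERB", "ADV", "ADJ"]
predicted = ["DET", "NOUN", "VERB", "ADJ", "ADJ"]  # the fourth tag is wrong

accuracy = tagging_accuracy(gold, predicted)
print(f"{accuracy:.0%}")
```

At an accuracy of around 95%, roughly one token in twenty carries a wrong tag, so word-class counts based on automatic annotation always contain some noise.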

References

Horsmann, Tobias, Nicolai Erbs & Torsten Zesch. 2015. Fast or accurate? – a comparative evaluation of PoS tagging models. In Proceedings of the second Italian conference on computational linguistics, 166–17. Trento, Italy: Accademia University Press.

Schmid, Helmut. 2013. Probabilistic part-of-speech tagging using decision trees. In New methods in language processing, 154.