9.1 Comparing Apples to Mangos
As another example, let’s apply the same logic as above but now for a different company name: Apple.
One aspect worth mentioning about querying in CQP is that everything is interpreted literally, so it makes a difference whether we search for "apple" or "Apple". For proper names this can be helpful, because it gets rid of many false positives referring to the fruit. If you don’t want this behaviour and want to match every capitalization variant, you can append %c after the quoted string in every token expression, or after word when you count. This literally means “ignore case.”
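To make the effect of %c concrete, here is a minimal Python sketch of the idea. It is only an analogy and not how CQP is implemented: the helper `token_matches` is a hypothetical name, and the sketch assumes that a quoted string has to match the whole token, case-sensitively by default.

```python
import re

def token_matches(pattern: str, token: str, ignore_case: bool = False) -> bool:
    """Match a regular expression against the whole token (hypothetical helper)."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.fullmatch(pattern, token, flags) is not None

tokens = ["Apple", "apple", "APPLE", "Apples"]

# Default behaviour: only the exact spelling matches.
case_sensitive = [t for t in tokens if token_matches("Apple", t)]

# With the ignore-case switch (the analogue of %c): all capitalizations match,
# but "Apples" is still excluded because the whole token has to match.
case_insensitive = [t for t in tokens if token_matches("Apple", t, ignore_case=True)]

print(case_sensitive)    # ['Apple']
print(case_insensitive)  # ['Apple', 'apple', 'APPLE']
```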
In a nutshell, to improve our results, we excluded Big, as in Big Apple, and we decided to exclude any attributive use. To achieve the latter, we excluded matches in which the token following Apple has a pos tag that starts with N, using a regular expression: [word != "Big"] [word = "Apple"] [pos != "N.*"]
Among the results, we spotted some metonymies, most of which included personification. To explore those personification contexts further, we restricted the query to occurrences directly followed by a verb. This gives us mostly subject uses of Apple.
[word != "Big"] [word = "Apple"] [class = "VERB"]
Bear in mind that these queries are not exact and are only meant for exploration. NOUN + VERB is not a sufficient query to find subjects reliably, much less personifications. But these are some steps you can take to begin filtering your results.
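To see what the two token-sequence queries above accept and reject, the following Python sketch replays them on a tiny hand-tagged example. Everything in it is an assumption for illustration: the helper names `attr`, `find_matches` and `tagged`, the toy sentences, and the toy tag values. It is not how CQP evaluates queries internally.

```python
import re
from typing import Callable, Dict, List

Token = Dict[str, str]
TokenTest = Callable[[Token], bool]

def attr(name: str, pattern: str, negate: bool = False) -> TokenTest:
    """Build one token test, e.g. attr("word", "Big", negate=True) for [word != "Big"]."""
    def test(token: Token) -> bool:
        hit = re.fullmatch(pattern, token[name]) is not None
        return hit != negate
    return test

def find_matches(corpus: List[Token], query: List[TokenTest]) -> List[int]:
    """Return the start index of every window in which all token tests succeed in order."""
    width = len(query)
    return [
        start
        for start in range(len(corpus) - width + 1)
        if all(test(corpus[start + k]) for k, test in enumerate(query))
    ]

def tagged(rows: str) -> List[Token]:
    """Parse space-separated word/pos/class triples into token dictionaries."""
    return [dict(zip(("word", "pos", "class"), row.split("/"))) for row in rows.split()]

# Toy data with made-up tags: "Yesterday Apple announced a phone .
# I visited the Apple store in the Big Apple ."
corpus = tagged(
    "Yesterday/RB/ADV Apple/NNP/NOUN announced/VBD/VERB a/DT/DET phone/NN/NOUN ././PUNCT "
    "I/PRP/PRON visited/VBD/VERB the/DT/DET Apple/NNP/NOUN store/NN/NOUN in/IN/ADP "
    "the/DT/DET Big/NNP/NOUN Apple/NNP/NOUN ././PUNCT"
)

# [word = "Apple"]
plain = [attr("word", "Apple")]
# [word != "Big"] [word = "Apple"] [pos != "N.*"]
non_attributive = [attr("word", "Big", negate=True), attr("word", "Apple"),
                   attr("pos", "N.*", negate=True)]
# [word != "Big"] [word = "Apple"] [class = "VERB"]
followed_by_verb = [attr("word", "Big", negate=True), attr("word", "Apple"),
                    attr("class", "VERB")]

print(find_matches(corpus, plain))             # [1, 9, 14]
print(find_matches(corpus, non_attributive))   # [0]
print(find_matches(corpus, followed_by_verb))  # [0]
```

In this sketch the match starts at the token before Apple, because the [word != "Big"] position is part of the matched sequence. "Apple store" is dropped by the pos filter, and "Big Apple" is dropped by the first token test.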
As a next step, we widened our scope and included more and more related brands in our query by stringing them together with the logical or operator |. We were trying to define a list of social media brands.
[word = "Apple|Microsoft|Google|Facebook|Twitter|Instagram|ICQ|Windows"]
At this point, it might be worth looking into a CQP feature called wordlists; see the official tutorial for examples.
In an actual study, this list should not be arbitrary and should ideally be exhaustive, meaning you should include all brands that match criteria you define beforehand. A convenient and common way to make exhaustive categories feasible in the first place is to set a lower bound in the form of a minimum frequency: only brands that occur at least that often in the corpus are included.
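A minimum-frequency cut-off can be sketched in a few lines of Python. The function name `brands_above_threshold`, the toy token list, the candidate set and the threshold of 2 are all made up for illustration; in practice the counts would come from the corpus itself.

```python
from collections import Counter
from typing import Iterable, List, Set

def brands_above_threshold(tokens: Iterable[str], candidates: Set[str], min_freq: int) -> List[str]:
    """Keep only the candidate brands that occur at least min_freq times."""
    counts = Counter(token for token in tokens if token in candidates)
    return sorted(brand for brand, n in counts.items() if n >= min_freq)

# Toy token stream standing in for corpus output (invented data).
tokens = ["Apple", "Google", "Apple", "ICQ", "Google", "Apple", "Twitter", "Twitter"]
candidates = {"Apple", "Google", "ICQ", "Twitter", "Instagram"}

selected = brands_above_threshold(tokens, candidates, min_freq=2)
print(selected)  # ['Apple', 'Google', 'Twitter']
```

The selected names could then be joined with | to build the alternation used in the query.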
As a final example, let’s take the above results and compare them with fashion brands. While tech and social media brands seem to occur frequently in personification contexts, we could not find the same behaviour in fashion. Rather, we found that the metonymies seem to occur mostly in combination with local prepositions. To run a first test of this impression, we can use the count command.
[word = "Adidas|Place|Levi|Nike|Primark|H&M|Lacoste"]
`count by pos on match[-1]`
`count by class on match[1]`
32 out of 106 matches are preceded by prepositions while only 14 are followed by a verb. As a next step we would have to make our lists of brands more exhaustive, our queries for both categories more robust, and compare the co-occurrence frequencies properly.
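As a quick worked example, the two counts can be turned into shares of all matches. The sketch below uses only the numbers reported above; the variable names and the helper `share` are ours.

```python
total_matches = 106
preceded_by_preposition = 32
followed_by_verb = 14

def share(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)

preposition_share = share(preceded_by_preposition, total_matches)
verb_share = share(followed_by_verb, total_matches)

print(f"preceded by a preposition: {preposition_share}%")  # 30.2%
print(f"followed by a verb: {verb_share}%")                # 13.2%
```

Roughly 30% of the fashion brand matches follow a preposition, while about 13% precede a verb. These shares only become meaningful once they are compared with the corresponding figures for the other brand category.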
The tentative hypothesis we drew from this short exploration was that Social Media brands are conceptualized as humans/actors while fashion brands are conceptualized as places, which is quite exciting for 90 minutes of playing with data.