Standing in front of us is a dataset of thousands of beer reviews taken from
a website. Each reviewer has given a 1-5 rating for each
of the four beer aspects: Aroma, Palate, Appearance and Taste, along with an Overall
grade. In addition, they also provided a textual review further explaining their
opinion on a specific beer. In a nutshell, our aim is to explore what it is we write
in these reviews that influences the ratings significantly. These conclusions can
help breweries adjust their production according to people’s preferences and improve
targeting: Do people often say that a beer is too sweet? That is a signal to reduce
sugar during the brewing process. On the other hand, are taste and appearance rated well?
The breweries can then focus their commercials on slow-motion close-ups of a person
drinking from a transparent glass, with their mouth in the forefront.
So, we will focus on answering two main questions:
~ Which beer aspects are the most important for users? Namely, which aspects correlate positively with their overall rating?
~ Which keywords do reviewers use about these aspects that are decisive factors when they give good grades?
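To make the first question concrete, here is a minimal sketch of how the correlation between each aspect and the overall rating could be measured. It assumes each review is a dictionary with the hypothetical keys `aroma`, `palate`, `appearance`, `taste` and `overall`; the rows in `toy_reviews` are made up purely for illustration and are not taken from the dataset. Pearson correlation is just one reasonable choice of measure here.

```python
from math import sqrt

# Hypothetical field names; the real dataset may label its columns differently.
ASPECTS = ["aroma", "palate", "appearance", "taste"]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined when one of the series is constant.
        return float("nan")
    return cov / sqrt(var_x * var_y)


def aspect_correlations(reviews):
    """Correlate every aspect rating with the overall rating."""
    overall = [r["overall"] for r in reviews]
    return {a: pearson([r[a] for r in reviews], overall) for a in ASPECTS}


# Made-up toy rows, for illustration only.
toy_reviews = [
    {"aroma": 4.0, "palate": 3.5, "appearance": 4.0, "taste": 4.5, "overall": 4.5},
    {"aroma": 3.5, "palate": 4.0, "appearance": 4.5, "taste": 3.5, "overall": 3.5},
    {"aroma": 4.5, "palate": 4.0, "appearance": 3.5, "taste": 5.0, "overall": 5.0},
    {"aroma": 3.0, "palate": 3.5, "appearance": 4.0, "taste": 3.0, "overall": 3.0},
]

if __name__ == "__main__":
    correlations = aspect_correlations(toy_reviews)
    # Print aspects from most to least correlated with the overall rating.
    for aspect, value in sorted(correlations.items(), key=lambda kv: -kv[1]):
        print(f"{aspect}: {value:.2f}")
```

An aspect whose coefficient is close to 1 moves together with the overall grade, which is what we mean by an "important" aspect in this question.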
One might ask: why do we even introduce textual analysis in our work?
Okay, for our second question, the answer is quite straightforward: we need
it to actually extract the important words. But what about the first one?
Aren't the grades users give enough to infer whether an aspect stands out as
influential? Well, two problems arise.
First, people mostly give good grades. As can be seen from Figure 1,
the vast majority of reviews (>90%) have numerical ratings all higher than 3.
Therefore, if a certain aspect indeed stands out, its difference in rating compared
to the other ones will not be so easy to notice.
Figure 1: Users mostly give high ratings
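The share of reviews in which every rating is above 3 could be computed along the following lines. This is only a sketch: the field names are hypothetical, we assume the Overall grade counts as one of the "numerical ratings", and the rows in `toy_reviews` are invented for illustration rather than taken from the dataset.

```python
# Hypothetical field names; Overall is assumed to be included in the check.
RATING_FIELDS = ["aroma", "palate", "appearance", "taste", "overall"]


def share_all_above(reviews, threshold=3.0):
    """Fraction of reviews whose ratings are all strictly above `threshold`."""
    if not reviews:
        return 0.0
    hits = sum(
        1 for r in reviews
        if all(r[field] > threshold for field in RATING_FIELDS)
    )
    return hits / len(reviews)


# Made-up toy rows, for illustration only.
toy_reviews = [
    {"aroma": 4.0, "palate": 3.5, "appearance": 4.0, "taste": 4.5, "overall": 4.5},
    {"aroma": 3.5, "palate": 4.0, "appearance": 4.5, "taste": 3.5, "overall": 3.5},
    {"aroma": 4.5, "palate": 4.0, "appearance": 3.5, "taste": 5.0, "overall": 5.0},
    {"aroma": 2.5, "palate": 3.5, "appearance": 4.0, "taste": 3.0, "overall": 3.0},
]

if __name__ == "__main__":
    print(f"{share_all_above(toy_reviews):.0%} of toy reviews have all ratings above 3")
```

When this share is very high, the ratings are squeezed into a narrow band near the top of the scale, which is why differences between aspects are hard to spot from the numbers alone.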