Reflections on Simple Sentiment Analysis within Corporations
Sentiment analysis is often used to mine for people’s
feelings or sentiments as they relate to particular topics, ideas, or concepts.
This information is typically gathered from their online posts and other
behavior, but can also be imputed from users’ “filter-bubble” – the articles
they are more predisposed to click on – since many people click on links that
support or confirm their own deeply held views or convictions (confirmation
bias). There are many sources from which such sentiment can be mined, including
the blogosphere, online news, email, instant messages, Twitter… the list goes
on. In this paper, we focus on sentiment analysis of people’s email, and the
potential applications this might have in improving business.
Techniques and methodology – supervised learning
Supervised learning is a machine learning technique in which a training data set is provided: each example carries values for a set of independent variables along with the associated value of the dependent variable (the one to be forecast), and this labeled set is used to train a forecasting model. The model is then tuned with a validation set and used to generate predictions (predictive analytics) for “out-of-sample” data from the test set. A good model will produce accurate forecasts or predictions for test-set data – the better the predictive model, the better the classification rate, and the smaller the number of false positives and false negatives.
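As a concrete illustration, here is a minimal sketch of that train/validate/test workflow using scikit-learn; the synthetic data set and the logistic-regression model are placeholders chosen for illustration, not a prescribed setup.

```python
# Minimal sketch of the train / validate / test workflow described above.
# The synthetic data set below is a stand-in for real features and labels.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Hold out 20% as a test set, then carve a validation set out of the remainder.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("validation accuracy:", model.score(X_val, y_val))        # used for tuning the model
print("out-of-sample (test) accuracy:", model.score(X_test, y_test))

# False positives and false negatives fall out of the confusion matrix.
print(confusion_matrix(y_test, model.predict(X_test)))
```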
In its most basic form, sentiment analysis with supervised learning can be used in one of two ways: (a) to classify test-set documents (news stories, reviews, email, etc.) as positive or negative, or (b) to elicit “feelings” along a graded scale that indicates how the user feels about the topics in question (this is along the lines of the “Goldstein Scale for WEIS Data” [G], but tailored to our domain of interest, since the Goldstein scale is for a specific problem domain – news of conflict in international events). These feelings, or the sentiment aggregated across the user population, can be used to make decisions – buy or sell decisions for stocks, talent management decisions at corporations, and so on – given the particulars of the problem domain in question.
Why do we need a Supervised Training Set for this exercise?
A supervised training set indicates clearly what text is
associated with positive or negative sentiment. From data elements in the
supervised set, one can mine words or phrases that are associated with positive
or with negative sentiment as outlined below. Once mined, these phrases can
then be used to determine the sentiment of new text, posts or email as
appropriate, within the context of the model used to make predictions.
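To make this concrete, here is a small sketch (with a toy two-review training set standing in for a real supervised corpus) of how word-level sentiment associations could be mined from labeled text; the smoothed log-odds score is one simple choice among many.

```python
# Sketch: count how often each word appears in positive vs. negative documents
# and rank words by a smoothed log-odds score. The two labeled examples are
# toy stand-ins for a real supervised training set.
import math
from collections import Counter

labeled = [("great food and friendly staff", "pos"),
           ("terrible service, never again", "neg")]

pos_counts, neg_counts = Counter(), Counter()
for text, label in labeled:
    words = text.lower().split()
    (pos_counts if label == "pos" else neg_counts).update(words)

vocab = set(pos_counts) | set(neg_counts)

def log_odds(word):
    # +1 smoothing so unseen words do not blow up the ratio
    return math.log((pos_counts[word] + 1) / (neg_counts[word] + 1))

for word in sorted(vocab, key=log_odds, reverse=True):
    print(word, round(log_odds(word), 2))
```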
Obtaining the data set
So, where can we get data from, and how might we use it? The
website Yelp is an online forum that permits users to review businesses they
interact with. Over time, they have collected a large data set of reviews. They
periodically host machine learning competitions [YC] to see what insights they
can glean from the review data they have available. In a Kaggle-like fashion,
these data-sets are provided free-of-cost to data science professionals, mostly
students, for analysis. Each review in the provided data set comes with a
date-stamp of when the review was written, the complete identity of the
business being reviewed, the text of the review itself, the particulars of the
reviewer, and a star rating indicating how strongly (or whether) the user posting the review approved of the business. There may be other fields, but we find these to be the most relevant for our current purposes.
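As a rough sketch of how such a data set might be loaded, assuming the review file is formatted as one JSON object per line with fields along the lines of 'stars', 'text', 'user_id', 'business_id', and 'date' (the field names and file path vary across releases and are assumptions here):

```python
# Sketch of loading the review file; field names and the file name follow one
# common release of the Yelp data and may differ in other versions.
import json

reviews = []
with open("yelp_academic_dataset_review.json", encoding="utf-8") as f:  # illustrative path
    for line in f:                      # typically one JSON object per line
        r = json.loads(line)
        reviews.append({"stars": r["stars"], "text": r["text"],
                        "user": r["user_id"], "business": r["business_id"],
                        "date": r["date"]})

print(len(reviews), "reviews loaded")
```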
But this data is for another domain… will it still be useful?
Recall we are simply looking for data (preferably supervised data) that can somehow tie particular phrases to particular sentiments. If we are able to extract statistically improbable phrases from each review, and then tie these probabilistically into the model we are building for email, instant messages, or other such data extracted from a corporate context, we will be able to draw interesting inferences in the corporate setting just as easily. We explore potential techniques for such machine learning analysis in the sections that follow.
Essentially what we are doing here is extracting learnings from one data set and transferring them to another data set similar in type but different in content (transfer learning across models and data-sets) [TL].
Work email may be more formal than a restaurant review, while work IM may be less formal and even use short-forms and jargon that people outside the business context might not understand. The first problem may be addressed by using a thesaurus to provide synonyms for the words in the various N-grams constructed from the training sample, so we end up with multiple new N-grams with equivalent meanings derived from the ones mined from the original Yelp data-set. The second might be addressed in one of two ways (both ideas are sketched in code after this list):
a. gathering more jargon- or short-cut-rich texts and manually assigning them a sentiment rating to seed the supervised learning population, and
b. building a dictionary from manual analysis of IM short-cuts to programmatically “repair” the texts into a form suitable for automated analysis.
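A minimal sketch of both ideas follows, assuming NLTK's WordNet as the thesaurus and a tiny hand-seeded shortcut dictionary (the entries shown are illustrative, not a real lexicon):

```python
# (a) expand n-grams with thesaurus synonyms via WordNet, and
# (b) "repair" IM short-forms with a hand-built dictionary.
from nltk.corpus import wordnet   # requires: nltk.download('wordnet')

def synonym_variants(ngram):
    """Yield copies of the n-gram with each word swapped for a WordNet synonym."""
    words = ngram.split()
    for i, word in enumerate(words):
        for syn in wordnet.synsets(word):
            for lemma in syn.lemma_names():
                if lemma.lower() != word:
                    yield " ".join(words[:i] + [lemma.replace("_", " ").lower()] + words[i + 1:])

# Tiny illustrative seed dictionary of IM short-cuts (assumed examples).
IM_SHORTCUTS = {"thx": "thanks", "pls": "please", "eod": "end of day"}

def repair_im_text(text):
    return " ".join(IM_SHORTCUTS.get(tok, tok) for tok in text.lower().split())

print(list(synonym_variants("excellent service"))[:5])
print(repair_im_text("thx pls send by eod"))
```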
Of course, the N-gram generation process only proceeds after we remove commonly used words (“stop words”), identified through a simple statistical analysis of every review in our data-set, and apply industry-standard methods such as stemming and lemmatization [SL].
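A preprocessing sketch along those lines, using NLTK (one of several libraries that could fill this role):

```python
# Stop-word removal plus stemming or lemmatization with NLTK.
# Requires: nltk.download('punkt'), nltk.download('stopwords'), nltk.download('wordnet')
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import word_tokenize

stop_words = set(stopwords.words("english"))
stemmer, lemmatizer = PorterStemmer(), WordNetLemmatizer()

def preprocess(text, use_stemming=True):
    tokens = [t for t in word_tokenize(text.lower())
              if t.isalpha() and t not in stop_words]
    if use_stemming:
        return [stemmer.stem(t) for t in tokens]
    return [lemmatizer.lemmatize(t) for t in tokens]

print(preprocess("The servers were amazingly friendly and the pizzas arrived quickly"))
```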
Method 1: N-gram training, thesaurus synonyms in n-grams [NG]
Individual words, as well as pairs, triplets, and quads of consecutive words (called N-grams, where N denotes the number of words in each “gram”) are gathered for all the review text we have, and then analyzed to generate a probabilistic map between their presence in a review and the associated review rating. This process assumes that individual N-grams are independent of each other in their contributions to review sentiment. Once this process is complete, the output of a Naive Bayes classifier gives us something similar to the Goldstein data referenced above. The degree of nuance we see in the output is tied to the number of stars in each review rather than a numeric score, though we can of course treat the number of stars in a review as a proxy for a numeric score.
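One way this could be implemented is with scikit-learn's CountVectorizer and a multinomial Naive Bayes model, with the star rating as the class label; the three toy reviews below are stand-ins for the real Yelp text:

```python
# Sketch of Method 1: unigrams through 4-grams fed to a Naive Bayes classifier,
# with the star rating as the class label.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["great food, will come back", "cold food and rude staff", "okay, nothing special"]
stars = [5, 1, 3]   # toy labels for illustration

model = make_pipeline(
    CountVectorizer(ngram_range=(1, 4), stop_words="english"),
    MultinomialNB(),
)
model.fit(texts, stars)

print(model.predict(["the staff were rude"]))         # predicted star bucket
print(model.predict_proba(["the staff were rude"]))   # probabilistic map over classes
```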
Method 2: TF-IDF-based clustering [TFI]
The frequency of individual N-grams in each review (called term frequency, or TF) and the relative inverse frequency of each N-gram in comparison with all other N-grams across all the review text we have (the totality of review text is called the corpus, and the relative inverse frequency is called the inverse document frequency, or IDF) can be used to locate statistically improbable phrases (SIPs) that identify particular reviews. If reviews containing given SIPs correlate more closely with particular sentiment buckets, this can be used to aid the classification process going forward.
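A sketch of that idea with scikit-learn's TfidfVectorizer, treating each review's highest-weighted n-grams as a rough proxy for its SIPs:

```python
# Sketch of Method 2: rank each review's n-grams by TF-IDF weight.
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np

texts = ["great food, will come back", "cold food and rude staff", "okay, nothing special"]

vectorizer = TfidfVectorizer(ngram_range=(1, 3), stop_words="english")
tfidf = vectorizer.fit_transform(texts)
terms = vectorizer.get_feature_names_out()

for i, text in enumerate(texts):
    row = tfidf[i].toarray().ravel()
    top = np.argsort(row)[::-1][:3]                  # three highest-weighted n-grams
    print(text, "->", [terms[j] for j in top if row[j] > 0])
```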
Method 3: K-Means Clustering [KMC]
Consider the set of reviews represented as N-gram feature vectors in a high-dimensional space. If K-means clustering is performed on this data with K equal to the number of “star” classes, then the N-grams most strongly associated with each cluster are indicators of sentiment for that bucket in the input data. This analysis can then be used to tie the results to a sentiment scale. Synonyms can then be generated using a thesaurus, and the knowledge exported to the analysis of work-email text.
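A sketch of Method 3, clustering TF-IDF n-gram vectors with K set to the number of star classes (the five toy reviews are illustrative only):

```python
# Sketch of Method 3: cluster TF-IDF vectors with K = number of star classes,
# then inspect the n-grams closest to each cluster centre as sentiment cues.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
import numpy as np

texts = ["great food, will come back", "cold food and rude staff",
         "okay, nothing special", "fantastic service", "never again, awful"]
K = 5   # one cluster per star class

vectorizer = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
X = vectorizer.fit_transform(texts)
km = KMeans(n_clusters=K, n_init=10, random_state=42).fit(X)

terms = vectorizer.get_feature_names_out()
for c in range(K):
    top = np.argsort(km.cluster_centers_[c])[::-1][:3]
    print("cluster", c, "->", [terms[i] for i in top])
```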
Interesting… but is this analysis complete?
If we were analyzing Yelp data to derive insights within the Yelp context, the above analysis would be incomplete, because we have so far looked only at the reviews, not at the reviewers. Reviewer A may be a hard person to please, with reviews always averaging 1-3 stars. Reviewer B may gush about every business he visits, giving out a large number of 5-star ratings. To perform a complete and credible analysis, we need to be able to normalize reviews from different reviewers prior to performing classifications.
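One simple way to normalize, sketched here with pandas and toy data, is to convert each star rating into a z-score relative to that reviewer's own history:

```python
# Per-reviewer normalization: each rating becomes a z-score against that
# reviewer's own mean and standard deviation (toy data for illustration).
import pandas as pd

df = pd.DataFrame({"user":  ["A", "A", "B", "B", "B"],
                   "stars": [1, 3, 5, 5, 4]})

grp = df.groupby("user")["stars"]
std = grp.transform("std").replace(0, 1)          # avoid dividing by zero
df["stars_norm"] = (df["stars"] - grp.transform("mean")) / std
print(df)
```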
Secondly, people may post fake reviews for their own and their competitors’ businesses (obviously talking up their own business while slamming the competition). Given enough reviewer data, we can determine whether reviewers are credible from the set of all reviews each has posted, and use that as a basis for either including or excluding their reviews as training, validation, and test sets are built.
Please note, however, that here we are interested not in analyzing data within the Yelp context, but in transferring learning of sentiment indicators to a different context (work email) for further analysis. Thus the above two problems do not impact the quality of our model construction for analyzing work email/IM text.
Applications in People Analytics [PA]:
Sentiment spreads as a “ripple” in space-time - where the
“pond” is the corporate context. Email and IM text may be the medium through
which sentiment propagates, but the network of people (different from the
top-down enforced organization chart) forms the structure which “conducts”
sentiment.
Email might express frustration with processes in place, people in management, or work load, among other things. Similarly, some written communication might express joy at being able to make a positive contribution to a project, happiness with team leadership, positive feelings at being able to learn and apply new skills, or satisfaction with internal mobility within the company. Knowing these things can help the HR department of a company perform their roles more effectively.
Does rampant absenteeism correlate with particular managers?
Are people quitting from particular teams more frequently? Are promotions so
few and far between and bonuses so low that people are being forced to look
elsewhere for more viable careers? Is a department so overloaded that staffing
them up might reduce attrition despite the announced firm-wide hiring freeze?
Are particular employees (particularly in the financial services industry)
engaging in nefarious behavior either within the firm or across firms with
outside collaborators?
These and more questions can be answered by analyzing email and related data such as Instant Messaging and Internet Chat (... and in some cases, text renderings of phone conversations, especially where regulations require that these be recorded, e.g. trader lines in financial services). We
explore a few scenarios of interest in a little more detail in what follows:
Sentiment-Announcement correlation
Senior management makes an announcement reshuffling the organization or otherwise changing the organizational design. People of course talk about these things. Analyzing email sentiment on an ongoing basis gives us a means to map changes in sentiment against the announcements that are made and their impact on morale. Certain geographies may be more impacted by lay-off announcements than others - this will show when we analyze sentiment by geography. Some announcements might cause immediate negative sentiment that dissipates quickly, while others might cause negativity that persists over time. All this can be factored into not only what is announced, but also how it is announced (e.g. by a CEO in a town-hall to appear more humane vs. via email) and what words are used to convey the information.
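A sketch of how such a correlation check might look, assuming a daily sentiment score per geography (the scores, geographies, and announcement date below are placeholders):

```python
# Compare average sentiment in the weeks before and after each announcement,
# per geography, to see whether a dip dissipates or persists. Toy inputs only.
import pandas as pd

sentiment = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=60, freq="D").repeat(2),
    "geo": ["NY", "LDN"] * 60,
    "score": 0.1,                       # placeholder daily sentiment scores
})
announcements = [pd.Timestamp("2024-01-20")]

for when in announcements:
    for geo, g in sentiment.groupby("geo"):
        before = g[(g["date"] >= when - pd.Timedelta(days=14)) & (g["date"] < when)]["score"].mean()
        after = g[(g["date"] >= when) & (g["date"] < when + pd.Timedelta(days=14))]["score"].mean()
        print(f"{when.date()} {geo}: before={before:.2f} after={after:.2f} delta={after - before:+.2f}")
```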
Ongoing Employee Engagement Sentiment
How do employees feel about the company? Today, most firms
have an annual survey where every employee fills out a questionnaire regarding
their work environment. However, perhaps a better measure of engagement can be
obtained through periodic email analysis - this will tell us what people are
thinking on an ongoing basis, not just once at the end of the year. Besides,
while people might think one way and say something else during a survey,
sentiment in ongoing communications like email is likely to be much more honest
and a more relevant indicator of how management is performing.
Sentiment for Incentive Design
What makes each employee tick? Some like being praised for
their work. Others want to get paid more. Some others might want a better
title. Yet others might want more of a certain kind of work and less of
something else. Not all employees have engaged managers they can speak with
about these things. All of this can be mined from email and chat data, and HR
can use this to structure incentives to promote retention and reduce attrition
of strong performers.
Sentiment for Right-Sizing Organizations
“Right-Sizing” is a convenient euphemism senior management uses for laying people off. While these decisions are painful, the process of deciding which people to let go is often made poorly. A recent article (need reference) explored how people viewed by senior managers as being in the bottom 10% of performers are sometimes actually the glue that holds a group together. They spend their time ensuring all their team-mates are successful, so letting those people go hurts the firm’s performance more than the savings in their salary helps its bottom-line. Mining email and text within the context of the firm’s social network (informal connections superimposed on the org-charts) can help organizations make these painful decisions more appropriately.
Shades of 1984? Big Brother Watching?
Of course, privacy advocates will not like any of the above,
and there is some validity to contrary viewpoints on the use of work email for
the purposes stated above. However, most employers today, particularly those in
areas like Financial Services, have very clear policies in place that indicate
that “no employee can have any reasonable expectation of privacy for any
communication carried out using office equipment or their office email
address”… and that they consent for this information to be logged, and records
kept for a period of seven years from the date of communication. The retention
policy may vary from industry to industry and firm to firm, but most firms
today have a data retention policy for legal reasons, so it is not altogether
unreasonable that applications such as the above will become practical in time
to come…
Besides, Google auto-reads Gmail content to pick ads to
display, so it is not entirely far-fetched for work email to be read to improve
organizational policy, especially when such determinations are made using data
in the aggregate where individual users are not identified (the same thing is
done in surveys today).
A valid argument against doing what we discuss in this paper
can be made if one asks whether people will still honestly communicate their
sentiments in email if they know they are being mined. After all, to give a
concrete example, did Goldman employees not take to writing “LDL” (Let’s
Discuss Live) when the “Fabulous Fab” and Rajat Gupta scandals broke, when they
knew their electronic communications were being tapped by law enforcement
agencies? True, but the very fact that these abbreviations show up in written
records tells us something about sentiment.
References:
[YW] https://www.yelp.com