Tuesday, February 7, 2012

Facebook posts Sentiment Analysis

In a previous post, we examined the utility of mining social media such as the micro-blogging site Twitter. People share unrestricted, light, friendly, uncensored, and sometimes trivial and uninteresting information with one another on such media. At the same time, there are insightful posts and tweets reported from the scene of events by the people closest to the action. These give the information a certain relevance, immediacy, currency, accuracy, and even an unpretentiousness that comes from its being delivered by ordinary people who want to get their voice out there, be heard, and share a view. That is what makes social media so fascinating.

People don't usually care that Alice had chicken for dinner. Perhaps even Alice's closest friends do not. But people do care what someone - say a civilian - on the ground in a war-zone has to say about what she has seen, is experiencing, along with video footage where available, to break a news-story to the broader world. In this latter sense, social media's contribution to the world at large is immense.

In this post, we look at legally mining data from that other large social media engine, Facebook. As of this writing, Facebook has ~800M subscribers, which would make it the size of a medium-sized to large country if it were a geographic agglomeration of people. These people interact with each other, some quite frequently, by posting messages, sharing photographs, using applications, "liking", "poking", and performing other Facebook-specific actions, all with a view to having fun, "keeping in touch", and generally having a good time being social.

Mining Facebook data can be done in two ways:
1. Searching through publicly visible Facebook posts and messages. This does not require logging into Facebook; it can be done using the published public Facebook API to access data that many Facebook users do not even know they have made public, though lately at least part of the user population seems to be waking up to data ownership and privacy issues.
2. Searching Facebook posts not publicly accessible - this requires that one log in, but provides a deeper access to the Facebook "graph" that connects various Facebook objects including messages, groups, pictures, links etc. all together into a structure that can be queried via a REST-ful API.

We implement a very simple version of (1) in the code below. Again, we note that mining data here can have various practical applications, such as performing sentiment analysis of the user population towards world or local events, e.g. the crisis in Europe or Greece, the Arab Spring, elections in the US, the price of oil, or Madonna's performance during the Super Bowl. Sentiment analysis can also be used to help with the marketing of products and services, and even for applications like investment management. There have been news stories of how sentiment analysis was used to predict the direction of Netflix stock.

Our implementation stops short of performing the actual sentiment analysis because we have already implemented simple sentiment analysis in our earlier Twitter post. The interested reader can replicate the same approach here with minimal effort. Several enhancements are also possible if (2) above is used to determine how "connected" a post-writer is to the rest of the Facebook graph (in the Twitter context, the number of followers one has serves this purpose). Facebook also offers other media, such as pictures, that can be used to add context to the story.
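To make the "connectedness" idea a little more concrete, here is a minimal sketch of one way it could feed into an aggregate score. Everything in it is an assumption made for illustration: the `weighted_sentiment` name, the input format of (score, connection count) pairs, and the logarithmic weighting are not taken from the Twitter post or from any Facebook API.

```python
import math

def weighted_sentiment(scored_posts):
    """Average sentiment scores, giving better-connected writers more weight.

    scored_posts is an iterable of (score, connections) pairs, where score is
    a per-post sentiment value (say in [-1, 1]) and connections is a
    hypothetical count of friends or followers for the post-writer.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for score, connections in scored_posts:
        # Assumed weighting: grows slowly with the number of connections,
        # and a writer with no connections still counts with weight 1.
        weight = 1.0 + math.log1p(connections)
        weighted_sum += weight * score
        total_weight += weight
    if total_weight == 0.0:
        return 0.0  # no posts at all
    return weighted_sum / total_weight
```

The logarithm is only one possible choice; it keeps a single very well-connected writer from drowning out everyone else.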

One issue we face while performing sentiment analysis in the Facebook context that we do not see with Twitter is that Facebook posts tend to be longer on average; they are not limited to the 140 characters of a Twitter micro-blog tweet. This means that even our simple sentiment analysis algorithm needs to be tweaked to compute the overall sentiment of a post by calculating the relative percentages of positive, negative, and neutral sentiment key-words, and then interpreting these in the larger context of the post. A further hurdle is that longer posts give writers more room for creativity, which may mean more posts that are sarcastic or satirical in nature. Text-based analysis, unless very sophisticated, is likely to miss these nuances in meaning, which makes classification more difficult.
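As a rough illustration of the percentage-based scoring described above, here is a minimal sketch. The key-word lists, the `sentiment_shares` and `classify` names, and the `margin` threshold are all assumptions made for this example and are not the implementation from the earlier Twitter post. It also does nothing about sarcasm or satire.

```python
import re

# Hypothetical, deliberately tiny key-word lists; a real analysis would use
# a much larger sentiment lexicon.
POSITIVE = {"good", "great", "benefit", "recovery", "growth"}
NEGATIVE = {"fraud", "failed", "madness", "vile", "crisis"}

def sentiment_shares(post):
    """Return the fraction of positive, negative, and neutral words in a post."""
    words = re.findall(r"[a-z']+", post.lower())
    if not words:
        return {"positive": 0.0, "negative": 0.0, "neutral": 0.0}
    total = float(len(words))
    pos = sum(1 for w in words if w in POSITIVE)
    neg = sum(1 for w in words if w in NEGATIVE)
    return {
        "positive": pos / total,
        "negative": neg / total,
        "neutral": (len(words) - pos - neg) / total,
    }

def classify(post, margin=0.02):
    """Label a post by comparing its positive and negative shares."""
    shares = sentiment_shares(post)
    diff = shares["positive"] - shares["negative"]
    if diff > margin:
        return "positive"
    if diff < -margin:
        return "negative"
    return "neutral"
```

Because the shares are fractions of the post's own length, a long post and a short one can be compared on the same scale, which is the point of moving from raw counts to percentages.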

Code for simple Facebook data-mining is posted below, along with the output file generated for a sample query with the key-words "quantitative easing". Enjoy!

Some other issues with the code (an optimist might have titled this section "Future Work" or "Next Steps"):

  1. This is a very simple, unpretentious implementation focused on the core issue of mining Facebook posts for data and parsing the results into a human-readable, usable input form for sentiment analysis. It can easily be made much more sophisticated, but we just hit the highlights here and move on to other things.
  2. I did not build in a processor for special characters in the larger Unicode character set, so these appear as noise in the output (for example, the pound sign shows up as "u00a3").
  3. I do not check for messages that repeat and filter them out, with good reason. Sometimes messages with minimal changes are re-posted by other people, sometimes with, and sometimes without attribution to the original post. I guess the general rules about plagiarism vs. original thought do not apply as much to social media. 
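
On issue 2 above, one way to keep special characters intact would be to parse the response with Python's standard json module instead of splitting on quote characters, since the JSON parser decodes \uXXXX escapes itself. The sketch below assumes the response is a JSON object with a "data" list whose items may carry a "message" field; the `extract_messages` name and the sample string are made up for illustration.

```python
import json

def extract_messages(raw_json):
    """Return the "message" field of every item in the "data" list that has one."""
    payload = json.loads(raw_json)
    return [item["message"] for item in payload.get("data", []) if "message" in item]

# A hand-written sample in the assumed response shape (not real query output).
sample = '{"data": [{"id": "1", "message": "printing \\u00a350 billion"}, {"id": "2"}]}'
messages = extract_messages(sample)
# messages[0] now contains a real pound sign rather than the text "u00a3".
```

Items without a "message" field, such as the second one in the sample, are simply skipped.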


Sample Source Code:

import os, sys, urllib2; # include standard libraries of interest


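# helper function used to parse the output of the query to Facebook.
# takes two lists of token indices as input: a (positions of "message" keys)
# and b (positions of "," separators, in ascending order).
# returns a list of tuples (x+2,y), where x is from the first list, x+2 is
# the index of the first token of that message's text, and y is the smallest
# element of the second list that is larger than x+2.
def L(a,b):  
 r=[];
 for i in a: 
  t=[j for j in b if j>i+2]; # separators that follow the message text
  if t: r+=[(i+2,t[0])]; # skip a message that has no separator after it
 return r;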


# program sample usage is: python fbmine.py "quantitative easing" qe2.txt
# here fbmine.py is this source code file
# "quantitative easing" is the string of space separated keywords we mine for
# qe2.txt is the output file generated by this data-mining exercise.
wrds=sys.argv[1]; # wrds is the string of words we want to filter for
wrds=wrds.split(" ");
s="%20".join(wrds); # populate the query: key-words joined by URL-encoded spaces
#print "query: http://graph.facebook.com/search?q="+s+"&type=post&limit=1000\n\n"; # create the query string and launch it
req=urllib2.Request("http://graph.facebook.com/search?q="+s+"&type=post&limit=1000");
response=urllib2.urlopen(req); # collect the query results
txt=response.read();
txt=txt.replace("\\n","").replace("\\",""); # some simple cleanup of read data


p=txt.split("\"");
m1=[i for i in range(len(p)) if p[i]=="message"]; # parsing the messages
m2=[i for i in range(len(p)) if p[i]==","];
R=L(m1,m2); # using the helper function
g=open(sys.argv[2],"w"); # generating and writing out the output
for i in R: 
 s="";
 for j in range(i[0],i[1]): s+=p[j]+" \n";
 g.write("-----------------------------------------------------------------\n");
 g.write(s+"\n");


g.write("------------------------------------------------------------------\n");
g.close(); # close the output file so that everything written is flushed to disk


Sample Output File: (file was generated around 1815 hrs Friday Feb 10 2012)

--------------------------------------------------------------------------------
The Bank of England has announced another round of 'quantitative easing', this time printing u00a350 billion of money.Keep it up lads; at this rate soon we'll all be billionaires, just like everyone in Zimbabwe.Turns out that smashing a stake through a vampire's heart works, even if your neighbours cat's not a vampire. 


--------------------------------------------------------------------------------
The Bank of England has announced another round of 'quantitative easing', this time printing u00a350 billion of money.Keep it up lads; at this rate soon we'll all be billionaires, just like everyone in Zimbabwe. 


--------------------------------------------------------------------------------
Neat, expression, So why the blithering flip.....Very interesting article on printing money..  It's English, but applies equally here, I think:  (Be sure to read the last link in article)http://blogs.telegraph.co.uk/news/danielhannan/100136397/quantitative-easing-has-failed-and-failed-again-what-madness-has-seized-our-leaders/ 


--------------------------------------------------------------------------------
The Bank of England has announced another round of 'quantitative easing', this time printing u00a350 billion of money.Keep it up lads; at this rate soon we'll all be billionaires, just like everyone in Zimbabwe. 


--------------------------------------------------------------------------------
just saw ths funny joke....... The Bank of England has announced another round of 'quantitative easing', this time printing u00a350 billion of money.Keep it up lads; at this rate soon we'll all be billionaires, just like everyone in Zimbabwe. 


--------------------------------------------------------------------------------
http://www.zerohedge.com/news/obama-revises-cbo-deficit-forecast-predicts-110-debt-gdp-end-2013quantitative easing is not the panacea that Obama is hoping for 


--------------------------------------------------------------------------------
The system is FRAUD! 


--------------------------------------------------------------------------------
The Bank of England has announced another round of 'quantitative easing', this time printing u00a350 billion of money. Keep it up lads; at this rate soon we'll all be billionaires, just like everyone in Zimbabwe. 


--------------------------------------------------------------------------------
The bank of england has just announced another round of 'quantitative easing', this time printing u00a350 billion in notes.Keep it up lads, at this rate we'll soon all be billionaires. Just like everyone in zimbabwe. 


--------------------------------------------------------------------------------
u2018u201cQuantitative Easing is a transfer of wealth from the poor to the rich,u201d he says, u201cIt floods banks with money, which they use to pay themselves bonuses. The banks have money, and assets, so they can borrow easily. The poor guy, who is unemployed and can't borrow, is not going to benefit from it.u201d The QE process pushes asset prices up, he says, which is great for those who own stocks, shares and expensive houses. u201cBut the state is subsidising the rich. It is the top 1 per cent who benefit from Quantitative Easing, not the 99 per cent.u201du2019 -- Nassim Taleb 


--------------------------------------------------------------------------------
Quantitative easing is now more vile on the lips than any four letter word http://tgr.ph/zNpxTg 


--------------------------------------------------------------------------------
http://blogs.telegraph.co.uk/news/danielhannan/100136397/quantitative-easing-has-failed-and-failed-again-what-madness-has-seized-our-leaders/ 


--------------------------------------------------------------------------------

[...] I've truncated this to save space.
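
Several of the posts above are near-identical copies of the same joke. Rather than filtering such reposts out, one option is to group them so that they can be counted or weighted later. The sketch below is one possible approach using Python's standard difflib module; the function names and the 0.8 similarity threshold are assumptions chosen for illustration, not tuned values.

```python
from difflib import SequenceMatcher

def normalize(text):
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())

def group_reposts(messages, threshold=0.8):
    """Greedily group messages that closely resemble a group's first member."""
    groups = []
    for msg in messages:
        for group in groups:
            # autojunk=False keeps the comparison exact for long posts.
            matcher = SequenceMatcher(None, normalize(group[0]), normalize(msg),
                                      autojunk=False)
            if matcher.ratio() >= threshold:
                group.append(msg)
                break
        else:
            # No existing group was similar enough, so start a new one.
            groups.append([msg])
    return groups
```

The size of each group then gives a rough count of how often a message was re-posted. Comparing every message against every group is quadratic in the worst case, which should be acceptable for the roughly 1000 posts a single query returns.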