Wednesday, June 20, 2012

Reverse Inboxes

In this post, the author examines his social network by analyzing his email. Email headers, when the sent and inbox folders are considered together, give strong indications of who sends one email and whom one sends email to. Taken together with the number of times a particular unique identifier appears in the To:, CC:, or BCC: fields of the messages (entries in these three fields could perhaps be weighted differently), this gives a pretty good indication of the strength of one's professional or personal relationships with other people.
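As a rough illustration of the counting described above, here is a minimal Python sketch using only the standard library. It is not the author's implementation: the function name `score_correspondents`, the `FIELD_WEIGHTS` values, and the choice to lowercase addresses are all assumptions made for the example. It takes raw RFC 822 message strings from the inbox and sent folders and tallies, per address, how often that person writes in and how heavily they are written to.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses

# Hypothetical weights: the post only suggests that To:, CC: and BCC:
# entries might be weighted differently; these values are illustrative.
FIELD_WEIGHTS = {"To": 1.0, "Cc": 0.5, "Bcc": 0.25}


def score_correspondents(inbox_msgs, sent_msgs):
    """Return (inbound, outbound) Counters keyed by lowercased address.

    inbox_msgs and sent_msgs are iterables of raw message strings.
    """
    inbound, outbound = Counter(), Counter()

    # Inbox folder: whoever appears in From: is sending mail to us.
    for raw in inbox_msgs:
        msg = message_from_string(raw)
        for _name, addr in getaddresses(msg.get_all("From", [])):
            if addr:
                inbound[addr.lower()] += 1.0

    # Sent folder: recipients are people we write to, weighted by field.
    for raw in sent_msgs:
        msg = message_from_string(raw)
        for field, weight in FIELD_WEIGHTS.items():
            for _name, addr in getaddresses(msg.get_all(field, [])):
                if addr:
                    outbound[addr.lower()] += weight

    return inbound, outbound
```

Adding the two counters together (`inbound + outbound`) gives one possible crude measure of relationship strength per address; other combinations are equally plausible.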

We can fine-tune the mechanical classification described in the previous paragraph a little further by performing sentiment analysis or some other text mining to determine (a) whether the content is of a personal or a professional nature, and (b) whether the email is positive, negative, or neutral in expressed sentiment. Together, these can be used to construct a network, or graph, of the relationships between the various people in one's communications sphere.

If one performs this exercise with various people's mailboxes, one can also determine whether one person's network is denser and more completely connected than another's, and whether the span of one person's social network is greater than another's. This might also permit us to differentiate between people's personal and professional networks, and to identify the set of people (or nodes) that straddle the two domains.
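To make "denser" and "span" concrete, the sketch below uses one common definition: the density of an undirected graph is the number of distinct edges divided by the number of possible edges, n(n-1)/2, and the span is taken here to be simply the number of distinct people. Both the function name `network_stats` and this reading of "span" are assumptions for illustration, not something the post specifies.

```python
def network_stats(edges):
    """Return (span, density) for an undirected graph.

    edges is an iterable of (person_a, person_b) pairs. Duplicate pairs
    and self-loops are ignored.
    """
    nodes = set()
    unique_edges = set()
    for a, b in edges:
        if a == b:
            continue  # a message to oneself says nothing about the network
        nodes.update((a, b))
        unique_edges.add(frozenset((a, b)))  # (a, b) and (b, a) are one edge

    n = len(nodes)
    possible = n * (n - 1) / 2  # edges in a fully connected graph
    density = len(unique_edges) / possible if possible else 0.0
    return n, density
```

Given separate node sets for the personal and professional graphs, the people who straddle both domains are just the set intersection, for example `personal_nodes & professional_nodes`.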

We posit that it is possible to have a nodding acquaintance with lots of people and to know some people really well, but that it is hard to know many people really well. This is something we can confirm by performing this "reverse inbox analysis" on various people's inboxes.

To perform this analysis effectively while maintaining the author's privacy, we do not present the email addresses of individual users from the author's mailbox. Instead, we use an MD5 hash of each email address concatenated with a fixed random string of data. The hash is repeatable (the same address always yields the same value) yet looks random, so it provides an individualized marker for each email address without revealing the identity embedded in the address.
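A minimal sketch of this hashing step, assuming Python's standard `hashlib` and `secrets` modules, might look like the following. The names `make_salt` and `pseudonymize`, the salt length, and the decision to lowercase and strip the address before hashing are assumptions for the example. The fixed random string must be generated once and reused for every address, otherwise the markers are not repeatable.

```python
import hashlib
import secrets


def make_salt(num_bytes=16):
    """Generate the fixed random string once; store it and reuse it."""
    return secrets.token_hex(num_bytes)


def pseudonymize(address, salt):
    """Return the MD5 hex digest of the address concatenated with the salt.

    The address is lowercased and stripped first (an assumption) so that
    'Alice@Example.com' and 'alice@example.com' map to the same marker.
    """
    normalized = address.strip().lower()
    return hashlib.md5((normalized + salt).encode("utf-8")).hexdigest()
```

For example, `pseudonymize("alice@example.com", salt)` returns the same 32-character hex string every time it is called with the same `salt`, and a different one if the salt changes.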

If one were to compute the reverse-inbox-based social networks for different users and then connect them together, a fairly complete view of the organization's or community's social network as a whole would emerge. We discuss this in a related post on "Mining your Social Network". In this post, we content ourselves with providing a simple reverse inbox implementation.

[code to follow shortly]
