Uploaded in The Fifth Elephant 2013 - Day 3

Most data around us can be thought of as "things co-occurring with other things in certain contexts": products co-occurring with other products in retail market baskets, words occurring before or after other words in unstructured text, tags co-occurring with other tags in social tagging systems, people co-occurring with other people in social networks, or objects appearing in particular 2-D geometric arrangements relative to other objects in images.
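To make the shape of such data concrete, here is a minimal sketch that treats each market basket as a context and counts how often each pair of items appears together. The function name `cooccurrence_counts` and the toy baskets are hypothetical, chosen only for illustration; this shows the raw data representation, not the framework presented in the talk.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(contexts):
    """Count how often each unordered pair of items shares a context."""
    counts = Counter()
    for context in contexts:
        # Deduplicate and sort so (a, b) and (b, a) map to the same key.
        items = sorted(set(context))
        counts.update(combinations(items, 2))
    return counts


# Toy data (made up): each inner list is one context, here a market basket.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "tea"],
]
pair_counts = cooccurrence_counts(baskets)
```

The same function would apply unchanged if each context were the set of tags on one bookmark or the set of people in one photo; only the definition of "context" changes.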

While each research community (retail, text, social networking, vision, and so on) has made siloed efforts to deal with "its" data, there has been no unifying framework to tame such a wide variety of co-occurrence data systematically. That is the theme of this session.

We will present a simple, intuitive, yet powerful co-occurrence analytics framework to deal with a wide variety of data of the form "things co-occurring with other things in some context". After describing the framework, we will demonstrate how to adapt and apply its core principles to a variety of large real-world datasets to find novel and actionable insights, even in the presence of significant noise in the data.

What makes this approach attractive is that it is:

(1) Unsupervised: No cost of getting labeled data. Just point it to the data and crunch.

(2) Unbiased: No prior assumptions about data distributions, etc.

(3) High Precision: Generates very high quality insights.

(4) High Recall: Generates exhaustively many insights.

(5) Parameter Poor: Very few parameters to play with.

(6) Scalable: Highly parallelizable in the MapReduce sense.
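As a rough illustration of what "parallelizable in the MapReduce sense" can mean for co-occurrence data, the sketch below counts pairs independently within each partition (the map step) and then sums the partial tables (the reduce step). It runs in a single process, and the names `map_partition` and `reduce_counts` and the toy partitions are assumptions made for this example; the talk's actual implementation is not described in this abstract.

```python
from collections import Counter
from functools import reduce
from itertools import combinations


def map_partition(contexts):
    """Map step: count co-occurring pairs within one partition of the data."""
    local = Counter()
    for context in contexts:
        local.update(combinations(sorted(set(context)), 2))
    return local


def reduce_counts(left, right):
    """Reduce step: merge two partial count tables by summing per pair."""
    return left + right


# Toy data (made up): two partitions, each holding a few contexts.
partitions = [
    [["a", "b", "c"], ["a", "b"]],
    [["b", "c"], ["a", "b"]],
]
partials = [map_partition(p) for p in partitions]
total_counts = reduce(reduce_counts, partials, Counter())
```

Because the partial tables are merged by addition, which is associative and commutative, the partitions can be processed in any order or on separate workers and still yield the same totals.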