(un)structured

Josh Payne on content analytics, enterprise content and information management

Eyeballing Your Content Analytics Results

One of the basic tools of carpentry is a level. Without it, a carpenter (or any weekend Mr. Fixit) is left eyeballing whether a surface is truly horizontal or vertical. Sure, anyone can ‘eyeball’ the work and say “Yeah, it looks straight to me,” but more often than not that results in a slanted bookshelf.

Through experience, we learn in carpentry that eyeballing it just isn’t an accurate method — our perception of reality can be misleading.

The same rigor needs to carry through to how we judge the success of content analytics, and of content classification in particular. Content analytics, like any innovation, is a target of skepticism and misperception. This has been one of the challenges we’ve faced as we push for adoption of our content classification product at IBM, and of other products that leverage content analytics.

I bring this up having read an interesting post by Lexalytics, a content analytics vendor, on their investigation into the accuracy of their own sentiment analysis. One of the things that caught my eye in the post was a quote from Forrester analyst Suresh Vital, who said “in talking to clients who have deployed some form of sentiment analysis, accuracy rests at about 50 percent.”

It’s a pretty casual quote, and it reflects how too many people approach assessing the accuracy of content analytics. There’s no reference to rigorous studies. There’s no reference to hard data about the success or failure of the technology. It’s “Oh, I’ve spoken to a bunch of customers and they perceive it to be doing an iffy job.”

This is the wrong way to judge content analytics.

Last year, as our customers went about their buying decisions on content classification, many ran limited tests of our classification capability. These content classification “proof of technology” exercises came in two types.

The first type were the rigorous tests, run by the customers who followed best practices. They created a reasonably large corpus of pre-categorized documents and segregated a large portion of this content as a test set; the remaining content was used to train the system. To assess the accuracy of the system, the segregated, pre-categorized test corpus was used. A human was not judging the system document by document as it categorized. Rather, a large, statistically valid sample was run through as a formal control set.
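
To make that concrete, here is a minimal sketch of that workflow in open-source terms, using scikit-learn as a stand-in for our tooling. The toy corpus and model choice are my own illustrative assumptions, not how the Classification Module works internally; what matters is the shape of the test: segregate a control set up front, train only on the rest, and score against the held-out labels.

```python
# A minimal sketch of the rigorous "proof of technology", using
# scikit-learn as a stand-in for the Classification Workbench. The
# corpus and model below are illustrative assumptions only; a real
# evaluation needs a reasonably large pre-categorized corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

docs = [
    "invoice for services rendered in march",
    "purchase order for office chairs",
    "quarterly earnings call transcript",
    "analyst briefing on revenue outlook",
    "invoice overdue please remit payment",
    "order confirmation for new laptops",
    "earnings guidance revised upward",
    "transcript of the investor webcast",
]
labels = ["billing", "billing", "finance", "finance",
          "billing", "billing", "finance", "finance"]

# Segregate a portion of the pre-categorized corpus as a control set;
# the system never trains on it.
train_docs, test_docs, train_y, test_y = train_test_split(
    docs, labels, test_size=0.5, random_state=0, stratify=labels)

# Train only on the remaining content.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_docs, train_y)

# Score against the held-out, pre-categorized control set as a whole,
# rather than judging document by document, by hand.
predictions = model.predict(test_docs)
print(f"Top-category accuracy: {accuracy_score(test_y, predictions):.0%}")
```

Because the control set’s categories were fixed before the system ever saw those documents, the resulting score can’t be bent by anyone’s in-the-moment judgment.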

The other type of customer did the opposite. They trained the system, then pulled uncategorized content and asked a human and the Classification Module to categorize it side by side.

The results, in terms of accuracy, were the same for both types of customers: the top category response was correct about 80% of the time, across the board.
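
This is also where “statistically valid sample” earns its keep: the same 80% means much more coming from a large control set than from a handful of hand-judged documents. A quick back-of-the-envelope sketch, using a standard normal-approximation confidence interval (the sample sizes here are hypothetical):

```python
# A hedged sketch: 95% confidence intervals for an observed accuracy,
# using the Wald (normal) approximation. Sample sizes are hypothetical.
import math

def accuracy_interval(correct, total, z=1.96):
    """Return the 95% confidence interval for correct/total."""
    p = correct / total
    margin = z * math.sqrt(p * (1 - p) / total)
    return p - margin, p + margin

# The same 80% observed accuracy, from a hand-judged handful
# versus a formal control set.
for total in (20, 1000):
    low, high = accuracy_interval(int(0.8 * total), total)
    print(f"n={total}: observed 80%, 95% CI [{low:.0%}, {high:.0%}]")
```

From twenty documents, “80%” could plausibly be anything from the low 60s to the high 90s; from a thousand, it is pinned within a couple of points. Eyeballing a handful of results simply can’t tell you which world you’re in.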

The perception of that accuracy, however, was starkly different between the two types of customers.

Those who followed the rigorous approach perceived the automatic classification process to be a success. Those who followed the ‘judge by hand’ approach perceived the system to be unreliable. Why? Human judges have a tendency to latch onto the failures: the misfires are far more memorable in the eyes of the judge than the successes. And the misfires are just numerous enough (10-20%) that they seem pervasive, when in reality the vast majority of the results are good.

This is why the Classification Module’s Classification Workbench tool has explicit workflows built into it for rigorously testing your training set and your prospective categorization process. Eyeballing it leads to misperception of your results, and to crooked bookshelves.


Written by Josh Payne

January 26, 2010 at 10:18 am

2 Responses

  1. Great post! Always glad to see more and more work done in how we perceive and account for accuracy.
    Cheers,
    Christine
    Lexalytics, Inc
    @christinelexa

    Christine Sierra

    January 26, 2010 at 11:04 am

  2. […] In a previous post, I emphasized the importance of rigorous, controlled testing when assessing the potential of […]

