(un)structured

Josh Payne on content analytics, enterprise content and information management

Archive for the ‘Content Analytics’ Category

The Rise of Content Analytics: A Valedictory

As my colleagues inside IBM have known for over a week, I’ve decided to leave IBM to pursue other professional opportunities. Before I put out some blog posts on my future, I wanted to use this opportunity to look back at how far the ECM business has come with respect to discovery and content analytics over the last 5 years.

I came to IBM as part of the acquisition of a small enterprise search vendor — iPhrase Technologies. We joined up with a product team inside IBM building a product called “Information Integrator OmniFind Edition” to attack the enterprise search market. Though we were grouped inside the Content Management organization, we went about our business largely independently of our ECM brethren, focusing on search solutions and leveraging content analytics technologies for ‘concept searching’.

A year later, FileNet joined IBM and we began trying to apply our search and discovery technologies to ECM-centric business scenarios. As we began to collaborate, one of the first things that struck me about ECM was its treatment of documents. In enterprise search, documents were something to be cracked open by definition — how else could you search them?

Yet the ECM world had a tendency to treat a document as an ‘object’ — something to be handled and managed. It struck me as digital paper shuffling, where the expectation was that ECM existed to ready a document for someone with two eyes to read it and use it (and don’t get me wrong, it was challenging paper shuffling — billions of objects, large-scale scanning — tough, tough problems).

Within this context we set down a path of applying analytics technologies to ECM. Our first step was to weave IBM’s content classification product into the ECM architecture, applying it to compelling scenarios in email archiving and records management. Next, we brought to market an eDiscovery solution built with analytics at its core. These first two steps were exciting but focused attempts at using content analytics to deliver better solutions to specific ECM problems, especially in the information governance market.

Then last year, IBM made our Content Analytics platform generally available. This third step is especially gratifying. Content analytics technologies have moved from being an isolated technology, separate from ECM, to delivering insight about businesses by leveraging the text inside documents — the insides of these objects.

The embrace and adoption of content analytics is especially gratifying for me personally. Though I had but a small role, the change inside IBM ECM and externally amongst customers, analysts and others is stark relative to when I joined IBM. Content is no longer simply an ‘object’ to be managed — it’s an asset to be leveraged, and this is a striking difference. I am confident that in the coming months and years this will increasingly become the accepted attitude and approach in ECM.

On that note, I want to thank folks for reading this blog on the topics of content classification and content analytics. For those interested in more writing on information lifecycle governance, Craig Rhinehart continues to cover the topic at his blog.

Since my professional life will take me away from content analytics in the near term, I expect that this blog will start to reflect the new paths I’ll be following on my professional, post-IBM journey.

I hope you’ll continue to read as my journey takes these exciting new steps.

Written by Josh Payne

June 16, 2010 at 1:53 pm

You Need Content Analytics to Determine the Value of Content

I went on vacation last week. (Side note — though I’ve embraced Twitter, Foursquare and other modern public media platforms, I’ve yet to embrace the idea of broadcasting to the world at large that my house was completely empty and I was 1000 miles away. Call me old fashioned if you must.)

I mention it not to gloat about how much fun I had with my kids, but to bring up what I did the day before I departed. Again, call me old fashioned, but I typically get my books not from Amazon, a bookstore or an iPad, but from a more cost-effective source: the public library. Quaint, I know.

When I go to the library, I can’t go without a plan. I can’t simply browse the stacks to find a good book. Yes, the library is well organized (good classifications!). And each book has good information on the cover describing the contents (standard metadata!) like author and title. But that information exterior to the contents just is not effective in helping me quickly determine the value of a book relative to my needs. I prepare in advance by reading the reviews of others – other people who’ve read the books and analyzed their value. Otherwise, finding a couple of good books for my vacation is an overwhelming and frustrating task.

The same idea – expending effort to analyze the long-form text inside content – applies to the content inside your organization. In previous postings I’ve discussed the value of content assessment to your organization. And to execute content assessment, you need content analytics. Historical approaches to tackling the content assessment problem have focused on metadata exterior to a document – the title, the author, the dates. This is much like trying to find a library book just by browsing the stacks. Determining what content is necessary to your organization – what content is valuable, requires governance, is legally relevant – is virtually impossible simply by examining data exterior to your content.

Content analytics provides your organization the ability to determine the value of your content by interrogating the interior of those documents. Metadata on the outside of a document is only part of the story. What concepts are covered in the document? Does this document concern itself with a customer? A business partner? Does this document concern itself with a particular business activity?
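
To make this concrete, here is a minimal, hypothetical sketch of the difference between judging a document by its exterior metadata and interrogating its interior text for concepts. The concept names and keyword rules are invented for illustration only; they are not any product’s API, and a real content analytics platform would use far richer linguistic analysis.

```python
# Hypothetical illustration: exterior metadata vs. interior text analysis.
# The concepts and keyword rules below are assumptions made for this sketch.

document = {
    # Exterior metadata: says who wrote it and when, not what it is worth.
    "metadata": {"title": "Q3 notes", "author": "j.smith", "created": "2009-11-02"},
    # Interior text: the part the metadata-only approach never examines.
    "text": "Meeting with Acme Corp about the renewal contract and invoice dispute.",
}

# Illustrative rules: which interior terms hint at which business concepts.
concept_rules = {
    "customer": ["acme corp", "renewal", "account"],
    "business_partner": ["reseller", "joint venture", "partner agreement"],
    "business_activity": ["contract", "invoice", "purchase order"],
}

def concepts_in(text):
    """Return the business concepts whose indicator terms appear in the text."""
    lowered = text.lower()
    return [concept for concept, terms in concept_rules.items()
            if any(term in lowered for term in terms)]

# Metadata alone answers almost none of the valuation questions above...
print(document["metadata"])
# ...while even a crude look inside the text starts to answer them:
print(concepts_in(document["text"]))   # ['customer', 'business_activity']
```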

All of these questions are difficult to answer without examining the text in a document, and given the volume of information in your organization, it’s difficult to make these assessments on a large scale. In my next posting I’ll cover how content analytics can help answer these valuation questions.

Written by Josh Payne

May 7, 2010 at 7:48 am

The Value of Content Assessment

In my previous post, the first in my series on content assessment, I described the information landscape with respect to content. Organizations are facing ever-increasing volume, velocity and variety of information. Understanding growing piles of uncontrolled content through content analytics has clear benefits to organizations of every size. Each organization – and the range of stakeholders in those organizations – will benefit from engaging in content assessment. How? In three main ways:

1) There is value to all stakeholders in simply understanding content better through analytics. Dynamically analyzing silos of unmanaged, uncontrolled content via content analytics provides new insight about this information that stakeholders previously did not have. Before, stakeholders simply knew the ‘speeds and feeds’ of a content repository: the number of documents, the size of those documents, etc. Content analytics now delivers insight about the content itself, and that insight leads to better, more informed decision making. Which areas represent the most risk? Where should we start our governance efforts? Where should our priorities lie? What is the projected ROI of better information lifecycle governance?

Today, organizations make these kinds of decisions about their unstructured content repositories with limited data. More likely, they avoid making decisions because they lack this kind of insight. No longer. Improved understanding and insight about your unstructured information leads to better decisions about how to take action.

2) One such action is to decommission content, the systems that support that content and the systems that rely upon that content. Decommissioning is primarily an IT concern, since IT manages the costs of the information infrastructure. By default, most organizations have been doing nothing with their content, and as such their infrastructure costs have continued to rise. With an understanding of the content, you can take on these once-avoided decisions with more confidence. By understanding the content in a particular system, you can take action to shut that system down and save costs.

3) There is a flip side to decommissioning old content and the systems that support it: by understanding your content, you are empowered to preserve the content that is necessary. Preserving the necessary content is what enables the decommissioning you want to execute.

Content assessment provides you the ability to identify content that is valuable. This makes line-of-business users happy, as they resist decommissioning because they don’t want you to throw away ‘something they’ll need’ in the future.

Content assessment provides you the tools to identify content that requires lifecycle governance. The compliance officers and records managers will be happy because your organization’s obligations will be met in a documented process. You will be taking steps to enforce your content policies on disposition of content while still working to control your costs.

Content assessment provides you the tools to identify content that is legally relevant. The lawyers will be happy because they can use it to find the information relevant to legal cases where it resides in uncontrolled environments – and exert the kind of control the eDiscovery process demands.

Three main ways content assessment delivers value to your organization: via understanding of your content on its own; via decommissioning and the consequent reduction of IT cost; and via preservation and governance that fulfill the needs of line-of-business stakeholders and compliance-minded stakeholders alike.

Next in the content assessment series . . . what content is ‘necessary’ to your organization and how does content analytics help to make this determination?

Written by Josh Payne

April 21, 2010 at 9:30 am

My College Laundry Habits and Your Organization’s Content Habits

First in a series of posts on content assessment.

Not to scale . . . my college laundry piles were *much* bigger

It has been quiet around this here blog. One reason was that the month of March saw two “once in 50 year” rain storms in the Boston area. I got to learn some valuable skills in flood prevention as a result – unfortunately, those lessons came at the cost of activities like blogging and tweeting . . . but I’m back and ready to roll with a series of posts on a topic I’ve been thinking and working on over the past 3 months – content assessment.

I introduced this topic after our original announcement of our content assessment offering. And I’ve spent the last few months talking to IBM customers, analysts and other enterprise content professionals inside IBM. It’s an exciting application of content analytics technology to solve a class of problems that our customers have traditionally ignored . . . and hoped would go away — kind of like my laundry in college. Back then I kept wearing my clean clothes day after day, hoping my laundry would magically wash itself. Not surprisingly, the clothes kept piling up. Finally, a random Sunday afternoon would arrive; I’d wake up, bite the bullet and wash my clothes. Ah . . . to be 19 again . . . I digress.

Much as I continuously generated dirty clothes, organizations continue to generate content. And similar to the haphazard piles of laundry in my dorm room, these chaotic, uncontrolled piles of content aren’t cleaning themselves up. And these piles of content are growing at a much faster pace.

In college, I’d wait until I couldn’t stand it anymore, and then I’d take action to take control of my clothing situation. With the velocity, volume and variety of content growth, organizations are hitting a similar stage. They can’t maintain the same ‘do nothing, save everything’ practices with their content. The day has arrived to tackle those piles.

For IT, costs continue to rise (17% of IT budgets are devoted to storage alone, up from 10% just a few years ago). Records managers increasingly realize they can’t rely on users to identify and control business records. Legal needs to find the documents they need for eDiscovery proceedings – and fast. Line-of-business users need better access to and control of trusted content to better execute their business activities.

These information stakeholders need better control over the information necessary to their business. But to exert that control, they need a better understanding of their content landscape. They see mounds of content as far as the virtual eye can see. Years of bad content habits have created an intimidating problem that leaves them paralyzed as to how to solve it.

Content assessment solutions – powered by innovations in content analytics – are now ready to meet this challenge. Content assessment solutions deliver the kind of understanding organizations need to make decisions about their content. Empowered with insight about their content via content analytics, organizations can now take action. They can take action by decommissioning the content they no longer need. They can take action by decommissioning the systems and infrastructure that support their unnecessary content. And they will be willing to take these cost-cutting actions because they’ve identified and preserved the content that is necessary to their organization.

In the coming days and weeks, I’ll post more in this series on content assessment – covering in more detail who benefits from content assessment, what those benefits are, and the key elements of a content assessment solution. It’s an exciting new solution area.

You can’t avoid grappling with the piles of content . . . just as I couldn’t avoid doing laundry. If your content governance practices are analogous to my college laundry habits, content assessment is an idea you need to learn more about.

Written by Josh Payne

April 15, 2010 at 3:57 pm

Mistakes Made By People are Forgivable

In a previous post, I emphasized the importance of rigorous, controlled testing when assessing the potential of content analytics. This is especially important for content classification when it is being used to replace human decision-making. My broader point in that post was that when adopting new technology, you can’t rely on the qualitative perception of the skeptical observer.

A similar topic, the adoption of technology in the legal profession, came up at the keynote to LegalTech last week. Law.com recounts Dr. Lisa Sanders’ response, which was far more eloquent than my post, so I wanted to pass it along here:

During the question-and-answer session, Kelley Drye & Warren Practice Development Manager Jennifer Topper asked the panel how to convince litigators to use tools like technology and decision trees, repeatable processes that can help make handling similar cases more efficient.

It’s a long process of changing attitudes within a corporate culture, Dr. Sanders said. “Mistakes made by a computer or guideline live forever in the minds of people watching them. Mistakes made by people are forgivable.”

I guess that’s why she writes for the New York Times . . .

(the LegalTech keynote has been quite the blogging gift this week. Maybe I should volunteer to staff the IBM booth next year and get the scoop first hand)

Written by Josh Payne

February 10, 2010 at 9:52 pm

Dehumanizing Human Analysis

I read up on some of the goings-on at LegalTech in New York City last week. A couple of things caught my eye from the write-up on legalcurrent.com.

1) I found it interesting, as I tweeted earlier in the day, that David Craig of Thomson Reuters used the term “Tsunami of information”. We currently host a whitepaper from Cohasset Associates entitled “Meet the Content Tsunami Head On: Leveraging Classification for Compliant Information Management.” It will be interesting to see if that descriptor gains traction in the marketplace.

2) Malcolm Gladwell is hitting the information management circuit, isn’t he? First IOD last fall, now LegalTech. (I hope he continues it; he was hands down the most interesting keynote speaker I’ve seen at a tradeshow. Effective in tying his storytelling back to the themes of the show itself.)

3) Lastly, Gladwell, as recounted in the write-up, referenced a story about the chess master Garry Kasparov:

Gladwell pointed to a Kasparov chess challenge in which both opponents used a computer throughout the match. Kasparov saw that the computer’s quick analysis of every possible move enabled these grandmasters to let their experience, creativity and knowledge come through.

That’s a nice summary of the core of my case for content classification specifically, and for content analytics more broadly, within the context of information governance.

One way to read that quote is that content classification frees up the minds of our knowledge workers so that they can focus on the truly complex matters and truly human endeavors that require our most valuable skills. Leave the mundane grunt work to the computers, automated, in the background.

Seen differently, when computers automatically and intelligently provide the top, best choices for humans — assisting in the classification of information without completely automating the task — humans are left to focus their brainpower on the finer points of the decision-making process, and as such come to better conclusions.
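
Here is a minimal sketch of that assistive pattern, with an invented toy classifier and invented category names (it does not describe any actual IBM product API): the machine proposes its top-ranked classifications, and the person spends their attention only on the final choice.

```python
# Hypothetical sketch of machine-assisted classification: the computer ranks
# candidate categories, the human makes the final call. The scores and the
# category names are toy values invented for illustration.

def score_categories(text):
    """Stand-in for a trained classifier returning (category, confidence) pairs."""
    # A real system would score the text; here the scores are fixed toy values.
    return sorted(
        [("contracts", 0.62), ("correspondence", 0.21),
         ("hr_records", 0.11), ("marketing", 0.06)],
        key=lambda pair: pair[1], reverse=True)

def suggest_top_choices(text, k=3):
    """Offer the top-k machine suggestions for a person to choose among."""
    return score_categories(text)[:k]

for category, confidence in suggest_top_choices("Signed services agreement..."):
    print(f"{category:15s} {confidence:.0%}")
# A knowledge worker now picks from a few ranked options instead of reasoning
# over an unconstrained taxonomy from scratch.
```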

Either way, I thought it an interesting view on the role of automated analysis in relation to typically human-based decision making. Dehumanizing the analysis leads to better, more humane results.

Written by Josh Payne

February 8, 2010 at 11:42 pm

Shattering Your Implicit Cost Assumptions for Information Governance

I’m quite enamored with Twitter. It’s my main source of information and news, especially on the weekends, as I’m rarely in front of a computer and it delivers interesting tidbits to the BlackBerry in my pocket. And it’s certainly the best way I’ve found to keep my finger on the pulse of goings-on in the niche relevant to my professional interests: ECM, information governance and records management.

This tweet, from @MimiDionne a couple of weekends ago, caught my eye:

My initial response: Mimi, this is no joke! My 18-month-old wasn’t that interested in hearing about cost savings and information governance at the time, so I returned to our conversation about ducks and birds and swallowed my observations until now.

It’s always perilous to read too much into 140-character observations, but my instant reaction to the tone of the tweet was that she was embarking on something that the general records management community would view as quixotic. Mimi is probably with me on the potential for cost savings, but the rest of the community is probably not, as reflected in her joking tone.

Information governance initiatives, and that very much includes records management, can indeed be better for your budget. My friends at IBM who are focused exclusively on our records management product have been helping our customers calculate the ROI with the “No Paper Weight” initiative for the past few years. But when I view information governance through the lens of content analytics, I see even greater possibilities for easing your budget.

One of the key values of content analytics technology is that it is a substitute for human analysis of documents. And because those documents have become so numerous in our organizations, the cost to analyze them has become very high. The inability to execute analysis of your documents becomes an implicit assumption in how you plan your records management and information governance projects.

By adopting content analytics as a core element of your information lifecycle governance strategy, you can shatter this assumption, and doing so reshapes your budgeting in two critical ways, cost prevention and cost reduction:

1) Document-by-document decision making for cost prevention. Technologies like automated content classification can augment, improve the efficiency of, or outright replace document-by-document decision making (a rough sketch of this kind of routing follows this list). By using content classification and other analytics approaches to better automate your governance decisions, you’ll improve your organization’s productivity by letting the general population of users focus on their ‘real’ work and leave records management decision making out of their lives. More productive workers mean a positive impact on your budget.

2) Content decommissioning for cost reduction. More compelling are the savings possible by gaining control of and governing the lifecycle of your important information, and decommissioning the rest. Too frequently, we hear from customers that they keep their information active in its original store because they don’t know what’s important and it would be too difficult (i.e. involve a costly analysis of the documents) to figure out what to preserve. But the cost implications of retiring the systems that store this content are compelling. If we can reduce the barriers that prevent organizations from sifting through the information and picking out the relevant, valuable content (the content that a records manager would say is a record), then we can unlock the budget-friendly implications of better information lifecycle governance. Content analytics does that.
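
As a rough sketch of the routing promised above, consider confidence-based classification: decisions the machine is confident about are applied automatically, and the uncertain ones go to a person. The classifier, the category name and the threshold value here are assumptions for illustration, not a description of any specific product.

```python
# Hypothetical confidence-based routing: high-confidence classifications are
# applied automatically; uncertain documents are queued for a person. Both the
# toy classifier and the threshold are illustrative assumptions.

AUTO_APPLY_THRESHOLD = 0.85  # assumed cutoff; in practice tuned against controlled tests

def classify(text):
    """Stand-in for a trained classifier returning (category, confidence)."""
    if "agreement" in text.lower():
        return ("record_contract", 0.91)
    return ("unknown", 0.40)

def route(document_text):
    category, confidence = classify(document_text)
    if confidence >= AUTO_APPLY_THRESHOLD:
        return f"auto-declared as '{category}' ({confidence:.0%})"
    return f"queued for human review (best guess '{category}', {confidence:.0%})"

print(route("Master services agreement, executed copy"))  # handled automatically
print(route("Random meeting notes"))                       # a person decides
```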

Once you’ve assessed legacy content stores and picked out and preserved the valuable content, you have opened up new worlds of cost saving:

– File system storage, which assumes high availability and rapid access, can be cut. Content is decommissioned and your disk purchasing budget for the next year can go down. You simply don’t need as much as before.

– Administration costs. Less storage means less cost to administer that disk. Lower power costs, cooling costs and of course human costs. Most organizations model a fully burdened cost for storage. Decommission your content and you can decommission much of your fully burdened, ongoing cost (a back-of-the-envelope example follows this list).

– Further, the information you do keep can be sent off to lower-cost storage, as that information will typically be determined to require less frequent access than actively generated content.

– Application maintenance is also an implication of legacy, uncontrolled content. Content is frequently not stored in totally uncontrolled ways (file systems) but rather in partially controlled places as part of software business applications (like a CRM system or a knowledge management application). And those business applications come with ongoing maintenance costs. Sometimes these costs come in the form of explicit maintenance fees from software vendors. Sometimes they are billings from IT organizations for the human cost of maintaining the application. But if you’re able to identify the important information inside those applications, preserve it, apply lifecycle governance to it and decommission the complete, originating application, you’re certain to cut costs.
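
As promised in the administration-costs bullet above, here is a back-of-the-envelope model of what decommissioning can do to a fully burdened storage cost. Every figure in it is a made-up, illustrative assumption, not a benchmark or a measured result.

```python
# Back-of-the-envelope decommissioning savings. All figures are illustrative
# assumptions, not benchmarks.

legacy_tb = 200                      # assumed size of the legacy content store
share_decommissioned = 0.70          # assumed portion found unnecessary after assessment
fully_burdened_cost_per_tb = 5_000   # assumed $/TB/year: disk, power, cooling, administration
archival_cost_per_tb = 1_500         # assumed $/TB/year on lower-cost storage for what is kept

retained_tb = legacy_tb * (1 - share_decommissioned)

annual_cost_before = legacy_tb * fully_burdened_cost_per_tb
annual_cost_after = retained_tb * archival_cost_per_tb

print(f"Annual storage cost before: ${annual_cost_before:,.0f}")
print(f"Annual storage cost after:  ${annual_cost_after:,.0f}")
print(f"Annual savings:             ${annual_cost_before - annual_cost_after:,.0f}")
```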

Yes, lifecycle governance can be expensive. But there are savings to be had once you break down the implicit assumption that understanding your information is prohibitively expensive; with content analytics, it can be done cost-effectively.

Written by Josh Payne

February 8, 2010 at 10:03 am