Josh Payne on content analytics, enterprise content and information management

Google’s Improvements Extend into Information Governance


I just read with great interest Steven Levy’s article in Wired on Google’s search algorithm and how Google works to improve it. A couple of things leaped out at me as concepts I’ve discussed here in the past (or on my old blog), and they extend into the enterprise as well. Just as Google uses them to improve its consumer search experience, you can leverage them for better information governance.

1) Google uses document context in much the same way I have described advanced content classification: as a “context-based” method of classifying information. Levy writes:

Google’s synonym system understood that a dog was similar to a puppy and that boiling water was hot. But it also concluded that a hot dog was the same as a boiling puppy. The problem was fixed in late 2002 by a breakthrough based on philosopher Ludwig Wittgenstein’s theories about how words are defined by context. As Google crawled and archived billions of documents and Web pages, it analyzed what words were close to each other. “Hot dog” would be found in searches that also contained “bread” and “mustard” and “baseball games” — not poached pooches. That helped the algorithm understand what “hot dog” — and millions of other terms — meant. “Today, if you type ‘Gandhi bio,’ we know that bio means biography,” Singhal says. “And if you type ‘bio warfare,’ it means biological.”

Google uses the context of the content it indexes to better understand the purpose and intent of a particular document, and in turn the purpose and intent of your search query. Advanced content classification methods deliver better categorization results in a similar way: they use the full context of the training documents provided to them to produce better results.
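Levy’s article does not describe Google’s actual implementation, but the core idea in the excerpt (learning what a term means from the words that appear near it) can be illustrated with a minimal sketch in Python. Everything here is hypothetical: the tiny corpus, the `senses` table, the window size, and the functions `cooccurrence` and `expand`. Simple windowed co-occurrence counts stand in for whatever Google really uses.

```python
from collections import Counter, defaultdict

def cooccurrence(docs, window=3):
    """Count how often each pair of words appears within `window` tokens of each other."""
    counts = defaultdict(Counter)
    for doc in docs:
        tokens = doc.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts

def expand(term, query_context, senses, counts):
    """Pick the sense of `term` whose co-occurrence profile best matches the other query words."""
    def score(sense):
        # Sum how often each other query word was seen near this candidate sense.
        return sum(counts[sense][w] for w in query_context)
    return max(senses[term], key=score)

# Hypothetical corpus standing in for crawled documents.
corpus = [
    "gandhi biography describes his early life in india",
    "a short biography of gandhi and his nonviolent movement",
    "biological warfare uses pathogens as weapons",
    "treaties restrict biological warfare programs",
]
# Hypothetical table of candidate expansions for an ambiguous query term.
senses = {"bio": ["biography", "biological"]}
counts = cooccurrence(corpus)

print(expand("bio", ["gandhi"], senses, counts))
print(expand("bio", ["warfare"], senses, counts))
```

With this corpus, “gandhi bio” resolves to `biography` and “bio warfare” to `biological`, because each expansion appears near a different set of context words. A context-based content classifier applies the same principle at the document level, comparing a document’s surrounding vocabulary against that of its training examples.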

2) When discussing ‘trusted content’, I used the example of how Google trusts some sources over others. At the time, I didn’t have a source for this assertion. Levy describes this in some detail in the article:

That same year, an engineer named Krishna Bharat, figuring that links from recognized authorities should carry more weight, devised a powerful signal that confers extra credibility to references from experts’ sites. (It would become Google’s first patent.) The most recent major change, codenamed Caffeine, revamped the entire indexing system to make it even easier for engineers to add signals.
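The same “trusted source” idea carries over to enterprise content: a reference from a recognized authority can count for more than one from an unknown source. The sketch below is not Bharat’s patented method or anything Google has published; it is a hypothetical Python illustration in which `score_targets`, the weights, and the source and document names are all assumptions chosen for the example.

```python
from collections import defaultdict

# Hypothetical trust weights: references from recognized authorities count more.
AUTHORITY_WEIGHT = 3.0
DEFAULT_WEIGHT = 1.0

def score_targets(links, authorities):
    """Sum incoming reference weight per target; `links` is a list of (source, target) pairs."""
    scores = defaultdict(float)
    for source, target in links:
        weight = AUTHORITY_WEIGHT if source in authorities else DEFAULT_WEIGHT
        scores[target] += weight
    return dict(scores)

# Hypothetical reference graph: one authority link vs. two ordinary links.
links = [
    ("expert-site", "doc-a"),
    ("random-blog-1", "doc-b"),
    ("random-blog-2", "doc-b"),
]
authorities = {"expert-site"}
scores = score_targets(links, authorities)
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

Here `doc-a` outranks `doc-b` even though it has fewer incoming references, because its single reference comes from a trusted source. In an enterprise setting, the `authorities` set might correspond to designated systems of record or owning departments, though how you define trust is a policy decision.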

Do read the entire article if you’re interested in these topics. Given our universal reliance on Google as consumers, it’s certainly beneficial to be an educated consumer. And these concepts can extend into better, more proactive management of your enterprise content.


Written by Josh Payne

February 23, 2010 at 10:10 am
