Wednesday, June 9, 2010

Google Caffeine Is Live (But Don't Panic, It's Cool!)

Today the official Google Blog and several other official Google blogs cross-posted the news that Google Caffeine, Google's new index and indexing system, is live. According to the blog post, the main change is the speed at which the web is indexed. This advance responds to the way news is distributed today, in real time, and to publishers' expectation that their content will be indexed promptly.

If you search Google on Tuesday, you may notice that the information you're looking for is a bit "fresher" than it would have been on Monday.

That's because the world's most popular search engine has unveiled a new indexing system called "Caffeine," which Google says delivers results 50 percent fresher than its old index.

"Caffeine provides 50 percent fresher results for Web searches than our last index, and it's the largest collection of Web content we've offered," the company says in a news release on its official blog. "Whether it's a news story, a blog or a forum post, you can now find links to relevant content much sooner after it is published than was possible ever before."

That doesn't mean Google has changed its search formula entirely, or that search results will pop onto your screen faster than before. Essentially, it means that Google can find new content more quickly. For instance, a new Twitter update that in the past would have been missing from search results, because Google hadn't yet found and indexed it, would appear in Google search results sooner with Caffeine.

Here's a promotional video from Google that explains how the search works.

To keep up with the evolution of the web and to meet rising user expectations, Google built Caffeine. The image below illustrates how the old indexing system worked compared to Caffeine:


To better understand how Caffeine works, it might help to think of Caffeine as a blog and the old Google as a newspaper. A newspaper collects content and then publishes it all at once at the beginning of the day, while a blog constantly looks for new information and updates on the fly. Caffeine works in a similar way: rather than collecting big "batches" of Web pages and refreshing its search index all at once, Google now updates its index continuously as it finds new pages.
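To make the newspaper-versus-blog analogy concrete, here is a toy Python sketch of the two approaches. It is only an illustration under simplifying assumptions (one small in-memory inverted index, whitespace tokenization, and hypothetical class names `BatchIndex` and `IncrementalIndex`); it is not how Google actually implements Caffeine, which processes pages in parallel at a vastly larger scale.

```python
from collections import defaultdict


def tokenize(text):
    # Simplifying assumption: lowercase and split on whitespace.
    return text.lower().split()


class BatchIndex:
    """'Newspaper' model (hypothetical): crawled pages become searchable only after a full rebuild."""

    def __init__(self):
        self.documents = {}            # url -> text, everything crawled so far
        self.index = defaultdict(set)  # term -> set of urls (the searchable copy)

    def crawl(self, url, text):
        self.documents[url] = text     # stored, but not yet searchable

    def rebuild(self):
        # Discard the old index and reprocess every stored document in one batch.
        fresh = defaultdict(set)
        for url, text in self.documents.items():
            for term in tokenize(text):
                fresh[term].add(url)
        self.index = fresh

    def search(self, term):
        return sorted(self.index.get(term.lower(), set()))


class IncrementalIndex:
    """'Blog' model (hypothetical): each crawled page goes into the live index immediately."""

    def __init__(self):
        self.index = defaultdict(set)  # term -> set of urls

    def crawl(self, url, text):
        # Simplification: re-crawling a changed page does not remove its old terms.
        for term in tokenize(text):
            self.index[term].add(url)

    def search(self, term):
        return sorted(self.index.get(term.lower(), set()))


if __name__ == "__main__":
    batch, incremental = BatchIndex(), IncrementalIndex()
    for idx in (batch, incremental):
        idx.crawl("example.com/news", "Caffeine is live")

    print(batch.search("caffeine"))        # [] until the next rebuild
    print(incremental.search("caffeine"))  # ['example.com/news']

    batch.rebuild()
    print(batch.search("caffeine"))        # ['example.com/news']
```

The difference lies in the gap between crawling a page and being able to find it: in the batch model, a page crawled just after a rebuild stays invisible until the next one, while the incremental model removes that delay at the cost of updating the live index continuously.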

"Every second Caffeine processes hundreds of thousands of pages in parallel. If this were a pile of paper it would grow three miles taller every second," Google says.

"Content on the Web is blossoming," Google's blog post explains. "It's growing not just in size and numbers but with the advent of video, images, news and real-time updates, the average Web page is richer and more complex.

"In addition, people's expectations for search are higher than they used to be. Searchers want to find the latest relevant content and publishers expect to be found the instant they publish."

"It's interesting to see that Google is focusing again on the element of its offering where it does lead the pack: search," one commentator writes. "That's what made its [Google's] name, but it's clear that even if Microsoft's Bing hasn't (yet?) won the market share, it has got Google thinking about how it can improve what it does."

I’ve not seen as many people notice another feature of Caffeine that may be the most “actionable” from an optimization perspective. Google can now associate more data with any particular piece of content it indexes. Google is explicitly telling us that it is building capacity into its algorithm to consider more indicators of a document's quality or importance. Note also that a “document” need not be just a web page; it could be a video or other content.

What does that tell us? Yes, your content will get retrieved and indexed more rapidly than ever before. But you also need to make sure that Google gets as many signals as possible that it is worthy of attention. Links to it, reviews of it, “Likes”, tweets and any other information that Google might be able to pick up are more important than ever. This has been fundamentally true for some time, but the number and variety of these quality signals are only going to grow.
