Caffeine is a complete redesign of Google’s search index. They may reign as the #1 search engine, but they have never been known to stop innovating. The web has changed, and we have come to expect things to happen immediately. When we search, we want the freshest, most relevant results. When we publish something on the web, we want it to show up right away. Google has already taken steps to make their search more real-time by integrating with the social web, but with Caffeine they have decided to go much further.
The results you see when you search Google come from their search index, a representation of the web as they see it, stored in a database somewhere. Every so often, Google’s web crawlers go out and try to figure out what has changed. The web is now a rich ecosystem of data, going way beyond simple text and images, so this periodic approach no longer quite fits. It’s also slow to update. If you’ve ever published something and then waited and waited for it to show up in Google, you know how annoying that can be.
According to Google, Caffeine lets them index web pages on an enormous scale. Every second, it processes hundreds of thousands of pages in parallel; if this were a pile of paper, it would grow three miles taller every second. Caffeine takes up nearly 100 million gigabytes of storage in one database and adds new information at a rate of hundreds of thousands of gigabytes per day. You would need 625,000 of the largest iPods to store that much information; stacked end-to-end, they would stretch for more than 40 miles.
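As a quick sanity check on the iPod comparison, here is a minimal sketch that works backward from the figures quoted above. It assumes decimal gigabytes and statute miles; the function and constant names are made up for illustration. The quoted numbers imply a device of about 160 GB and roughly 10 cm long, which seems plausible for the largest iPods of that period.

```python
# Figures quoted in the text above.
TOTAL_STORAGE_GB = 100_000_000  # "nearly 100 million gigabytes"
IPOD_COUNT = 625_000            # "625,000 of the largest iPods"
STACK_MILES = 40                # "more than 40 miles" (lower bound)

METERS_PER_MILE = 1609.344      # statute mile (assumption)


def implied_ipod_capacity_gb(total_gb=TOTAL_STORAGE_GB, count=IPOD_COUNT):
    """Storage each device must hold for the comparison to work."""
    return total_gb / count


def implied_ipod_length_cm(miles=STACK_MILES, count=IPOD_COUNT):
    """Length of each device if the stack spans the given distance."""
    return miles * METERS_PER_MILE * 100 / count


if __name__ == "__main__":
    print(implied_ipod_capacity_gb())          # 160.0
    print(round(implied_ipod_length_cm(), 1))  # 10.3
```

Since the text says "nearly" 100 million gigabytes and "more than" 40 miles, both results are approximations rather than exact device specifications.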
Caffeine is a fresh take on an old idea and should go a long way toward keeping Google at the top of their search game. It’s also an investment in the future, providing a stronger, faster, and more scalable base for future developments in search. It seems that this is just the beginning. This back-end change should bring users some nice benefits on the front-end, making search more useful and relevant in the months ahead.