‘If you want to know what the large-scale, high-performance data processing infrastructure of the future looks like, my advice would be to read the Google research papers that are coming out right now.’  — Mike Olson, CEO Cloudera

Since the rise of Hadoop, Google has published three particularly interesting papers on the infrastructure that underpins its massive web operation. One details Caffeine, the software platform that builds the index for Google’s web search engine. Another shows off Pregel, a “graph database” designed to map the relationships between vast amounts of online information. But the most intriguing paper is the one that describes a tool called Dremel.

Read the full article.