Derrick Harris talks about Facebook’s graph processing platform that is built on top of Hadoop



People following the open source Giraph project likely know that Facebook was experimenting with it, and on Wednesday the company detailed just how heavily it’s leaning on Giraph. Facebook scaled it to handle trillions of connections among users and their behavior, as the core of its Open Graph tool.

Oh, and now anyone can download Giraph, which is an Apache Software Foundation project, with Facebook’s improvements baked in.

Graphs, you might recall from our earlier coverage, are the new hotness in the big data world. Graph-processing engines and graph databases use a system of nodes (e.g., Facebook users, their Likes and their interests) and edges (e.g., the connections between all of them) in order to analyze the relationships among groups of people, places and things.
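To make the node-and-edge idea concrete, here is a minimal sketch in Python. It is not how Giraph or Facebook store their graphs; the sample names, the edge labels and the `shared_connections` helper are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: (source node, edge label, target node).
edges = [
    ("alice", "likes", "hiking"),
    ("bob", "likes", "hiking"),
    ("alice", "friend", "bob"),
    ("carol", "likes", "cooking"),
]

# Adjacency list: each node maps to the set of nodes it shares an edge with.
# Edges are stored in both directions so the graph can be walked either way.
neighbors = defaultdict(set)
for source, _label, target in edges:
    neighbors[source].add(target)
    neighbors[target].add(source)


def shared_connections(a, b):
    """Return the nodes that both a and b are directly connected to."""
    return neighbors[a] & neighbors[b]


print(shared_connections("alice", "bob"))
```

Even at this toy scale, a relationship question ("what do these two users have in common?") becomes a set operation over edges, which is the kind of analysis graph engines run across billions or trillions of connections.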

Giraph is an open source take on Pregel, the graph-processing platform that powers Google PageRank, among other things. The National Security Agency has its own graph-processing platform capable of analyzing an astounding 70 trillion edges, if not more by now. Twitter has an open-source platform called Cassovary that could handle billions of edges as of March 2012.

Even though it’s not using a specially built graph-processing engine, Pinterest utilizes a graph data architecture as a way of keeping track of who and what its users are following.

There are several other popular open source graph projects, as well, including commercially backed ones such as Neo4j and GraphLab.

What makes Giraph particularly interesting is that it’s built to take advantage of Hadoop, the big data platform already in place at countless companies, and nowhere at a larger scale than at Facebook. This, Facebook engineer and Giraph contributor Avery Ching wrote in his blog post explaining the company’s Giraph engineering effort, was among the big reasons for choosing it over alternative platforms.
