We’re Witnessing the Rise of the Graph in Big Data
Graph databases and graph-processing applications have been popping up all over the place lately, and now they’re starting to go commercial. On Tuesday, popular open source project GraphLab joined the ranks of graph startups.
GraphLab, an open source project dedicated to graph analysis and machine learning, is trying to capitalize on the excitement around graphs by spinning off a commercial entity, GraphLab Inc. GraphLab creator Carlos Guestrin, who is also a machine learning professor at the University of Washington, will lead the new Seattle-based company, which has raised $6.75 million from Madrona Venture Group and NEA.
Graph analysis is among the hottest techniques around for making sense of large datasets, primarily by determining how tightly different data points are related or how similar they are. The term “graph” came into the broader lexicon along with social networks, which built social graphs to assess the relationships among their millions of users, but the technique has much broader uses.
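To make "how similar they are" concrete, here is a minimal Python sketch that scores two users by how many connections they share (Jaccard similarity). The toy graph, the user names and the function names are all hypothetical, chosen only for illustration; this is not GraphLab's API or its algorithms.

```python
# Hypothetical toy social graph: each user maps to the set of users
# they are connected to. All names here are made up for illustration.
graph = {
    "ana": {"ben", "cho", "dev"},
    "ben": {"ana", "cho"},
    "cho": {"ana", "ben", "dev"},
    "dev": {"ana", "cho"},
    "eli": set(),
}


def jaccard_similarity(graph, a, b):
    """Fraction of the combined neighbors of a and b that both share (0.0 to 1.0)."""
    neighbors_a, neighbors_b = graph[a], graph[b]
    union = neighbors_a | neighbors_b
    if not union:
        # Neither user has any connections, so there is nothing to compare.
        return 0.0
    return len(neighbors_a & neighbors_b) / len(union)


def most_similar(graph, user):
    """Rank every other user by neighbor overlap with `user`, highest first."""
    scores = [
        (other, jaccard_similarity(graph, user, other))
        for other in graph
        if other != user
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for other, score in most_similar(graph, "ben"):
        print(f"ben vs {other}: {score:.2f}")
```

A pairwise comparison like this is manageable for a handful of users. At the scale of millions of users it is not, which is where distributed graph-processing systems come in.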
Guestrin said GraphLab’s algorithms are used in a lot of recommender systems, but he also cited fraud detection in banking networks and intrusion detection in computer networks as potential applications. We’ve covered graphs as the analytical model of choice for everything from content recommendation to tracking lab work in genomics. Really, though, graph analysis can be applied to any problem where there is too much data for a person to analyze the relationships between every pair of points, especially when it is combined with machine learning.
Google also famously uses a graph-processing system called Pregel to run graph computations such as PageRank. Although a number of graph databases and other projects have popped up in the past few years, Guestrin said GraphLab is actually a contemporary of Pregel. He and some colleagues at Carnegie Mellon built a small system for their lab about five years ago, then released it into the open-source world with few expectations that it would catch on. Now, he added, Pandora and WalmartLabs are among the project’s users.
Among those other projects are Giraph (an open source, Hadoop-based Pregel clone that originated at Yahoo and is used heavily at Facebook) and the graph database Neo4j (which also has a commercial arm, called Neo Technology), as well as Twitter’s Cassovary and fellow University of Washington project Grappa. Guestrin said GraphLab can work with most of them, particularly if they’re not designed to do machine learning at scale like GraphLab is. Some efforts, he noted, are focused simply on storing data in graph form (e.g., databases) or on providing simple graph analysis.
As for when we’ll actually see the results of the effort to commercialize GraphLab, Guestrin said it will be a while. Right now, he’s focused on the next open source release of GraphLab in July. However, the company will begin engaging with commercial users over the next several months to determine what types of features they would expect in commercial graph-analysis software.