Graph Theory: Key to Understanding Big Data
Article originally posted to Wired.com by Emil Eifrem.
Dr. Roy Marsten wrote in March that graph theory was a key approach to understanding and leveraging big data. As an advocate of graph theory and a developer who has been building graph databases since 2003, I found it wonderful to read someone else with similar insights and appetites.
As Dr. Marsten notes, Google started the graph analysis trend in the modern era by using links between documents on the Web to understand their semantic context. As a result, Google produced a Web search engine that massively outperformed its established competitors and jumped so far ahead that “to Google” became a verb. Of course, we know Google’s history since then very well: its graph-centric approach has seen it deliver innovation at scale and dominate not only its core search market, but also the wider information management space.
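To make the idea of ranking documents by their links concrete, here is a minimal sketch of a PageRank-style calculation in Python. It is a simplified illustration of link analysis in general, not Google's production algorithm; the `pagerank` function, the damping value of 0.85, and the four-page `toy_web` are all assumptions made up for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    All names and parameter values here are illustrative assumptions.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with rank spread evenly across all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on equally to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# A hypothetical four-page web: most pages link to "home", none link to "orphan".
toy_web = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
    "orphan": ["home"],
}

ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{page}: {score:.3f}")
```

In this toy web, the page that attracts the most incoming links ends up with the highest score, and the page nobody links to ends up with the lowest. Nothing in the calculation looks at the text of a page; the ranking comes entirely from the structure of the graph.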
Graphs for Everyone
But graphs aren’t just for the likes of Google, with virtually limitless funds and armies of Ph.D.s at their disposal. While Google and its competitors might be content to build their own graph data infrastructure, that technology is also available off the shelf to the rest of us.
For example, the Neo4j project is a mature open-source graph database used in production at all kinds of organizations, from Global 2000 companies like Walmart, Lufthansa, and Cisco to innovative startups like FiftyThree, Medium, and CrunchBase. Graph databases like Neo4j have risen to prominence recently and, as 451 Research analyst Matt Aslett has observed, are moving out from under the general NoSQL umbrella into a category in their own right. They have become popular because, like Dr. Marsten, many thousands of other software and data professionals have seen that graphs are the best way of storing and querying their increasingly complex, interconnected data.
Web search isn’t the only domain where graphs have provided an amazing competitive advantage. We’re all well aware of how Facebook and Twitter have used the social graph to dominate their markets, and how Facebook and Google are now using Graph Search and the Knowledge Graph, respectively, to gear up for the next wave of hyper-accurate and hyper-personal recommendations. But graphs are also becoming very widely deployed in a host of other industries. Gartner points to the consumer web as another industry where graphs have played a critical behind-the-scenes role in determining competitive outcomes, in a paper whose name says it all: “The Competitive Dynamics of the Consumer Web: Five Graphs Deliver a Sustainable Advantage.”
One concrete example of graph databases being used outside of search is eBay, which (owing to its recent acquisition of Shutl) provides a service that uses graphs to compute fast, localized door-to-door delivery of goods between buyers and sellers, extending its business to include the supply chain. Incidentally, eBay observed that before it turned to graphs, its longest query took longer than its shortest physical delivery, with both at around 15 minutes. That can no longer happen now that an average query is powered by a graph database and takes 1/50th of a second!
The eBay example is not isolated. Organizations large and small are adopting and winning with graphs in retail, finance, telecoms, IT, gaming, real estate, healthcare, science, and dozens more areas. It’s an existence proof that Dr. Marsten’s hunch about the power of graphs is absolutely right.