Top 5 Reasons to Get Your Graph On in 2013
Emil Eifrem, CEO of Neo Technology summarizes the top five reasons to get into graph databases this year.
Reason #1: Performance
Relational databases have served us well enough for decades, but the model underpinning them creaks at high data volumes, especially when we join (or recursively join) several tables. Queries take longer as the total data set size increases (something we take for granted, which is largely not true when it comes to graph databases), so queries over large data sets (or involving several joins) are often infeasible for OLTP scenarios.
Other kinds of NOSQL stores work well only for simple retrievals, making them great for certain uses. However they usually break down with sophisticated queries, because they are not designed to relate data in real time. Some NOSQL data stores, such as Hadoop, are able to relate large amounts of data in batch by processing (sub)sets of bulk data, typically with a map/reduce framework. This adds latency that is incompatible with OLTP workloads, and real-time demands.
Graph databases don’t suffer the same join penalty as relational databases (which require computing relationships versus following them), or the same compute penalty as aggregate stores (which require joins to occur either in the application, or though bulk map-reduce operations). Graph databases are query-oriented rather than compute-oriented. Under the hood, the database finds a starting point in the graph, and walks the graph by following relationships through pointers. Not only does this lead to incredibly fast results, but it leads to unparalleled scaling characteristics. No matter how much data is in my database, the same query will take roughly the same amount of time, because the query time is proportional to result size rather than data size.
In a typical social graph scenario – finding friends of friends at any reasonable depth — graph databases will be orders of magnitude faster than competing approaches, enabling real-time queries within the clickstream for timely, high fidelity, high-value interactions. The same applies to risk analysis, network failure analysis, route optimization over a network, and so on.
Reason #2: Agility
The data in our databases tends to support many systems and business processes over its lifetime, each of which have a tendency to evolve. Graphs – which allow for rich data structure without the constraints of schema- are naturally extensible and amenable to continuous data evolution. In fact since the kinds of updates that extend the capability of the business are usually additive, new structures can be merged into the graph without disturbing existing applications. Older queries typically just ignore the new structures whose relationships they aren’t programmed to understand. Correspondingly this provides data model agility that far exceeds the capabilities of relational model where sophisticated schema migrations are the order of the day whenever any significant changes to applications are involved.
In turn, data model agility provides a further boost to agility at the business process tier. While the data that powers organizations can evolve without undue friction, the decision support processes that consume that data, can themselves be represented as dependency graphs, and can be smoothly evolved and versioned.
Business processes modeled using graphs (an obviously natural paradigm in itself) can be re-engineered and even formally verified quickly. Business agility is boosted as organizations can respond to process change and management in a structured and diligent manner, yet do so rapidly and repeatably.
Reason #3: Simplicity
Data modeling in a graph is simplicity itself. Simplicity in modeling with graphs comes from the fact that graph databases are whiteboard-friendly. Once we have a model that reads well on a whiteboard, it’s typically the same model that we persist into the database. This means that the semantic gap between domain experts and data stored is very much reduced#.
For example, imagine if you could take the entire diagram of your data center and cloud deployments and begin to ask questions of that model – where are the single points of failure? Which customers would suffer loss of service if a particular switch failed? What must have been misconfigured for these two different sets of customers reporting loss of service?
This kind of network analysis is readily implemented in graph databases, providing rapid and scalable (carrier-grade) network infrastructure modeling and management. Modeling a network is as simple as joining nodes and relationships and discovering single points of failure. Finding a common ancestor for incident reports is as simple as following the lines back to their source.
The graph model is accessible to regular business users too. Technicians that fix actual faults, or architects who plan future capacity and rollout, or route planners who optimize logistics networks, can easily understand and evolve the model while maintaining structural integrity of the graph and the corresponding infrastructure.
Reason #4: Dependability
Modern graph databases offer mature features for building dependable systems. For example Neo4j – the leading graph database – is ACID transactional, and offers clustering for high availability and throughput for mission-critical applications. Such configurations close the gap between OLAP and OLTP since semantically rich graph data is accessible not only to online systems, but also for running long-running analysis work, simply by attaching read-only OLAP slaves to the cluster.
This is a disruptive moment in business intelligence, that the same highly available, highly dependable, low latency systems are becoming the bedrock for real time business intelligence. No longer do we push business intelligence into a backwater OLAP system, we bring it directly into the clickstream with analytics and online intelligence merging into the same graph. And since both online and analytic intelligence are critical to business success, we extend the same level of systems dependability and data insight to both use cases#.
Reason #5: Insight
While many users of graph databases are using them purely for reasons 1-4 above, some users are after deeper levels of connected insight that leverage the very rich mathematical discipline of graph theory. The graph data model is backed by centuries of work in graph theory, discrete mathematics and disciplines dealing in behavioral analysis (sociology, psychology, anthropology, etc).
This semantically rich, connected data can, of course, be used run classical graph algorithms – like path finding, depth and breadth first search – very rapidly. But it’s also amenable to predictive analysis using techniques from graph theory. For example, given an organizational structure, graph theory allows us to find experts, communication chokepoints, and unstable management structures all by applying a few simple principles (e.g. triadic closure, local bridges, balance) to our data.
And this is made more powerful since those principles have well-known predictive qualities that allow us to model “what-if” kinds of scenarios and see how our organization (or data center network, or trade positions) evolve as we make experimental changes.
Gaining predictive insight using graph theory is particularly straightforward to implement once data is stored and processed in graph databases.