Neo4j and Stackato
In this article, John Wetherill talks about Neo4j, created by Neo Technology, with a focus on how Neo4j fits into the cloud, using Stackato as the foundation.
Pivotal’s awesome CF Platform Cloud Foundry conference just finished, and featured many energetic and page-turning (so to speak) sessions by powerful industry thought leaders, including Warner Music CTO Jonathan Murray. As one can imagine, Warner Music keeps a good amount of data (for example, its entire catalog at high fidelity), and as a result is subject to many performance, scalability, security, and reliability constraints. Given these requirements, a natural, almost instinctual reaction is to reach for an RDBMS. But in his talk Jonathan made it emphatically clear that for all of their recent and greenfield app development, they could not come up with a single use-case that required a relational database.
Not one. Hmmmm.
He also had some strong words to say about stored procedures.
These are powerful statements, and a few years ago they would have seemed ludicrous. But not any more. Database technology has advanced fast, resulting in a vast palette of mind-numbingly powerful big data or NoSQL databases and datastore products, many of which are instantly and freely available. Several of these surpass traditional RDBMS systems in performance, and have other considerable advantages.
This article will describe Neo4j, a first-class big data “graph” database by Neo Technology. Neo4j has several exciting features that lend credence to Jonathan’s claims. The main focus here will be on how Neo4j fits into the cloud, using Stackato as the foundation.
Graph Databases and Big Data
Relations are an important feature of RDBMS systems that give them much of their power. In contrast, several big data offerings have little explicit support for relations, including most key-value stores and document stores such as MongoDB and CouchDB.
But Neo4j is an exception: it is as relational as any RDBMS I’ve ever used, and arguably considerably more so. Graph databases place importance on the relationships between the data, not just the data itself. The "more so" can be justified by the fact that while RDBMS systems can certainly represent and efficiently traverse relationships, the mechanisms that do so (foreign keys, joins, join tables, highly optimized indexes, etc.) are not really “primitive” to these systems in the way that tables, rows, and columns are.
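To make the point about non-primitive relationships concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample data are all hypothetical, invented purely for illustration. Note that the "released" relationship only exists as rows of foreign keys in a join table, and has to be reassembled with joins at query time.

```python
import sqlite3

# In-memory database; the schema and data below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE album  (id INTEGER PRIMARY KEY, title TEXT);
    -- The relationship is not a first-class object: it is a join table
    -- holding a pair of foreign keys.
    CREATE TABLE released (
        artist_id INTEGER REFERENCES artist(id),
        album_id  INTEGER REFERENCES album(id)
    );
    INSERT INTO artist VALUES (1, 'Example Artist');
    INSERT INTO album  VALUES (1, 'Example Album');
    INSERT INTO released VALUES (1, 1);
""")

# Traversing artist -> album means joining through the join table.
rows = conn.execute("""
    SELECT album.title
    FROM artist
    JOIN released ON released.artist_id = artist.id
    JOIN album    ON album.id = released.album_id
    WHERE artist.name = ?
""", ("Example Artist",)).fetchall()
titles = [title for (title,) in rows]
```

Nothing here is wrong or slow in itself. The point is only that the relationship is built out of tables and keys rather than being something the database models directly.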
But graph databases are made up of nodes (aka vertices) and relationships. In other words, relationships are a fundamental part of the structure of the data, not a bolt-on. In addition, each node and relationship can have properties, which are effectively key-value pairs that can store almost anything, of any size and format.
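The node, relationship, and property structure just described can be sketched in a few lines of plain Python. This is a toy in-memory model of my own, not Neo4j's API or storage format; the class names (Node, Relationship, Graph), the method names, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    """A vertex carrying arbitrary key-value properties."""
    node_id: int
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed edge between two nodes, with its own properties."""
    start: Node
    end: Node
    rel_type: str
    properties: dict[str, Any] = field(default_factory=dict)


class Graph:
    """Toy property graph: relationships are stored as first-class objects."""

    def __init__(self) -> None:
        self.nodes: dict[int, Node] = {}
        self.relationships: list[Relationship] = []

    def add_node(self, **properties: Any) -> Node:
        node = Node(node_id=len(self.nodes), properties=properties)
        self.nodes[node.node_id] = node
        return node

    def relate(self, start: Node, rel_type: str, end: Node,
               **properties: Any) -> Relationship:
        rel = Relationship(start, end, rel_type, properties)
        self.relationships.append(rel)
        return rel

    def neighbours(self, node: Node, rel_type: str) -> list[Node]:
        # Follow outgoing relationships of the given type from this node.
        return [r.end for r in self.relationships
                if r.start is node and r.rel_type == rel_type]


graph = Graph()
artist = graph.add_node(name="Example Artist")
album = graph.add_node(title="Example Album", year=2013)
graph.relate(artist, "RELEASED", album, label="Example Label")
released = graph.neighbours(artist, "RELEASED")
```

The linear scan in neighbours is only for brevity; a real graph database would keep each node's relationships directly reachable from the node. What the sketch does show is that the relationship is an object in its own right, with a type and properties, rather than something reconstructed from keys.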
Neo Technology provides an informative description of graphs (recommended reading). The following diagram, represented as a graph, shows the components of a graph database, and their relations.