When to move beyond relational databases
Neil A. Chaudhuri, founder and president of Vidya, reviews database design and the options NoSQL databases offer
One of my current projects is the review of an application built by a contractor for a major federal agency. The code relies heavily on queries and stored procedures against a relational database management system (RDBMS). These are easily the most complex I have ever seen. Normally, when technical debt accrues in a database, IT managers refactor the database design immediately and use views, indexes, precomputation and all the other goodies an RDBMS offers.
However, it may very well be that the relational paradigm, as venerable and successful as it is, is simply not the best choice for an application. There are alternative paradigms, notably document and graph databases – colloquially known as NoSQL databases – that have their own advantages and disadvantages. But let’s first remind ourselves why RDBMSs have dominated for so long before exploring how NoSQL databases compare.
Anyone who watched The Wire might remember the bulletin board used by the Major Crimes Unit to display an evolving org chart for crime syndicates in Baltimore. The hierarchy was determined through analysis of communications and other data. That was the first graph database I ever saw.
With a graph database, data is modeled as a collection of nodes connected by edges – both endowed with attributes. As always, the data model must be optimized for the anticipated queries – for example, when deciding whether certain data belongs in a node or edge.
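To make the node-and-edge model concrete, here is a minimal sketch in Python. It is not how Neo4j or any other product stores data; the class names, the labels and the sample call records are all hypothetical, chosen only to show attributes living on both nodes and edges.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Node:
    """A vertex with a label and arbitrary attributes."""
    node_id: str
    label: str
    attrs: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, typed relationship that carries its own attributes."""
    source: str
    target: str
    rel_type: str
    attrs: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """A toy in-memory graph (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.edges: List[Edge] = []

    def add_node(self, node_id: str, label: str, **attrs: Any) -> None:
        self.nodes[node_id] = Node(node_id, label, attrs)

    def add_edge(self, source: str, target: str, rel_type: str, **attrs: Any) -> None:
        # Refuse dangling edges: both endpoints must already be nodes.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append(Edge(source, target, rel_type, attrs))

    def outgoing(self, node_id: str, rel_type: Optional[str] = None) -> List[Edge]:
        """Edges leaving node_id, optionally filtered by relationship type."""
        return [
            e for e in self.edges
            if e.source == node_id and (rel_type is None or e.rel_type == rel_type)
        ]


# Made-up sample data: who called whom, and how often.
graph = PropertyGraph()
graph.add_node("alice", "Person", role="lieutenant")
graph.add_node("bob", "Person", role="soldier")
graph.add_edge("alice", "bob", "CALLED", count=12)
```

The call count sits on the edge because it describes the relationship rather than either person. If the anticipated queries filtered mostly on a person's total call volume, storing a running total on the node might serve better, which is the kind of trade-off the modeling decision involves.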
This is a fundamental shift from RDBMSs. When working with network data (such as SIGINT, financial transactions, or migration patterns), modeling in tables and relationships can be awkward. Much worse, RDBMSs can be quite slow for the kinds of queries that matter on graphs, like shortest paths, community detection and centrality.
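Shortest path is a good illustration. Once the data is held as nodes and edges, the question reduces to a breadth-first search, whereas a relational schema typically needs a recursive query or one self-join per hop. The sketch below is plain Python over an adjacency list, not a graph database's actual implementation; the function name and the sample network are assumptions made for illustration.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Return a fewest-hops path from start to goal, or None if unreachable."""
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node we reached it from
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, ()):
            if neighbor in previous:
                continue  # already reached by a path that is no longer
            previous[neighbor] = current
            if neighbor == goal:
                # Walk the breadcrumbs back to the start, then reverse.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Made-up directed network: each key lists the nodes it links to.
contacts = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
    "e": [],
}

print(shortest_path(contacts, "a", "e"))  # ['a', 'b', 'd', 'e']
```

Each hop here is a dictionary lookup on the current node's neighbors. A graph database is organized around the same kind of traversal, which is why these queries suit it better than a chain of joins.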
Like RDBMSs, graph databases are built upon a mathematical foundation, in this case graph theory, and products like the open-source Neo4j are ideal for storing and querying network data. Though other approaches are available, Cypher, the query language developed for Neo4j, is a good choice for querying a graph. It has a steep learning curve but, in experienced hands, Cypher is a powerful and performant query language.
Like RDBMSs, Neo4j supports ACID transactions and indexing. Commercial support is available, as are drivers for all major programming languages.
Data is the lifeblood of applications. While RDBMSs will always be robust, powerful and perhaps most familiar, follow the advice of lean software development experts Mary and Tom Poppendieck and consider all options that could make applications easier to develop and faster to run.