Neo4j – Beyond Hadoop: Graph Databases for Big Data
Tom’s IT Pro, in its look at ‘Small to Big Data Solutions’, notes that graph databases such as Neo4j can be a better fit than Hadoop for some big data problems, particularly those described as networks of objects and the relationships between them.
Not all big data problems fit the Hadoop model; for network and graph problems, Neo4j may be a better alternative. If you are working with large volumes of data that are best modeled as a network (think social networking, transportation patterns, epidemics, and a host of other modeling problems), look into graph databases.
Graph databases, for example, are well suited for problems that can be described as networks, such as social networks, workflows, transportation networks, and communication patterns. Certainly, you can solve many network problems using MapReduce (think of Google’s PageRank algorithm), but it is often easier to solve a problem when the data structure directly represents key aspects of the problem description. Let’s first take a look at graph databases in general before turning our attention to specific business applications.
Databases are typically designed around a fundamental data structure. Relational databases, like MySQL, Oracle and SQL Server, are based on tables, which in turn are collections of rows of attributes. Online analytic processing (OLAP) databases are built on multi-dimensional cubes. CouchDB and MongoDB are document-oriented databases that use JSON-like data structures. Graph databases, like Neo4j, have a simple underlying data structure that consists of two types of objects: nodes (also called vertices) and edges (which Neo4j calls relationships). This simplicity lends itself to modeling a wide range of problems.
A node is typically used to represent an entity, such as a person in a social network, a location in a transportation network, or a Web page on the Internet. The relationships between nodes are represented by edges, which can be thought of as links between nodes. For example, to model the fact that Alice and Bob work together, we could create two nodes representing the employees and an edge between them indicating their “work together” relationship.
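The Alice and Bob example can be sketched in a few lines of plain Python. This is only an illustration of the data structure, not Neo4j's API: the class names `Node`, `Edge`, and `Graph`, the `neighbors` helper, and the `WORKS_WITH` label are all assumptions made up for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """An entity in the graph, e.g. a person (hypothetical class)."""
    name: str


@dataclass(frozen=True)
class Edge:
    """A labeled link between two nodes (hypothetical class)."""
    start: Node
    end: Node
    label: str


class Graph:
    """A tiny in-memory graph: a set of nodes plus a list of links."""

    def __init__(self):
        self.nodes = set()
        self.edges = []

    def add_edge(self, start, end, label):
        # Adding a link also registers both of its endpoints as nodes.
        self.nodes.update((start, end))
        edge = Edge(start, end, label)
        self.edges.append(edge)
        return edge

    def neighbors(self, node, label):
        """Return nodes linked to `node` by `label`, in either direction."""
        result = set()
        for edge in self.edges:
            if edge.label != label:
                continue
            if edge.start == node:
                result.add(edge.end)
            elif edge.end == node:
                result.add(edge.start)
        return result


alice = Node("Alice")
bob = Node("Bob")

graph = Graph()
graph.add_edge(alice, bob, "WORKS_WITH")

# Who works with Bob? The link is followed regardless of direction.
coworkers_of_bob = graph.neighbors(bob, "WORKS_WITH")
```

Because the link is stored as a first-class object, a question such as "who works with Bob?" becomes a direct lookup over links rather than a join between tables.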
In the case of a transportation model, we could represent two locations as two nodes and the connection between them as an edge. Nodes and edges are quite useful by themselves, but the graph model becomes even more useful with the addition of properties on nodes and edges.
Node properties can describe an entity just as attributes in a relational database table describe an object. The employee nodes described above could have properties indicating the employee type, e.g. manager, analyst, or systems administrator. Edges similarly use properties to describe links. For example, the link between two locations in a transportation network can carry a property indicating the distance between the two locations.
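As a rough sketch of how properties attach to both nodes and links, the following plain Python keeps a dictionary of properties for each. Everything here is assumed for illustration: the node names, the `employee_type` and `distance_km` property names, the distances themselves, and the helper functions are not taken from Neo4j or from the original article.

```python
# Each node maps to a dictionary of properties describing that entity.
nodes = {
    "alice": {"kind": "employee", "employee_type": "manager"},
    "bob": {"kind": "employee", "employee_type": "analyst"},
    "depot": {"kind": "location"},
    "warehouse": {"kind": "location"},
    "store": {"kind": "location"},
}

# Each link is (start, end, properties); distances are made-up values.
edges = [
    ("alice", "bob", {"label": "WORKS_WITH"}),
    ("depot", "warehouse", {"label": "ROAD", "distance_km": 12.5}),
    ("warehouse", "store", {"label": "ROAD", "distance_km": 7.0}),
]


def nodes_with(prop, value):
    """Return the names of nodes whose property `prop` equals `value`."""
    return sorted(name for name, props in nodes.items()
                  if props.get(prop) == value)


def link_between(a, b):
    """Return the properties of the link joining a and b (either direction)."""
    for start, end, props in edges:
        if {start, end} == {a, b}:
            return props
    raise KeyError(f"no link between {a!r} and {b!r}")


def route_distance(route):
    """Sum the distance_km property along consecutive stops in `route`."""
    return sum(link_between(a, b)["distance_km"]
               for a, b in zip(route, route[1:]))


managers = nodes_with("employee_type", "manager")
total_km = route_distance(["depot", "warehouse", "store"])
```

Here `nodes_with` filters on a node property, while `route_distance` reads a property stored on the links, which is the kind of information a transportation model would keep on each connection.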