Here Comes the Graph DBMS: Way Fast, No Schemas
2013 appears to be shaping up as the year of the graph database.
And no, this is not simply because Facebook has so loudly trumpeted a public service based on its own social graph. No, the main reason for this declaration is that the software companies actually providing graph DBMS packages are showing some significant customer traction. Evidently, this once-exotic technology truly is catching on in Corporate America and beyond.
Case in point: Neo Technology, which is commercializing the open-source Neo4j graph DBMS, has more than 50 paying customers, including names such as job sites CareerBuilder, Resumate, and Glassdoor; Cisco Systems and T-Mobile; Pitney Bowes; and what company officials tell me is one of the world’s largest parcel delivery services. (My guess: DHL.) The latter’s setup has Neo4j doing some 2,000 lookups per second as parcels move through a large sorting facility.
Neo is hardly alone in nurturing the nascent graph DBMS market. Among the other players are Objectivity, selling a package called InfiniteGraph; Sparsity Technologies, in Barcelona, with DEX (and software giant CA as a customer); Netmesh, with InfoGrid; Kobrix, with HyperGraphDB; Microsoft, with Trinity; and Franz, a longtime provider of Lisp tools, selling AllegroGraph. Even supercomputer maker Cray is in on the technology, selling a specialized machine for high-speed graph work.
Neo CEO Emil Eifrem says the Neo4j package is helping out in such diverse fields as bioinformatics, insurance risk analysis, managing assets such as networking infrastructure, portfolio analysis, and organizing information about customers for the sake of salespeople.
Graph DBMSs operate differently from traditional relational packages, of course, which enables them to excel in certain types of applications. Generally speaking, they work well where the relationships, or connections, between myriad entities must be analyzed, and where those relationships may, for any number of reasons, change frequently.
A good example is the product catalog of a large online retailer, listing many thousands of products from thousands of brands in thousands of categories, with new items, brands, and categories getting added and dropped all the time. In addition, the links showing what items shoppers have bought together are in constant flux, as may be any number of other properties and relationships that the retailer wishes to track and make available to shoppers.
While all these pieces of information and their linkages could be captured in a relational DBMS, that would require setting up the proper schema and then changing that schema all the time as the catalog and its contents changed, a real chore that the highly flexible graph approach pretty much obviates. What’s more, today’s graph databases can handle on the order of 100 million items (or nodes) and potentially billions of connections (or edges, in graph-talk) between all those items.
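To make the schema-free idea concrete, here is a minimal sketch in Python of a toy in-memory property graph for such a catalog. Everything in it (the PropertyGraph class, the node names, the MADE_BY, IN_CATEGORY and BOUGHT_WITH relationship labels) is invented for illustration; it is not Neo4j's API or storage model, just one way to picture nodes and edges that carry free-form properties.

```python
from collections import defaultdict


class PropertyGraph:
    """Toy property graph (hypothetical): nodes and edges carry free-form properties."""

    def __init__(self):
        self.nodes = {}                 # node id -> dict of properties
        self.edges = defaultdict(list)  # node id -> [(relationship, target id, properties)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = dict(props)

    def add_edge(self, source, rel, target, **props):
        self.edges[source].append((rel, target, dict(props)))

    def neighbors(self, source, rel):
        """Targets reachable from source over edges labeled rel."""
        return [target for (r, target, _) in self.edges[source] if r == rel]


catalog = PropertyGraph()
catalog.add_node("tent", kind="product", price=199.0)
catalog.add_node("acme", kind="brand")
catalog.add_node("camping", kind="category")
catalog.add_edge("tent", "MADE_BY", "acme")
catalog.add_edge("tent", "IN_CATEGORY", "camping")

# A new property (fuel) and a new relationship type (BOUGHT_WITH) are simply
# added as data; there is no table definition to alter first.
catalog.add_node("stove", kind="product", fuel="propane")
catalog.add_edge("tent", "BOUGHT_WITH", "stove", count=42)
```

In a relational design, the fuel property and the bought-together link would typically mean a new column and a new join table; here they are just more entries in the same two dictionaries.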
Clearly, it takes some clever strategies to query this kind of data, to avoid having to traverse the entire logical graph it describes and thus answer queries in a reasonable amount of time. Evidently, though, these strategies have been worked out.
Neo’s Philip Rathle, senior director of products, describes a simple benchmark involving a social graph made up of 1,000 persons, each with 50 friends. What are the connections between any two randomly chosen persons down to a depth of four intervening nodes? Using MySQL, such a query takes two seconds to answer; in Neo4j, 2ms. And if the graph is expanded to include 1 million persons, the same query still takes only 2ms.
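For readers who want to picture this kind of query, here is a minimal Python sketch of a depth-limited breadth-first search over a friends list. It is not Neo4j's algorithm or the benchmark code; the function name, the max_hops parameter and the five-person sample graph are all made up for illustration. The point it tries to show is that the search only visits persons within a few hops of the starting point, so the work depends on the size of that neighborhood rather than on the total number of persons stored.

```python
from collections import deque


def shortest_path_within(friends, start, goal, max_hops):
    """Breadth-first search from start that gives up beyond max_hops edges.

    Returns the persons on a shortest path from start to goal,
    or None if goal is not reachable within max_hops.
    """
    if start == goal:
        return [start]
    parent = {start: None}            # also serves as the visited set
    frontier = deque([(start, 0)])
    while frontier:
        person, hops = frontier.popleft()
        if hops == max_hops:
            continue                  # do not expand past the depth limit
        for friend in friends.get(person, ()):
            if friend in parent:
                continue
            parent[friend] = person
            if friend == goal:
                # Walk the parent links back to start to rebuild the path
                path = [friend]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            frontier.append((friend, hops + 1))
    return None


# Hypothetical sample data: each person maps to a list of friends
friends = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann"],
    "dan": ["bob", "eve"],
    "eve": ["dan"],
}

print(shortest_path_within(friends, "ann", "eve", max_hops=3))
print(shortest_path_within(friends, "ann", "eve", max_hops=2))
```

With the sample data, the first call finds the chain from ann to eve through bob and dan, while the second returns None because eve lies three hops away. A limit of four intervening nodes, as in the benchmark, would correspond to max_hops=5 under this sketch's counting.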
It’s that kind of query performance, along with no need to deal in intricate schemas, that’s making graph DBMS technology so attractive in certain situations. And just as with other types of DBMS, the growing size of main memories on servers and the falling cost of solid-state drives (SSDs) are making the technology even more capable. Clustering servers is turning out to be a big help, too.