The Graph Database: One Option for Exploring Big Data
It’s not easy to make sense of a lot of “noise” in data, which is why I think you see conflicting answers about how “best” to approach large datasets. It’s like two different approaches to fishing.
“The majority of companies are on the sidelines because they think they can’t readily access the data they have, they don’t have in house tools or talent to analyze it and don’t have the ability to put the data to use anyway,” writes Matthew Crowl in a recent CAN blog post.
One answer may be the graph database, which uses nodes, properties and edges rather than traditional indexing to store data. In other words, it allows you to create a graph of connections between people, objects and data.
“Relational databases are not very good at relationships … They’re great at the things they do, but when you get a lot of relationships in the data, it all becomes very clumsy and the queries become more complex because everything runs more slowly,” explained Leon Guzenda, the CTO and one of Objectivity’s founders. “A graph database is all about the real connections. Imagine a map. So we have cities. So the nodes in the graph are the cities and then the connection between the cities are technically called edges, so each road, each waterway is represented by an edge.”
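To make the map analogy concrete, here is a minimal sketch in plain Python of cities as nodes and routes as edges, each carrying properties. The class name `PropertyGraph`, the property keys and the sample data are all hypothetical, chosen for illustration; this is not Objectivity's API, only one simple way to picture the data model.

```python
from collections import defaultdict


class PropertyGraph:
    """A tiny in-memory property graph: nodes and edges both carry properties."""

    def __init__(self):
        self.nodes = {}                 # node id -> dict of properties
        self.edges = defaultdict(list)  # node id -> list of (neighbor id, properties)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, a, b, **props):
        # Roads and waterways run both ways, so store the edge in both directions.
        self.edges[a].append((b, props))
        self.edges[b].append((a, props))

    def neighbors(self, node_id, **required):
        """Return neighbors reached by edges whose properties match `required`."""
        return [
            other
            for other, props in self.edges[node_id]
            if all(props.get(key) == value for key, value in required.items())
        ]


# Illustrative data only; the route kinds are assumptions made for this sketch.
g = PropertyGraph()
g.add_node("San Francisco", state="CA")
g.add_node("Oakland", state="CA")
g.add_node("Sacramento", state="CA")
g.add_edge("San Francisco", "Oakland", kind="road")
g.add_edge("San Francisco", "Sacramento", kind="waterway")

print(g.neighbors("San Francisco"))                   # every connected city
print(g.neighbors("San Francisco", kind="waterway"))  # only cities reached by water
```

The point of the sketch is that a connection is stored directly alongside the node it belongs to, so following it is a lookup rather than a join.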
For example, suppose you wanted to map a route from San Francisco to New York based on shipping preferences such as weight, length and cargo content. A graph database could tell you the best, cheapest route while eliminating any options that couldn’t handle the cargo.
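The routing idea can be sketched as a cheapest-path search that ignores any leg unable to carry the cargo. The version below uses Dijkstra's algorithm, which is an assumption on my part rather than anything the source attributes to a particular product. The function name `cheapest_route`, the intermediate cities, the costs and the weight limits are all made up for illustration.

```python
import heapq


def cheapest_route(edges, start, goal, cargo_weight):
    """Dijkstra's algorithm, skipping edges that cannot carry the cargo.

    `edges` maps a city to a list of (neighbor, cost, max_weight) tuples.
    Returns (total_cost, path), or (None, []) if no usable route exists.
    """
    queue = [(0, start, [start])]
    best = {start: 0}
    while queue:
        cost, city, path = heapq.heappop(queue)
        if city == goal:
            return cost, path
        if cost > best.get(city, float("inf")):
            continue  # stale queue entry; a cheaper way to this city was found
        for neighbor, edge_cost, max_weight in edges.get(city, []):
            if cargo_weight > max_weight:
                continue  # this leg cannot handle the cargo
            new_cost = cost + edge_cost
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor, path + [neighbor]))
    return None, []


# Hypothetical one-way legs: (destination, cost, maximum cargo weight).
routes = {
    "San Francisco": [("Denver", 40, 10), ("Dallas", 50, 30)],
    "Denver": [("New York", 50, 10)],
    "Dallas": [("New York", 60, 30)],
}

light = cheapest_route(routes, "San Francisco", "New York", cargo_weight=5)
heavy = cheapest_route(routes, "San Francisco", "New York", cargo_weight=20)
print(light)  # (90, ['San Francisco', 'Denver', 'New York'])
print(heavy)  # (110, ['San Francisco', 'Dallas', 'New York'])
```

With light cargo the cheaper Denver route wins. With heavier cargo the Denver legs are eliminated and the search falls back to Dallas.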
While it allows you to ask specific questions, it is also very open-ended in terms of presenting further relationships. For instance, Guzenda’s demo traced the connections between two phone numbers, and all the calls in between, but it could also let you explore further degrees of that relationship: what if you want to look at the circle of calls with five degrees of separation versus seven?
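Exploring degrees of separation amounts to a breadth-first search that stops at a chosen depth. Here is a minimal sketch, assuming call records arrive as simple pairs of numbers; the function name `within_degrees` and the sample labels are hypothetical.

```python
from collections import deque


def within_degrees(calls, start, max_degrees):
    """Breadth-first search: map each reachable number to its degree of separation."""
    degree = {start: 0}
    queue = deque([start])
    while queue:
        number = queue.popleft()
        if degree[number] == max_degrees:
            continue  # do not expand past the requested depth
        for other in calls.get(number, ()):
            if other not in degree:
                degree[other] = degree[number] + 1
                queue.append(other)
    return degree


# Made-up call records; each pair means the two numbers spoke at least once.
records = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]

# Build an undirected graph: a call connects both parties.
calls = {}
for a, b in records:
    calls.setdefault(a, set()).add(b)
    calls.setdefault(b, set()).add(a)

print(within_degrees(calls, "A", 2))  # {'A': 0, 'B': 1, 'C': 2}
```

Widening the circle from five degrees to seven is then only a change to `max_degrees`, and the work grows with the number of calls actually touched rather than with the size of the whole dataset.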
Patterns quickly become apparent, Guzenda explained. A circle of connections that only call each other is likely to indicate a terrorist cell, for instance, he said.
“They don’t call anyone else in the world — that’s it, a very closed ring, you’re probably looking at a terrorist cell,” Guzenda said. “If there’s 30 or 40 people involved and some of them call others and a few people in this group call a lot of people in this group, you’re probably looking at a drug ring and dealers and so on.”
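One way to read the "closed ring" pattern in graph terms is as a small connected component: a set of numbers that reach each other but nobody outside the set. The sketch below takes that interpretation, which is my assumption and not a description of how Objectivity's product works. The function name `closed_groups`, the size threshold and the sample records are hypothetical.

```python
def closed_groups(calls, max_size):
    """Return connected components with 2..max_size members (no calls leave the group)."""
    seen = set()
    groups = []
    for start in calls:
        if start in seen:
            continue
        # Collect everything reachable from `start` with an iterative depth-first search.
        component = set()
        stack = [start]
        while stack:
            number = stack.pop()
            if number in component:
                continue
            component.add(number)
            stack.extend(calls[number] - component)
        seen |= component
        if 2 <= len(component) <= max_size:
            groups.append(sorted(component))
    return groups


# Made-up call records: a ring of three and a separate ring of five.
records = [
    ("A", "B"), ("B", "C"), ("C", "A"),
    ("D", "E"), ("E", "F"), ("F", "G"), ("G", "H"), ("D", "H"),
]

# Build an undirected graph so every number appears as a key.
calls = {}
for a, b in records:
    calls.setdefault(a, set()).add(b)
    calls.setdefault(b, set()).add(a)

print(closed_groups(calls, max_size=3))  # [['A', 'B', 'C']]
```

Distinguishing the larger pattern Guzenda describes, where a few members call many others in the group, would need an extra step such as counting each member's connections.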
The original system took anywhere from four to 20 or more hours to run a query like this. Guzenda ran it within mere minutes.
“I pushed out seven degrees here. That would take Oracle about 10 hours,” Guzenda said. “That meant after three days, they’d have to release the suspect.”
In the marketing world, relationships can be used to expand something like MCI’s friends and family calling plan.
Most organizations have graph data they could use, whether it’s logistics data, operational data, financial transactions or other hierarchical data, he explained.
“The data is in there, it’s just a matter of having a way to represent it and look at it properly,” he said.