Not only are epidemics graphs: Graphs are an epidemic
Diana Kupfer talks about the growth and development of Neo Technology and also discusses Cypher and Neo4j 2.0
Neo4j is, without a doubt, a pioneering technology. “Graph database”, the term with which it has become synonymous, had not even been coined when this Java technology first emerged over a decade ago.
Since its launch, Neo4j has become an impressive success story – and the hype continues to grow. In recent years, Neo Technology, the company that has officially sponsored its development since 2007, has also organized conferences to celebrate not only the rise of Neo4j and connected data, but also the growing community around it. The fifth “GraphConnect” event, and the first to be held in Europe, took place on 18 and 19 November in London’s Dexter House and attracted both users and fans of Neo4j.
But let’s backtrack a bit. The Neo4j story actually began in 2000. Back then, it was already clear that traditional database systems were rapidly reaching their limits when storing connected data. The stored data was too richly linked, JOIN operations were too laborious, and query speeds were simply too low.
Little surprise, then, that relational databases have subsequently faced competition from NoSQL alternatives. Yet even key-value stores, document databases and column-oriented databases, which Martin Fowler summed up with the term “Aggregate-oriented Databases”, failed to take the connections between the data sufficiently into account.
For this reason, the Neo4j founders envisioned a realistic information storage system in the form of networks and links. One that would store not only the data, but also their complex web of relationships – and make them totally visible. Back then, “Nobody was talking about graph databases,” recalled Emil Eifrem, CEO of Neo Technology, in his conference opening keynote. In those days, he and his team used terms such as “Network-oriented database” or the shortened version: “Netbase”.
Today, thirteen years later, graph technologies have seen a virtually meteoric rise, which, as Eifrem noted, can be demonstrated with a simple Google Trends search for the term “Neo4j”.
Between 30 and 40 major companies today utilise Neo4j, including Hewlett-Packard, Deutsche Telekom, Oracle, IBM, and Cisco. Neo4j is now even represented in the insurance sector – a market you wouldn’t necessarily think would have a need to integrate cutting-edge technology into its enterprise IT. At GraphConnect, Frederik Wilhelm and Dr. Andreas Transforms from Intelligence Solutions AG reported how they convinced the insurance group “The Bavarian” to adopt Neo4j, while the alternative option, ObjectDB, drew the short straw.
In 2012, big players such as Google and Facebook jumped onto the graph bandwagon with Knowledge Graph and Graph Search, respectively (“Neo4j Cypher for non-techies,” as Patrick Baumgartner jokes), consolidating Emil Eifrem, Peter Neubauer and their colleagues’ position as pioneers in the field.
In early 2013, an impressive blog post by developer Max De Marzi outlined how information from Facebook can be transformed into Cypher statements – Cypher being Neo4j’s own query language.
The visualization of large amounts of data and information networks is a growing branch of science, underpinned not so much by an aesthetic playfulness, but by the concrete need to make the relationships among “Big Data” visible, and thus understandable.
What’s behind this hype? This is how Eifrem explains it: Up until 1999, Internet search engines such as AltaVista were keyword-based. Shortly before the turn of the millennium, Google initiated a paradigm change with its search algorithm PageRank, moving away from discrete data towards connected data: “Not only did they store the documents, but also how they relate to each other,” said Eifrem. Thus, a shift took place from keyword search to “social discovery” (Eifrem), and that’s where graph technologies play to their strengths.
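To make the idea of ranking documents by how they relate to each other more concrete, here is a minimal Python sketch of the textbook PageRank power iteration. It is an illustration only, not Google's production algorithm: the four-page link map, the damping factor of 0.85 and the iteration count are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power iteration over a dict: page -> list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with a uniform score

    for _ in range(iterations):
        # Every page receives a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Pass this page's score on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical mini web: three pages link to "c", and "c" links back to "a".
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

None of the pages contains a keyword here; the ordering comes purely from the links. Page "c" ends up on top because it is the most linked-to, and "a" follows because the highly ranked "c" points to it.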
Looking at the bigger picture, the realistic representation of data relationships is the foundation of the personalized technologies that Robert Scoble and Shel Israel put under the heading of “The Age of Context”: As exemplified by Google, the cross-linking of individual data gives rise to semantic knowledge, which the software can leverage to anticipate the user’s behavior in any given situation.
Making invisible connections visible
Despite all the buzz around Neo4j’s success, many GraphConnect speakers recommended examining possible graph database scenarios carefully. “Know your domain!”, as Tareq Abedrabbo from the London consulting firm Open Credo put it.
In his talk, “Neo4j in Theory and Practice”, Abedrabbo differentiated between domain-centric and data-centric applications. A classic example of the former is a recommendation engine: a well-defined data model with a “top-down” design and flexible but predictable data structures that can be altered by user input. Data-centric approaches, on the other hand, are marked by a complex set of data that represents networks of the real world.
In data-centric applications, different data sources are typically integrated with each other. The design follows the “bottom-up” principle. A stereotypical example would be telecommunication networks. Although graph technologies are “naturally data-driven,” according to the speaker, the categories are anything but clear-cut. For more domain-centric applications, Abedrabbo recommends the use of a mapping framework such as Spring Data Neo4j.
Visualization strategies were presented by Joe Parry from the startup Cambridge Intelligence, which specializes in this area. “Data is invisible, but the user has to see it,” was one of his key messages. The essential task of visualizers is to make the graphical presentation of data semantically unambiguous and thus make the intangible tangible, he said.
For example, a thick connecting line (edge, in graph theory) between two objects (nodes) represents a particularly strong nexus – obvious, one might think. Nevertheless, the same mistakes are being made over and over. For Parry, 3D visualizations, poor color schemes, missing tooltips and lack of interaction are some of the most common mistakes.
In terms of making invisible connections visible, Glen Ford (Zeebox) arguably provided the most vivid example of the day: using the graph model in his talk “Graphing the Second Screen”, he clearly showed that former “Doctor Who” actor Tom Baker not only made an appearance in “Blackadder”, but is also featured as the narrator in “Little Britain” – a connection which, despite the popularity of the series, even the most ardent fans in the audience were probably unaware of.
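A connection like this can be sketched in a few lines of Python. The snippet below is an illustrative toy model, not Zeebox's actual graph and not Neo4j code: the `relate` and `shows_linked_to` helpers and the relationship names are hypothetical, and only the three credits mentioned in the talk are loaded.

```python
from collections import defaultdict

# person -> list of (relationship type, show); relationship names are made up
credits = defaultdict(list)


def relate(person, rel_type, show):
    """Store one directed, typed relationship from a person to a show."""
    credits[person].append((rel_type, show))


def shows_linked_to(show):
    """Return the other shows reachable from `show` via a shared person."""
    linked = set()
    for person, edges in credits.items():
        shows = {s for _, s in edges}
        if show in shows:
            linked |= shows - {show}
    return sorted(linked)


relate("Tom Baker", "ACTED_IN", "Doctor Who")
relate("Tom Baker", "APPEARED_IN", "Blackadder")
relate("Tom Baker", "NARRATED", "Little Britain")

print(shows_linked_to("Doctor Who"))  # ['Blackadder', 'Little Britain']
```

Starting from one series and hopping across a shared person to the others is the kind of traversal a graph database is built to answer directly, without JOIN-style lookups across separate tables.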