Keyhole Software rewrites their “MapQuest for Trains” application – originally built Java with Oracle 9i – with Neo4j, a Graph Database.

In 2007, a colleague and I used Java with Oracle 9i to implement Dijkstra’s Algorithm. Our “MapQuest for Trains” application would route a rail train over various right-of-ways while minimizing cost. The cost was a function of distance, fuel surcharge, and obstacles. The task to route a train from Los Angeles to Chicago had a grotesquely long response time. Nobody wanted their applications deployed on our nodes because we spiked the servers!

Where did we go wrong? The USA rail map is a living instance of graph theory. We modeled an extensive network of nodes (stations) and connecting links (tracks) as node tables joined through link tables. Navigation through this model involved n-level joins. An RDMS hits the wall as it proceeds through 4th and 5th level joins. See the Neo4j in Action book in the references for benchmarks of a graph DBMS Neo4j against RDBMS MySQL. We tried sidestepping this deep join problem by working with the network in memory for the life of the application instance. The North American rail system is quite large. The application thrashed.

Graph Data Store

Fast forward to summer, 2012. I stumbled onto Neo4j, a graph database. The literature described suitability of Neo4j for modeling social networks. It could rapidly answer queries such as, “How many of Judy’s friends, their friends, and their friends’ friends have read Little Women?” I could use Neo4j to create my own Facebook or Twitter. Think of a social network as an instance of graph theory. I wasn’t interested, but how about a do-over at routing trains through the graph that comprises the USA rail network?

Raw Data

I wanted to publish my work with no strings attached. I found an open 1996 National Transportation Atlas Database ZIP file of rail data. I found a flat node file and a link file within. Each record of the node file had an integer ID plus a pair of fields for latitude and longitude. Each record of the link file had a distance field along with the node IDs for the node pair it connected. Perfect!

Bulk Load

This data could directly map to a Neo4j database. See Figure 1. I conceived Bizarro World, where railroad stations would have graph nodes with integer “names.” Hey, it’s only a demo! Each station node would have a latitude property and a longitude property. A station-to-station connecting track becomes a graph link having a single node-to-node distance property. Any number of links could fan out from a given station node.

I wrote a command-line Eclipse Maven project to bulk-insert a file of nodes, index them, and insert the corresponding file of links – all into an embedded Neo4j database. Run time was 20 minutes on my MacBook Air! I discovered it has a fan! Neo4j is ACID transactional. I discovered a non-transactional bulk insert API that speeded the insertion by an order-of-magnitude.

Read the full article.