Graphs make headway, but simplicity is still a few hops away
Graph databases are becoming popular for building recommendation systems and other applications
Webscale companies such as Facebook, Google, and Netflix have come clean about how they use graph processing to quickly reveal the seemingly disparate connections among people, places and things. And more use cases for graph databases emerged Monday at the 2013 GraphLab workshop in San Francisco.
But even though it became clearer what’s possible when data is organized in graphs — better e-commerce and Twitter follower recommendations and lighter infrastructure usage, for example — some speakers pointed to the need for graphs and machine learning to become easier to implement.
Graphs at scale at Twitter and Walmart
Twitter’s Who to Follow tool is a fine example of a product benefiting from a graph model for data. Who to Follow depends on the FlockDB graph database and the Cassovary in-memory graph-processing engine, both of which Twitter built in-house and then released to everyone under an Apache License. The product mines existing connections among users, shared interests and other data in order to make its recommendations, working from a graph that fits inside the memory of a single server.
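To make the idea of mining existing connections concrete, here is a minimal sketch of one common graph heuristic: counting how many of the accounts a user follows also follow a given candidate. It is an illustration only, not Twitter's algorithm and not the FlockDB or Cassovary API; the `suggest_follows` function and the sample data are hypothetical, and the whole graph is assumed to fit in a plain in-memory dictionary.

```python
from collections import Counter


def suggest_follows(follows, user, k=3):
    """Rank accounts followed by the people `user` follows.

    `follows` maps each account to the set of accounts it follows.
    Hypothetical helper for illustration; not Twitter's method.
    """
    already = follows.get(user, set())
    counts = Counter()
    for friend in already:
        for candidate in follows.get(friend, set()):
            # Skip the user and anyone the user already follows.
            if candidate != user and candidate not in already:
                counts[candidate] += 1
    # Highest count first; ties broken by name so the output is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


# Made-up follow graph held entirely in memory.
follows = {
    "ana": {"bo", "cy"},
    "bo": {"dee", "eli"},
    "cy": {"dee"},
    "dee": {"ana"},
}

print(suggest_follows(follows, "ana"))  # ['dee', 'eli']
```

Here "dee" ranks first because two of the accounts "ana" follows also follow "dee," while only one follows "eli."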
Take it as proof that the graph model can provide advantages over a more traditional relational model for certain kinds of applications. The system’s success over the past three years demonstrates that it’s not only possible but preferable for a graph to run in a single instance of memory, said Pankaj Gupta, head of the personalization and recommender systems group at Twitter.
Lei Tang, a data scientist at @WalmartLabs, talked about his work drawing on many data sources to recommend products that website users might actually want to buy.
A smart recommendation system ought to shift in response to incoming data on, say, a user’s page views and purchases, he said. This is where clustering of products can be wise. So while a user might view a bunch of televisions before ultimately buying one, the cluster of television products within the larger set of products the system can recommend should be set aside as soon as the purchase happens. Recommend a television with big discounts after the purchase, Tang said, and “users are really pissed off.”
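The cluster rule Tang described can be sketched in a few lines. This is a simplified illustration under stated assumptions, not @WalmartLabs' system: the `filter_recommendations` function, the cluster labels and the product names are all hypothetical, and each product is assumed to belong to exactly one cluster.

```python
def filter_recommendations(candidates, cluster_of, purchased):
    """Drop candidates that share a cluster with something already bought.

    `cluster_of` maps a product to its cluster label (an assumed input);
    products without a cluster are never suppressed.
    """
    bought_clusters = {cluster_of[p] for p in purchased if p in cluster_of}
    return [c for c in candidates if cluster_of.get(c) not in bought_clusters]


# Made-up catalog: two televisions, a soundbar and a cable.
cluster_of = {
    "tv_a": "televisions",
    "tv_b": "televisions",
    "soundbar": "audio",
    "hdmi_cable": "cables",
}
candidates = ["tv_b", "soundbar", "hdmi_cable"]

# Before the purchase the television is still a valid suggestion.
print(filter_recommendations(candidates, cluster_of, purchased=[]))
# After buying tv_a, the whole television cluster is set aside.
print(filter_recommendations(candidates, cluster_of, purchased=["tv_a"]))
```

Because the check is a set lookup per candidate, it can be rerun as soon as a purchase event arrives rather than waiting for a batch job.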
Also, in the domain of e-commerce it’s important to add nuance to recommendations. For example, a good recommendation system would suggest to users a primary product such as an iPhone before showing accessories such as a case or earphones. So companies should make those page views and other data count and focus on granular product categories in order to maximize purchases through recommendations.
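One way to picture the primary-before-accessory rule is as a stable reordering of an already ranked list. The sketch below is an assumption-laden illustration rather than the approach Tang presented: `order_primary_first`, the `accessory_of` mapping and the product names are hypothetical, and the upstream ranking is taken as given.

```python
def order_primary_first(ranked, accessory_of, owned):
    """Move accessories whose primary product is not yet owned to the back.

    `accessory_of` maps an accessory to its primary product (assumed input).
    Python's sort is stable, so the original ranking is kept within each group.
    """
    def is_premature(item):
        primary = accessory_of.get(item)
        return primary is not None and primary not in owned

    # False sorts before True, so premature accessories go last.
    return sorted(ranked, key=is_premature)


ranked = ["iphone_case", "earphones", "iphone", "tablet"]
accessory_of = {"iphone_case": "iphone", "earphones": "iphone"}

# The user owns nothing yet: the phone and tablet come before the accessories.
print(order_primary_first(ranked, accessory_of, owned=set()))
# Once the phone is owned, the accessories keep their original rank.
print(order_primary_first(ranked, accessory_of, owned={"iphone"}))
```

A single sort keeps the tweak cheap, which matters when the reordering has to run for millions of users.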
And these sorts of fine-grained tweaks need to be made quickly for millions of users, so the system can’t be too computationally intensive. Tang and his colleagues appear to have come up with a scalable system that meets these requirements, although he said there’s still room for improvement.