How Facebook matured its data structure and stepped into the graph world
Facebook shifts to a TAO data store
Several years ago Facebook engineers realized they had a problem. The old way of storing and accessing users’ data was bogging down their infrastructure and their code. So in 2009, engineers at the social networking site started working on a database architecture that could perform better than its then-current relational database backed by Memcached for in-memory caching and MySQL for persistent storage.
They built a graph database. The fruit of their efforts, The Associations and Objects (TAO) distributed data store, is a system purpose-built for the storage, expansion and, most importantly, delivery of the complex web of relationships among people, places and things that Facebook represents. And on Tuesday Facebook published a blog post that details the physical infrastructure that underlies the TAO data store.
One reason for building TAO was the complexity of querying data from both MySQL and Memcached, according to the post, written by software engineer Mark Marchukov:
Product engineers had to work with two data stores and very different data models: a large cluster of MySQL servers for storing data persistently in relational tables, and an equally large collection of memcache servers for storing and serving flat keyvalue pairs derived (some indirectly) from the results of SQL queries. Even with most of the common chores encapsulated in a data access library, using the memcache-MySQL combination efficiently as a data store required quite a bit of knowledge of system internals on the part of product engineers.
The graph schema made possible by the TAO approach is similar in some ways to the way of organizing information inside Facebook’s Graph Search tool, which thinks of the world in terms of nodes (objects, or people, places and things) and edges (associations, or the connections among them). Maintaining that data in a relational schema becomes less and less advantageous as the data grows, and thus TAO and its corresponding API were born.
Marchukov thinks the implementation of the graph model makes TAO stand out. The data store “validates the graph data model as a way to access data on challenging read-dominated loads that social networks and sites with similar workloads face,” he told me in an interview. Indeed, graph databases such as Neo4j have been getting attention lately for their ability to efficiently show relationships. The graph model even appears to be useful for intelligence agencies trying to understand who is related to whom and in what ways.