Clustering glossary

Term Description

Asynchronous replication

Enables efficient scale-out of secondary database copies but offers no guarantees under fault conditions. A secondary copy is not guaranteed to be up to date with a majority of the database’s primary copies.
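A toy model can make this lag concrete. The Python sketch below is not Neo4j code. It assumes a leader that appends committed transactions to an in-memory log and a secondary copy that applies them only when it pulls. The class names `Leader` and `SecondaryCopy` are hypothetical.

```python
class Leader:
    """Hypothetical primary copy that commits without waiting for secondaries."""

    def __init__(self) -> None:
        self.log: list[str] = []

    def commit(self, tx: str) -> int:
        # The commit completes immediately; no secondary copy is consulted.
        self.log.append(tx)
        return len(self.log)  # position of the last committed transaction


class SecondaryCopy:
    """Hypothetical read-only copy that catches up asynchronously."""

    def __init__(self) -> None:
        self.applied: list[str] = []

    def pull(self, leader: Leader, max_entries: int) -> None:
        # Apply at most max_entries transactions this copy has not seen yet.
        start = len(self.applied)
        self.applied.extend(leader.log[start:start + max_entries])

    def lag(self, leader: Leader) -> int:
        # How many committed transactions this copy is still missing.
        return len(leader.log) - len(self.applied)


leader = Leader()
secondary = SecondaryCopy()
for tx in ["tx1", "tx2", "tx3"]:
    leader.commit(tx)

secondary.pull(leader, max_entries=2)
print(secondary.lag(leader))  # 1: a read served here would not see tx3
```

The leader never waits for the secondary, which is what makes scale-out cheap. The price is that a read served by the secondary can return a state that is behind the primaries.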

Availability

The ability to access data in a database. A database can be available for read-write, read-only, or altogether unavailable. A clustered database is fault-tolerant, i.e. it can maintain both read and write availability if some primaries fail (see Fault tolerance for more information). If the number of failed primaries exceeds the fault tolerance limit, the database becomes read-only. Should all copies fail, the database becomes unavailable.
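The three availability states can be expressed as a small decision function. This is a sketch under stated assumptions: the function name `availability` and its parameters are hypothetical, "copies" counts both primary and secondary copies, and the fault tolerance limit uses the formula from the Fault tolerance entry.

```python
def availability(primaries: int, failed_primaries: int, surviving_copies: int) -> str:
    """Hypothetical classifier for a clustered database's availability.

    primaries: number of primary copies the database is hosted on.
    failed_primaries: how many of those primaries have failed.
    surviving_copies: copies (primary or secondary) that are still running.
    """
    fault_tolerance = (primaries - 1) // 2  # f = (n - 1)/2, rounded down
    if surviving_copies == 0:
        return "unavailable"  # every copy has failed
    if failed_primaries > fault_tolerance:
        return "read-only"  # too few primaries left to accept writes
    return "read-write"


# A database on three primaries plus one secondary copy.
print(availability(3, failed_primaries=1, surviving_copies=3))  # read-write
print(availability(3, failed_primaries=2, surviving_copies=2))  # read-only
print(availability(3, failed_primaries=3, surviving_copies=0))  # unavailable
```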

Bookmark

A marker that a client can request from the cluster to ensure that it can read its own writes, so that the application’s state stays consistent. Only database copies that have caught up to the state recorded in the bookmark are permitted to respond.

Causal consistency

When a client (driver) creates a session and executes a query, the responding server issues the client a bookmark. The bookmark reflects the state of the database copy on that server at the time the query was executed. It is passed along and updated by all subsequent queries in the session, regardless of which server executes which query. A bookmark can only advance monotonically. If a server is behind the state in the bookmark, it waits until it has caught up, or times out the query. Thus, clients executing queries within a session are guaranteed to read their own writes and only see successively later states of the database. This is sometimes also referred to as session consistency.
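The bookmark rules can be simulated in a few lines. This is not the Neo4j driver API. The sketch assumes that a bookmark is a plain integer transaction ID (real bookmarks are opaque values issued by the server), and it models the passage of time as one replication step per wait. `Cluster`, `Session`, `replicate_one`, and `max_wait_steps` are all hypothetical names.

```python
class Copy:
    """Hypothetical copy of one database; `applied` is the last transaction ID it holds."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.applied = 0


class Cluster:
    """Hypothetical cluster with one shared, ever-growing transaction log."""

    def __init__(self, names: list[str]) -> None:
        self.copies = {name: Copy(name) for name in names}
        self.committed = 0  # ID of the last committed transaction

    def write(self, leader: str) -> int:
        self.committed += 1
        self.copies[leader].applied = self.committed
        return self.committed

    def replicate_one(self, name: str) -> None:
        # One unit of replication progress for a lagging copy.
        copy = self.copies[name]
        if copy.applied < self.committed:
            copy.applied += 1


class Session:
    """Hypothetical session that carries a bookmark from query to query."""

    def __init__(self, cluster: Cluster) -> None:
        self.cluster = cluster
        self.bookmark = 0

    def write(self, leader: str) -> int:
        tx = self.cluster.write(leader)
        self.bookmark = max(self.bookmark, tx)  # bookmarks never move backward
        return self.bookmark

    def read(self, name: str, max_wait_steps: int = 3) -> int:
        copy = self.cluster.copies[name]
        # A copy that is behind the bookmark waits for replication to catch up...
        for _ in range(max_wait_steps):
            if copy.applied >= self.bookmark:
                break
            self.cluster.replicate_one(name)
        # ...or times out the query.
        if copy.applied < self.bookmark:
            raise TimeoutError(f"{name} did not reach bookmark {self.bookmark}")
        self.bookmark = max(self.bookmark, copy.applied)
        return copy.applied


cluster = Cluster(["a", "b", "c"])
session = Session(cluster)
session.write("a")
session.write("a")
print(session.read("b"))  # 2: b catches up before answering, so the session reads its own writes
session.write("a")
try:
    session.read("c", max_wait_steps=1)
except TimeoutError as err:
    print(err)  # c did not reach bookmark 3
```

Copy `b` answers only once it holds transaction 2. Copy `c` cannot catch up within its wait budget, so the read fails instead of returning stale data.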

Causal cluster

A collection of servers running Neo4j that are configured to communicate with each other. The servers can be either Primary or Secondary: Primary servers allow read and write operations, and Secondary servers allow only read operations. As long as a client application uses bookmarks, the cluster guarantees that it can read at least its own writes. See also Primary server, Secondary server, and Causal consistency.

Database

The data store for the nodes, relationships, and properties that make up the graph. Multiple databases can be hosted on a Database Management System (DBMS).

Database Management System (DBMS)

The Neo4j services and system database running on a single server or a cluster to provide one or more databases.

Disaster recovery

A manual intervention to restore availability of a cluster, or databases within a cluster.

Election

In the event that a leader becomes unresponsive, followers automatically trigger an election and vote for a new leader. A majority is required for the vote to be successful.
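The majority rule for an election can be sketched as a vote count. The sketch assumes a simplified Raft-style tally with no terms, timeouts, or log-freshness checks. The majority is taken over all primaries of the database, not just the reachable ones. The function name `elect` and its ballot list are hypothetical.

```python
from collections import Counter
from typing import Optional


def elect(primaries: int, ballots: list[str]) -> Optional[str]:
    """Return the candidate backed by a majority of all primaries, or None."""
    majority = primaries // 2 + 1
    if not ballots:
        return None
    candidate, votes = Counter(ballots).most_common(1)[0]
    return candidate if votes >= majority else None


# Leader "a" of a three-primary database is unresponsive; "b" and "c" both vote for "b".
print(elect(3, ["b", "b"]))  # b
# With five primaries, two votes are not enough.
print(elect(5, ["b", "b"]))  # None
# A split vote among four reachable primaries produces no leader either.
print(elect(5, ["b", "c", "b", "c"]))  # None
```

Requiring a majority of all primaries, rather than of the reachable ones, prevents two isolated groups from each electing their own leader.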

Fault tolerance

A guarantee that a database can maintain persistence and availability in the event of one or more failures. The number of failures f that can be tolerated depends on the number of primaries n for the database and follows the formula f = (n − 1)/2, rounded down to a whole number. For example, three primaries tolerate one failure, and five primaries tolerate two. If more than f primaries fail, the database can no longer process write transactions and becomes read-only.
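Tabulating the formula for small clusters shows why odd numbers of primaries are the useful choices. The function name `fault_tolerance` in this minimal sketch is hypothetical. Integer division performs the rounding down.

```python
def fault_tolerance(primaries: int) -> int:
    # f = (n - 1)/2, rounded down: the number of failed primaries a database can
    # absorb while a majority of its primaries is still available to commit writes.
    return (primaries - 1) // 2


for n in range(1, 8):
    print(f"{n} primaries -> tolerates {fault_tolerance(n)} failure(s)")
```

Going from three to four primaries (or from five to six) does not raise f. The extra primary only adds one more member that must take part in the majority.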

Follower

A primary copy of a database acting as a follower receives and acknowledges synchronous writes from the leader.

Leader

Each database has a designated leader within the cluster, which can only be located on a primary server. The leader receives all write transactions from clients and replicates writes synchronously to followers and asynchronously to secondary copies of the database. Each database can have a different leader within the cluster.

Primary server

A primary server can be either a single instance or a Core instance, both of which allow read and write operations. A single instance as primary is beneficial for read scalability but is not fault tolerant. A Core instance safeguards data, and a cluster of at least three Core instances is fault tolerant. A Core instance participates in fault-tolerant writes because it is part of the majority required to acknowledge and commit write transactions.

Read scaling

Adding Secondary servers to the cluster can offload read queries from the Primary servers, thereby reducing their load and aiding the write performance of the cluster.
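The division of labour behind read scaling can be pictured as a tiny router. This is not how Neo4j drivers choose servers. The sketch only illustrates the idea: writes go to the leader, and reads rotate round-robin over the secondaries. The `Router` class and the server names are hypothetical.

```python
from itertools import cycle


class Router:
    """Hypothetical router: writes go to the leader, reads are spread over secondaries."""

    def __init__(self, leader: str, secondaries: list[str]) -> None:
        self.leader = leader
        # With no secondaries there is nothing to offload to, so reads fall back to the leader.
        self._readers = cycle(secondaries or [leader])

    def route(self, is_write: bool) -> str:
        return self.leader if is_write else next(self._readers)


router = Router("primary-1", ["secondary-1", "secondary-2"])
print([router.route(is_write=False) for _ in range(4)])
# ['secondary-1', 'secondary-2', 'secondary-1', 'secondary-2']
print(router.route(is_write=True))  # primary-1
```

Every read sent to a secondary is one the primaries do not have to serve. This leaves the primaries more capacity for the synchronous replication that writes require.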

Secondary server

An asynchronously replicated instance that provides read scaling within the cluster. Secondary servers are made up of Read Replica instances.

Seed

A file used to create a database on a single instance or on a member of a cluster. This can be a database dump or a database backup. Seed can also be used as a verb to describe the act of seeding a cluster from a backup.

Server

A physical machine, a virtual machine, or a container running Neo4j DBMS. The server can be standalone or part of a cluster.

Session consistency

An alternative name for Neo4j’s causal consistency.

Standalone server

A single server, or container, running Neo4j DBMS and not part of a cluster.

Synchronous replication

When attempting to commit a transaction, the leader primary replicates the transaction and blocks, requiring the follower primaries to acknowledge the replication before allowing the commit to proceed. This blocking replication is known as synchronous replication, and it ensures data durability and consistency within the cluster. See also Asynchronous replication.
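The blocking commit can be sketched as follows, with the acknowledgement threshold taken from the Primary server entry: a majority of the primaries must acknowledge. The sketch assumes, as in Raft, that the leader counts its own copy toward that majority. It waits for each follower's answer in turn, whereas a real system would replicate in parallel. `Primary`, `reachable`, and `commit` are hypothetical names.

```python
class Primary:
    """Hypothetical primary copy that may or may not acknowledge replication."""

    def __init__(self, name: str, reachable: bool = True) -> None:
        self.name = name
        self.reachable = reachable
        self.log: list[str] = []

    def append(self, tx: str) -> bool:
        # An unreachable follower never stores the transaction or acknowledges it.
        if not self.reachable:
            return False
        self.log.append(tx)
        return True


def commit(leader: Primary, followers: list[Primary], tx: str) -> bool:
    """Replicate tx, blocking on every follower, then commit only with a majority of acks."""
    leader.log.append(tx)
    acks = 1 + sum(f.append(tx) for f in followers)  # assumption: the leader counts itself
    majority = (len(followers) + 1) // 2 + 1  # majority of all primaries
    return acks >= majority


a, b, c = Primary("a"), Primary("b"), Primary("c", reachable=False)
print(commit(a, [b, c], "tx1"))  # True: 2 of 3 primaries hold tx1
b.reachable = False
print(commit(a, [b, c], "tx2"))  # False: only the leader holds tx2, so it is not committed
```

The second call shows the trade-off against asynchronous replication: losing one more primary is enough to stop writes, but a transaction is never reported as committed unless a majority holds it.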