Seed a cluster

Introduction

Regardless of whether you are just playing around with Neo4j or setting up a production environment, you likely have some existing data that you want to transfer into your newly created Causal cluster. Neo4j supports seeding a cluster from a database dump, a database backup, or from another data source (with the Import tool). For more information about the different backup options and how to use the Neo4j Import tool, see Backup and restore options and Neo4j Admin tool.

You can seed a cluster with a single database or with multiple databases, up to and including a full DBMS. Any seeding that includes restoring the system database must be done offline, but all other databases can be seeded online.

The databases that you want to seed and the Neo4j cluster must be of the same version.

The process for seeding a cluster is essentially the same for clusters with Single and Read Replica instances as for clusters with Core (and optional Read Replica) instances. However, using a designated seeder is only applicable to clusters with Core instances. Seeding is usually performed on primary instances only. It is also possible to seed a Read Replica instance, but this is not necessary except for performance reasons.

Seed a cluster from a database dump (offline)

The seed can be an offline backup (i.e., a dump) from a standalone Neo4j instance or a cluster member (e.g., an existing Read Replica instance). The following example seeds a newly created cluster from dumps of an example DBMS consisting of the system database and the default database, neo4j. If you want to seed a single user database, follow the steps in Seed a cluster from a database backup (online) further on.

This scenario is useful in disaster recovery where some servers have retained their data during a catastrophic event.

Moving files and directories manually in or out of a Neo4j installation is not recommended and is considered unsupported.

  1. Create a new Neo4j Core-only cluster following the instructions in Configure a cluster with Core instances but do not start any of the members. (If you have started any of the cluster members, stop and unbind each started member.)

  2. Use neo4j-admin load to seed each of the Core members in the cluster.

    The examples assume that you are restoring one user database with the default name of neo4j and the system database, containing the replicated configuration state. Modify the command line arguments to match your exact setup.

    neo4j-01$ ./bin/neo4j-admin load --from=/path/to/system.dump --database=system
    neo4j-01$ ./bin/neo4j-admin load --from=/path/to/neo4j.dump --database=neo4j
    neo4j-02$ ./bin/neo4j-admin load --from=/path/to/system.dump --database=system
    neo4j-02$ ./bin/neo4j-admin load --from=/path/to/neo4j.dump --database=neo4j
    neo4j-03$ ./bin/neo4j-admin load --from=/path/to/system.dump --database=system
    neo4j-03$ ./bin/neo4j-admin load --from=/path/to/neo4j.dump --database=neo4j

  3. Start each cluster member.

    neo4j-01$ ./bin/neo4j start
    neo4j-02$ ./bin/neo4j start
    neo4j-03$ ./bin/neo4j start

    The cluster forms and the replicated Neo4j DBMS deployment comes online.

The system database contains information about the databases that should exist in your Neo4j DBMS. If a database is not registered in your system database (because it has not been created previously), you must create it with CREATE DATABASE <database-name>, even if its data has already been restored, loaded, or imported.
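
Steps 2 and 3 above repeat the same commands on every Core member, so they lend themselves to a small script. The following is a minimal sketch, not a supported tool: it only prints the commands for review and executes nothing. The member names and dump file names come from the example above; the seed_plan function name and the idea of generating a plan first are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: print the offline seeding commands for each Core member.
# Assumptions (hypothetical): seed_plan is an illustrative helper, and the
# dumps are named system.dump and neo4j.dump as in the example above.

seed_plan() {
    dump_dir=$1
    shift
    # Step 2: load the system database and the neo4j database on every member.
    for member in "$@"; do
        for db in system neo4j; do
            printf '%s\n' "${member}\$ ./bin/neo4j-admin load --from=${dump_dir}/${db}.dump --database=${db}"
        done
    done
    # Step 3: start the members only after all of them have been loaded.
    for member in "$@"; do
        printf '%s\n' "${member}\$ ./bin/neo4j start"
    done
}

seed_plan /path/to neo4j-01 neo4j-02 neo4j-03
```

Printing the plan before running anything makes it easy to confirm that every member receives both dumps and that no member is started before the others are loaded.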

Seed a cluster from a database backup (online)

These scenarios are useful when you want to restore a database in a running cluster.

If you have a running Neo4j database that you want to seed in a running cluster, use neo4j-admin backup to create a database backup. This could be a backup from a standalone Neo4j instance or another cluster member (e.g., an existing Read Replica).

Neo4j supports two types of seeding in a running cluster. You can either restore the database backup on each Core instance, or restore it on only one Core instance and use that instance as a designated seeder. In both cases, you then use the CREATE DATABASE Cypher command to create the database in the cluster. For more information on the CREATE DATABASE syntax and options, see Cypher Manual → Creating databases.

Moving files and directories manually in or out of a Neo4j installation is not recommended and is considered unsupported.

Restore a database on each Core instance

Restore the database backup on each Core instance in the cluster using the neo4j-admin restore command, and then use CREATE DATABASE to create the database. This example uses a user database called movies1.

  1. To ensure that the movies1 database does not exist in the cluster, on one of the Core members, use Cypher Shell and run DROP DATABASE movies1. Use the system database to connect. The command is automatically routed to the appropriate Core instance and from there to the other cluster members.

    DROP DATABASE movies1;

    Dropping a database also deletes the users and roles associated with it.

    If you cannot drop the database because your seeds include the system database (which cannot be dropped), you must run neo4j-admin unbind. However, this removes the cluster state of the Core instance, and the instance must then be restarted to rejoin the cluster. Thus, you are no longer restoring a database in a running cluster. See Seed a cluster from a database dump (offline) instead for instructions on how to seed an offline cluster.

  2. Restore the database on each Core member in the cluster.

    neo4j@core1$ ./bin/neo4j-admin restore --from=/path/to/movies1-backup-dir --database=movies1
    neo4j@core2$ ./bin/neo4j-admin restore --from=/path/to/movies1-backup-dir --database=movies1
    neo4j@core3$ ./bin/neo4j-admin restore --from=/path/to/movies1-backup-dir --database=movies1

    Restoring a database does not automatically create it in the cluster.

  3. On one of the Core instances, run CREATE DATABASE movies1 against the system database to create the movies1 database. The command is automatically routed to the appropriate Core instance and from there to the other cluster members.

    CREATE DATABASE movies1;
    0 rows
    ready to start consuming query after 701 ms, results consumed after another 0 ms

  4. Verify that the movies1 database is online on all members.

    SHOW DATABASES;
    +---------------------------------------------------------------------------------------------------------------------------+
    | name     | aliases | access       | address      | role       | requestedStatus | currentStatus | error | default | home  |
    +---------------------------------------------------------------------------------------------------------------------------+
    | "neo4j"  | []      | "read-write" | "core1:7687" | "leader"   | "online"        | "online"      | ""    | TRUE    | TRUE  |
    | "neo4j"  | []      | "read-write" | "core3:7687" | "follower" | "online"        | "online"      | ""    | TRUE    | TRUE  |
    | "neo4j"  | []      | "read-write" | "core2:7687" | "follower" | "online"        | "online"      | ""    | TRUE    | TRUE  |
    | "movies1"| []      | "read-write" | "core1:7687" | "leader"   | "online"        | "online"      | ""    | FALSE   | FALSE |
    | "movies1"| []      | "read-write" | "core3:7687" | "follower" | "online"        | "online"      | ""    | FALSE   | FALSE |
    | "movies1"| []      | "read-write" | "core2:7687" | "follower" | "online"        | "online"      | ""    | FALSE   | FALSE |
    | "system" | []      | "read-write" | "core1:7687" | "follower" | "online"        | "online"      | ""    | FALSE   | FALSE |
    | "system" | []      | "read-write" | "core3:7687" | "follower" | "online"        | "online"      | ""    | FALSE   | FALSE |
    | "system" | []      | "read-write" | "core2:7687" | "leader"   | "online"        | "online"      | ""    | FALSE   | FALSE |
    +---------------------------------------------------------------------------------------------------------------------------+
    
    9 rows available after 3 ms, consumed after another 1 ms
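
The check in step 4 can also be scripted. The sketch below reads a SHOW DATABASES table in the layout shown above from standard input and succeeds only if every row for the given database reports a currentStatus of "online". The all_online function name is hypothetical, and the column positions are assumed to match the table above; adjust them if your output differs.

```shell
#!/bin/sh
# Sketch: check that a database is online on every listed member, given a
# SHOW DATABASES table (as printed above) on standard input.
# all_online is a hypothetical helper name, not part of Neo4j.

all_online() {
    awk -F'|' -v db="$1" '
        # Strip blanks and double quotes from a table cell.
        function cell(s) { gsub(/[ \t"]/, "", s); return s }
        # Field 2 is the name column, field 8 the currentStatus column.
        cell($2) == db {
            rows++
            if (cell($8) != "online") bad++
        }
        # Succeed only if the database was listed and every row was online.
        END { if (rows > 0 && bad == 0) exit 0; exit 1 }
    '
}

all_online movies1 <<'EOF' && echo "movies1 is online on all listed members"
| name     | aliases | access       | address      | role       | requestedStatus | currentStatus | error | default | home  |
| "movies1"| []      | "read-write" | "core1:7687" | "leader"   | "online"        | "online"      | ""    | FALSE   | FALSE |
| "movies1"| []      | "read-write" | "core3:7687" | "follower" | "online"        | "online"      | ""    | FALSE   | FALSE |
EOF
```

In practice the table would come from Cypher Shell connected to the system database. When its output is piped, Cypher Shell may switch to a plain format without the table borders; the --format verbose option is assumed here to request the layout shown above.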

Restore a database using a designated seeder

With a designated seeder, you restore the database backup on only one Core instance in the cluster using the neo4j-admin restore command. You then use that member as the seeder from which the backed-up database is created on the other cluster members.

This example uses a user database called movies1 and a cluster that consists of three Core instances. The movies1 database does not exist on any of the cluster members.

If a database with the same name as your backup already exists in your cluster, see step 1 in Restore a database on each Core instance for details on how to drop it.

  1. Restore the movies1 database on one of the Core instances. In this example, the core1 member is used.

    neo4j@core1$ ./bin/neo4j-admin restore --from=/path/to/movies1-backup-dir --database=movies1

  2. Find the server ID of core1 by logging in to Cypher Shell and running dbms.cluster.overview(). Use any database to connect.

    CALL dbms.cluster.overview();
    +----------------------------------------------------------------------------------------------------------------------------------------+
    | id                                     | addresses                                  | databases                               | groups |
    +----------------------------------------------------------------------------------------------------------------------------------------+
    | "8e07406b-90b3-4311-a63f-85c45af63583" | ["bolt://core1:7687", "http://core1:7474"] | {neo4j: "LEADER", system: "FOLLOWER"}   | []     |
    | "aeb6debe-d3ea-4644-bd68-304236f3813b" | ["bolt://core3:7687", "http://core3:7474"] | {neo4j: "FOLLOWER", system: "FOLLOWER"} | []     |
    | "b99ff25e-dc64-4c9c-8a50-ebc1aa0053cf" | ["bolt://core2:7687", "http://core2:7474"] | {neo4j: "FOLLOWER", system: "LEADER"}   | []     |
    +----------------------------------------------------------------------------------------------------------------------------------------+

  3. On one of the Core instances, use the system database and create the database movies1 using the server ID of core1. The command is automatically routed to the appropriate Core instance and from there to the other cluster members. If the movies1 database is of considerable size, the execution of the command can take some time.

    CREATE DATABASE movies1 OPTIONS {existingData: 'use', existingDataSeedInstance: '8e07406b-90b3-4311-a63f-85c45af63583'};
    0 rows
    ready to start consuming query after 701 ms, results consumed after another 0 ms

  4. Verify that the movies1 database is online on all cluster members.

    SHOW DATABASES;
    +---------------------------------------------------------------------------------------------------------------------------+
    | name     | aliases | access       | address      | role       | requestedStatus | currentStatus | error | default | home  |
    +---------------------------------------------------------------------------------------------------------------------------+
    | "neo4j"  | []      | "read-write" | "core1:7687" | "leader"   | "online"        | "online"      | ""    | TRUE    | TRUE  |
    | "neo4j"  | []      | "read-write" | "core3:7687" | "follower" | "online"        | "online"      | ""    | TRUE    | TRUE  |
    | "neo4j"  | []      | "read-write" | "core2:7687" | "follower" | "online"        | "online"      | ""    | TRUE    | TRUE  |
    | "movies1"| []      | "read-write" | "core1:7687" | "leader"   | "online"        | "online"      | ""    | FALSE   | FALSE |
    | "movies1"| []      | "read-write" | "core3:7687" | "follower" | "online"        | "online"      | ""    | FALSE   | FALSE |
    | "movies1"| []      | "read-write" | "core2:7687" | "follower" | "online"        | "online"      | ""    | FALSE   | FALSE |
    | "system" | []      | "read-write" | "core1:7687" | "follower" | "online"        | "online"      | ""    | FALSE   | FALSE |
    | "system" | []      | "read-write" | "core3:7687" | "follower" | "online"        | "online"      | ""    | FALSE   | FALSE |
    | "system" | []      | "read-write" | "core2:7687" | "leader"   | "online"        | "online"      | ""    | FALSE   | FALSE |
    +---------------------------------------------------------------------------------------------------------------------------+
    
    9 rows available after 3 ms, consumed after another 1 ms
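
Steps 2 and 3 above involve copying a server ID by hand from one output into another command, which is easy to get wrong. As a rough sketch, the lookup and the statement can be generated from a dbms.cluster.overview() table in the layout shown above. The helper names seeder_id and seed_statement are hypothetical, and the statement is only printed, not executed.

```shell
#!/bin/sh
# Sketch: find the server ID of the designated seeder in a
# dbms.cluster.overview() table and print the CREATE DATABASE statement.
# seeder_id and seed_statement are hypothetical helper names.

seeder_id() {
    # $1 is the Bolt address of the seeder, e.g. core1:7687.
    # Field 2 is the id column, field 3 the addresses column.
    awk -F'|' -v addr="$1" '
        index($3, "\"bolt://" addr "\"") > 0 {
            id = $2
            gsub(/[ \t"]/, "", id)   # drop blanks and the surrounding quotes
            print id
            exit
        }
    '
}

seed_statement() {
    # $1 is the database name, $2 the server ID of the designated seeder.
    printf "CREATE DATABASE %s OPTIONS {existingData: 'use', existingDataSeedInstance: '%s'};\n" "$1" "$2"
}

overview='| "8e07406b-90b3-4311-a63f-85c45af63583" | ["bolt://core1:7687", "http://core1:7474"] | {neo4j: "LEADER", system: "FOLLOWER"}   | []     |
| "aeb6debe-d3ea-4644-bd68-304236f3813b" | ["bolt://core3:7687", "http://core3:7474"] | {neo4j: "FOLLOWER", system: "FOLLOWER"} | []     |'

id=$(printf '%s\n' "$overview" | seeder_id core1:7687)
seed_statement movies1 "$id"
```

With the two rows above, the script prints the same CREATE DATABASE statement as in step 3. Review the generated statement before running it against the system database.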

Seed a cluster using the import tool

To create a cluster based on imported data, it is recommended to first import the data into a standalone Neo4j DBMS and then use an offline backup to seed the cluster.

  1. Import the data.

    1. Deploy a standalone Neo4j DBMS.

    2. Import the data using the import tool.

  2. Use neo4j-admin dump to create an offline backup of the neo4j database.

  3. Seed a new cluster using the instructions in Seed a cluster from a database dump (offline).

    Skip the system database in this scenario since it is not needed.
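
Steps 1 and 2 above can be sketched as the following pair of commands on the standalone DBMS. This is an illustration under assumptions, not a complete procedure: the CSV file names, the dump path, and the import_plan helper are hypothetical, and the commands are only printed. The database must be offline when it is dumped.

```shell
#!/bin/sh
# Sketch: print the import and dump commands for the standalone DBMS.
# Nothing is executed.
# Assumptions (hypothetical): the CSV file names, the dump path, and the
# import_plan helper name.

import_plan() {
    nodes_csv=$1
    rels_csv=$2
    dump_file=$3
    # Step 1: import the CSV data into the neo4j database.
    printf '%s\n' "./bin/neo4j-admin import --database=neo4j --nodes=${nodes_csv} --relationships=${rels_csv}"
    # Step 2: create an offline backup (dump) of the imported database.
    printf '%s\n' "./bin/neo4j-admin dump --database=neo4j --to=${dump_file}"
}

import_plan import/nodes.csv import/relationships.csv /path/to/neo4j.dump
```

The resulting neo4j.dump is then loaded on each Core member with neo4j-admin load, as described in Seed a cluster from a database dump (offline).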