Back up an online database
Remember to plan your backup carefully and to back up each of your databases, including the |
Command
A Neo4j database can be backed up in online mode using the backup command of neo4j-admin.
The command must be invoked as the neo4j user to ensure the appropriate file permissions.
Usage
The neo4j-admin backup command can be used for performing both full and incremental backups of an online database.
The command can be run both locally and remotely.
By default, neo4j-admin backup also checks the database consistency at the end of every backup operation.
However, the consistency check uses a significant amount of resources, such as memory and CPU.
Therefore, it is recommended to perform the backup on a separate dedicated machine.
The neo4j-admin backup command also supports SSL/TLS.
For more information, see Online backup configurations.
Syntax
neo4j-admin backup --backup-dir=<path>
[--verbose]
[--expand-commands]
[--from=<host:port>]
[--database=<database>]
[--fallback-to-full=<true/false>]
[--pagecache=<size>]
[--check-consistency=<true/false>]
[--report-dir=<path>]
[--check-graph=<true/false>]
[--check-indexes=<true/false>]
[--check-index-structure=<true/false>]
[--check-label-scan-store=<true/false>]
[--check-property-owners=<true/false>]
[--additional-config=<path>]
[--include-metadata=<all/users/roles>]
[--prepare-restore=<true/false>]
[--parallel-recovery=<true/false>]
Please note that the following options have been deprecated:
--check-label-scan-store
--check-property-owners
Values for these settings will be ignored.
Options
Option | Default | Description |
---|---|---|
--backup-dir | | Target directory. |
--verbose | | Enable verbose output. |
--expand-commands | | Allow command expansion in config value evaluation. |
--from | | Host and port of Neo4j. |
--database | | Name of the remote database to back up. The value can contain |
--fallback-to-full | | If an incremental backup fails, backup will move the old backup to |
--pagecache | | The size of the page cache to use for the backup process. |
--check-consistency | | Run a consistency check against the database backup. |
--report-dir | | Directory where the consistency report will be written. |
--check-graph | | Perform consistency checks between nodes, relationships, properties, types, and tokens. |
--check-indexes | | Perform consistency checks on indexes. |
--check-index-structure | | Perform structure checks on indexes. |
--check-label-scan-store | | This option is deprecated, and its value is ignored. |
--check-property-owners | | This option is deprecated, and its value is ignored. |
--additional-config | | Configuration file that provides additional settings or overrides existing settings in the neo4j.conf file. |
--include-metadata | | all |
--prepare-restore | | Perform the recovery of the backup store by applying the latest pulled transactions. If disabled, the backup will be faster, but a recovery of the backup store will be required at a later time before restoring the data. For more information on how to do that, see Prepare a database for restoring. |
--parallel-recovery | | Allow multiple threads to apply transactions to a backup in parallel. For some databases and workloads, this may reduce execution times significantly. |
Exit codes
Depending on whether the backup was successful or not, neo4j-admin backup exits with different codes.
The error codes include details of what error was encountered.
Code | Description |
---|---|
|
Success. |
|
Backup failed. |
|
Backup succeeded but consistency check failed. |
|
Backup succeeded but consistency check found inconsistencies. |
Code | Description |
---|---|
|
All databases are backed up successfully. |
|
One or more backups failed. |
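As a rough illustration of acting on the exit status in a script, the sketch below wraps an arbitrary command and reports whether it succeeded. It assumes only that an exit code of 0 means success; any other value is passed through unchanged so that the caller can compare it with the tables above. The function name run_backup is hypothetical and not part of neo4j-admin.

```shell
# Run the given command and report the outcome based on its exit status.
# Assumption: only "0 means success" is relied on; other codes are passed through.
run_backup() {
  local status=0
  "$@" || status=$?
  if [ "$status" -eq 0 ]; then
    echo "backup OK"
  else
    echo "backup reported a problem (exit code $status)" >&2
  fi
  return "$status"
}

# Example invocation (paths are placeholders):
# run_backup neo4j-admin backup --backup-dir=/mnt/backups/neo4j --database=neo4j
```

Returning the original status, rather than collapsing it to 0 or 1, lets a scheduler or monitoring job distinguish a failed backup from a backup whose consistency check reported problems.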
Online backup configurations
Server configuration
The table below lists the basic server parameters relevant to backups. Note that, by default, the backup service is enabled but only listens on localhost (127.0.0.1). This needs to be changed if backups are to be taken from another machine.
Make this change only if you need the remote backup. If your network is not adequately isolated, this change might expose your system to threats.
Parameter name | Default value | Description |
---|---|---|
|
Enable support for running online backups. |
|
|
Listening server for online backups. |
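Before opening the backup service to other machines, it can help to check what a server is currently configured to listen on. The following is a minimal sketch, assuming the setting is named dbms.backup.listen_address and that the default is 127.0.0.1:6362; both are assumptions here because the parameter names and defaults are not shown in the table above. The function name backup_listen_address is hypothetical.

```shell
# Print the backup listen address found in a neo4j.conf file,
# or the assumed default when the setting is absent.
# Assumptions: setting name dbms.backup.listen_address, default 127.0.0.1:6362.
backup_listen_address() {
  local conf_file=$1 value
  value=$(grep -E '^[[:space:]]*dbms\.backup\.listen_address[[:space:]]*=' "$conf_file" 2>/dev/null \
    | tail -n 1 | cut -d= -f2- | tr -d '[:space:]') || value=""
  echo "${value:-127.0.0.1:6362}"
}

# Example (the path is a placeholder):
# backup_listen_address /etc/neo4j/neo4j.conf
```

If the printed address starts with 127.0.0.1, backups can only be taken from the server itself.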
It is not recommended to use an NFS mount for backup purposes as this is likely to corrupt and slow down the backup.
Make sure to follow the Security configurations section to prevent unauthorized users from accessing the DBMS through the backup server.
Memory configuration
The following options are available for configuring the memory allocated to the backup client:
- Configure heap size for the backup
  HEAP_SIZE configures the maximum heap size allocated for the backup process. This is done by setting the environment variable HEAP_SIZE before starting the operation. If not specified, the Java Virtual Machine chooses a value based on the server resources.
- Configure page cache for the backup
  The page cache size can be configured by using the --pagecache option of the neo4j-admin backup command. If not explicitly defined, the page cache defaults to 8MB.
  You should give the Neo4j page cache as much memory as possible, as long as it satisfies the following constraint:
  Neo4j page cache + OS page cache < available RAM, where 2 to 4GB should be dedicated to the operating system's page cache.
  For example, if your current database has a Total mapped size of 128GB as per the debug.log, and you have enough free space (meaning you have left aside 2 to 4 GB for the OS), then you can set --pagecache to 128GB.
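The page cache constraint above can be turned into a small calculation. The sketch below works in whole gigabytes and returns the store's total mapped size when it fits, or otherwise the largest value that still keeps page cache plus OS reserve strictly below the available RAM. The function name suggest_pagecache_gb and the 4 GB default reserve are illustrative choices, not part of neo4j-admin.

```shell
# Suggest a --pagecache value in whole gigabytes.
# Args: available RAM (GB), total mapped size of the store (GB),
#       GB reserved for the OS page cache (hypothetical default: 4).
suggest_pagecache_gb() {
  local ram_gb=$1 mapped_gb=$2 os_gb=${3:-4}
  # Largest whole-GB value that keeps: page cache + OS reserve < available RAM
  local max_gb=$(( ram_gb - os_gb - 1 ))
  if [ "$max_gb" -lt 1 ]; then
    echo "not enough RAM for a dedicated page cache" >&2
    return 1
  fi
  if [ "$mapped_gb" -lt "$max_gb" ]; then
    echo "${mapped_gb}G"
  else
    echo "${max_gb}G"
  fi
}

# Example (values and paths are placeholders; the HEAP_SIZE value is illustrative):
# HEAP_SIZE=2G neo4j-admin backup --backup-dir=/mnt/backups/neo4j \
#   --pagecache="$(suggest_pagecache_gb 136 128)"
```

On a machine with 136 GB of available RAM and a 128 GB store, the function suggests 128G; on a 64 GB machine it caps the suggestion at 59G.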
Computational resources configurations
- Consistency checking
  Checking the consistency of the backup is a major operation which may consume significant computational resources, such as memory, CPU, and I/O. When backing up an online database, the consistency checker is invoked at the end of the process by default. Therefore, it is highly recommended to perform the backup and consistency check on a dedicated machine, which has sufficient free resources, to avoid adversely affecting the running server.
  Alternatively, you can decouple the backup operation from the consistency check (using the neo4j-admin backup option --check-consistency=false) and schedule that part of the workflow to happen at a later point in time, on a dedicated machine. Consistency checking a backup is vital for safeguarding and ensuring the quality of the data, and should not be underestimated. For more information, see Consistency checker.
  To avoid running out of resources on the running server, it is recommended to perform the backup on a separate dedicated machine.
- Transaction log files
  The transaction log files, which keep track of recent changes, are rotated and pruned based on a provided configuration. For example, setting dbms.tx_log.rotation.retention_policy=3 files keeps 3 transaction log files in the backup. Because recovered servers do not need all of the transaction log files that have already been applied, it is possible to further reduce storage size by reducing the size of the files to the bare minimum. This can be done by setting dbms.tx_log.rotation.size=1M and dbms.tx_log.rotation.retention_policy=3 files. You can use the --additional-config parameter to override the configurations in the neo4j.conf file.
  Removing transaction logs manually can result in a broken backup.
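The transaction log settings described above can be kept in a separate file and passed with --additional-config, so that neo4j.conf itself stays untouched. The following is a minimal sketch; the function name write_backup_overrides and the file path are placeholders chosen for illustration.

```shell
# Write backup-specific transaction log settings to a separate config file.
# The two settings are the ones named in the text above.
write_backup_overrides() {
  local target=$1
  cat > "$target" <<'EOF'
dbms.tx_log.rotation.size=1M
dbms.tx_log.rotation.retention_policy=3 files
EOF
}

# Example (paths are placeholders):
# write_backup_overrides /tmp/backup-overrides.conf
# neo4j-admin backup --backup-dir=/mnt/backups/neo4j --database=neo4j \
#   --additional-config=/tmp/backup-overrides.conf
```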
Security configurations
Securing your backup network communication with an SSL policy and a firewall protects your data from unwanted intrusion and leakage.
When using the neo4j-admin backup command, you can configure the backup server to require SSL/TLS, and the backup client to use a compatible policy.
For more information on how to configure SSL in Neo4j, see SSL framework.
For a detailed list of recommendations regarding security in Neo4j, see Security checklist.
The following table provides details on how the configured SSL policies map to the configured ports.
Topology | Backup target address on database server | SSL policy setting on database server | SSL policy setting on backup client | Default port |
---|---|---|---|---|
Standalone instance |
|
|
|
|
Causal cluster |
|
|
|
|
It is very important to ensure that there is no external access to the port specified by the setting |
Cluster configurations
In a cluster topology, it is possible to take a backup from any server, and each server has two configurable ports capable of serving a backup.
These ports are configured by dbms.backup.listen_address and causal_clustering.transaction_listen_address respectively.
Functionally, they are equivalent for backups, but separating them can allow some operational flexibility, while using just a single port can simplify the configuration.
It is generally recommended to select Read Replicas to act as backup servers, since they are more numerous than Core members in typical cluster deployments.
Furthermore, the possibility of performance issues on a Read Replica, caused by a large backup, will not affect the performance or redundancy of the Core members.
If a Read Replica is not available, then a Core can be selected based on factors such as its physical proximity, bandwidth, performance, and liveness.
To avoid taking a backup from a cluster member that is lagging behind, you can look at the transaction IDs by exposing Neo4j metrics or via Neo4j Browser.
To view the latest processed transaction IDs (and other metrics) in Neo4j Browser, type |
Examples
The following are examples of how to back up a single database, e.g., the default database neo4j, and multiple databases, using the neo4j-admin backup command.
The target directory /mnt/backups/neo4j must exist before calling the command and the database(s) must be online.
Use neo4j-admin backup to back up a single database:
bin/neo4j-admin backup --backup-dir=/mnt/backups/neo4j --database=neo4j
To back up several databases that match a database name pattern, you can use name globbing. For example, to back up all databases whose names start with n, run:
Use neo4j-admin backup to back up multiple databases:
neo4j-admin backup --from=192.168.1.34 --backup-dir=/mnt/backups/neo4j --database=n* --pagecache=4G
For a detailed example on how to back up and restore a database in a Causal cluster, see Back up and restore a database in Causal Cluster.
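Because the target directory must exist before the command is called, a script that schedules backups may want to check for it first. The sketch below performs that check and then prints the command line instead of executing it, so it is safe to try on a machine without Neo4j. The function name backup_command is hypothetical.

```shell
# Refuse to continue unless the target directory exists, then print the
# backup command that would be run. Printing rather than executing keeps
# this sketch free of side effects.
backup_command() {
  local backup_dir=$1 database=$2
  if [ ! -d "$backup_dir" ]; then
    echo "target directory $backup_dir does not exist" >&2
    return 1
  fi
  echo "neo4j-admin backup --backup-dir=$backup_dir --database=$database"
}

# Example (the path is a placeholder):
# backup_command /mnt/backups/neo4j neo4j
```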