Managing Neo4j Uniqueness with Py2neo
Nigel Small, an active blogger, speaker and Neo4j community member, discusses Py2neo and how it works with Neo4j.
Data duplication isn’t fun, and many of us have spent hours trying to track a duplication problem back to its source in an effort to plug the leaks in some legacy system. It’s obviously better to take some time to prevent the problem from happening in the first place, and since almost every non-trivial piece of data-driven software will have some requirement to manage uniqueness, it’s worth knowing how to enforce it.
In the context of Neo4j, it may be necessary to ensure that individual nodes are unique within a particular context, or there may be more complex uniqueness requirements across paths or subgraphs. Neo4j provides two basic ways to ensure uniqueness: unique index entries and the Cypher CREATE UNIQUE clause. The first allows nodes (or, less commonly, relationships) to be tagged with a key-value pair. If used correctly, each key-value pair will identify only a single entry. CREATE UNIQUE is used to ensure that paths across two or more nodes are unique within the context of known reference points. Read “Using Neo4j from Python” for more on Neo4j and the Py2neo library.
Unique Index Entries
It’s important to remember that as far as Neo4j itself is concerned, a standard index and a unique index are fundamentally the same thing. The difference exists only in the methods used to work with the index. Broken application code can still permit multiple entries and, for that reason, applications should gracefully handle the condition of multiple index entries when only one is expected.
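One way to handle that condition gracefully is to pass every lookup result through a small guard before using it. The sketch below assumes that an index lookup hands back a list of matching entries; the helper name single_entry and the "warn and use the first entry" policy are illustrative choices, not part of Py2neo. An application could just as reasonably raise an error or merge the duplicates.

```python
import logging

logger = logging.getLogger(__name__)


def single_entry(entries, key, value):
    """Return the one entry expected for key/value, or None if there is none.

    `entries` is the list returned by an index lookup (assumed shape).
    If broken code has allowed duplicates, log a warning and fall back
    to the first entry instead of failing outright.
    """
    if not entries:
        return None
    if len(entries) > 1:
        logger.warning(
            "expected one index entry for %s=%r but found %d; using the first",
            key, value, len(entries),
        )
    return entries[0]
```

Calling single_entry(results, "email", "alice@example.com") on the result of a lookup then yields either a single entry or None, whatever state the index is in.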
Py2neo, a Python library that provides access to Neo4j via its RESTful web service interface, exposes unique index management through the Index class (unsurprisingly) as well as through the WriteBatch class and the high-level get_or_create_indexed_node function. This function is a convenient wrapper around functionality from the Index class, and it is often used to create fixed reference points within the graph.
The Index class provides three atomic methods for working with unique nodes (or relationships):
- get_or_create(self, key, value, abstract) – if a node exists under the given key-value pair, return it; otherwise, create and return a new node using the abstract provided.
- create_if_none(self, key, value, abstract) – operates like get_or_create, but returns None if a node already existed under the key-value pair and returns the node only when it has just been created, which is useful for identifying when a node is newly created.
- add_if_none(self, key, value, entity) – similar to create_if_none, but adds an existing node to the index instead of creating a new one, and returns None if nothing was added.
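The return-value contract of the three methods can be modelled with a small in-memory stand-in. This is a sketch only: ToyUniqueIndex is a hypothetical class, not Py2neo, nodes are plain dictionaries, and nothing here is atomic or talks to a server. It assumes that create_if_none returns the new node when it creates one and None when an entry already existed, mirroring add_if_none.

```python
class ToyUniqueIndex:
    """In-memory stand-in that mimics only the return values described above."""

    def __init__(self):
        self._entries = {}  # maps (key, value) -> node

    def get_or_create(self, key, value, abstract):
        # Always returns a node: the existing one, or a newly created one.
        slot = (key, value)
        if slot not in self._entries:
            self._entries[slot] = dict(abstract)
        return self._entries[slot]

    def create_if_none(self, key, value, abstract):
        # Returns the new node only if it was created just now.
        slot = (key, value)
        if slot in self._entries:
            return None
        self._entries[slot] = dict(abstract)
        return self._entries[slot]

    def add_if_none(self, key, value, entity):
        # Indexes an existing node; returns None if the slot was taken.
        slot = (key, value)
        if slot in self._entries:
            return None
        self._entries[slot] = entity
        return entity


people = ToyUniqueIndex()
alice = people.get_or_create("email", "alice@example.com", {"name": "Alice"})
same = people.get_or_create("email", "alice@example.com", {"name": "Other"})
duplicate = people.create_if_none("email", "alice@example.com", {"name": "Alice"})
bob = people.create_if_none("email", "bob@example.com", {"name": "Bob"})
carol = {"name": "Carol"}
added = people.add_if_none("email", "carol@example.com", carol)
```

Here same is the very node held in alice, duplicate is None because the entry already existed, and bob and added are the nodes that were newly indexed.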
The code below illustrates the utility of these methods. It shows several ways to approach the problem of fetching a node from an index that uniquely represents a person: