apoc.meta.subGraph

Procedure

apoc.meta.subGraph(config MAP<STRING, ANY>) - examines the given sub-graph and returns a meta-graph.

This procedure returns virtual nodes and relationships that can only be accessed by other APOC procedures. For more information, see Virtual Nodes & Relationships (Graph Projections).

This procedure is not considered safe to run from multiple threads. It is therefore not supported by the parallel runtime (introduced in Neo4j 5.13). For more information, see the Cypher Manual → Parallel runtime.

Signature

apoc.meta.subGraph(config :: MAP) :: (nodes :: LIST<NODE>, relationships :: LIST<RELATIONSHIP>)

Input parameters

Name    Type  Default
config  MAP   null

Config parameters

The procedure supports the following config parameters:

Table 1. Config parameters
name           type          default  description
includeLabels  LIST<STRING>  []       labels to include. Default is to include all labels.
includeRels    LIST<STRING>  []       relationship types to include. Default is to include all relationship types.
excludeLabels  LIST<STRING>  []       labels to exclude. Default is to not exclude any label.
sample         INTEGER       1000     number of nodes to sample per label. See the Sampling section below.
maxRels        INTEGER       100      maximum number of relationships to analyze for each combination of relationship type, start label and end label. Used to remove relationships that the sampled statistics reported but that do not exist, and to add existing ones that the sample missed.
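As a sketch of how the filter parameters combine, the calls below reuse the Person and Movie labels and the ACTED_IN type from the usage example at the end of this page; substitute your own label and relationship type names.

```cypher
// Keep only Person and Movie nodes and ACTED_IN relationships in the meta-graph.
CALL apoc.meta.subGraph({
  includeLabels: ["Person", "Movie"],
  includeRels: ["ACTED_IN"]
})
YIELD nodes, relationships
RETURN nodes, relationships;

// Keep every label except Movie; all relationship types are included by default.
CALL apoc.meta.subGraph({excludeLabels: ["Movie"]})
YIELD nodes, relationships
RETURN nodes, relationships;
```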

Sampling

Because the count stores return an incomplete picture of the data, it is necessary to cross-check the results with the actual data to filter out false positives.

Specify the sample parameter (1000 by default) to analyze a subset of the data.

Using this parameter, the nodes of each label are split into batches of (total / sample) ± rand, where total is the total number of nodes with the given label and rand is a number between 0 and total / sample / 10.

The first node in each batch is then analyzed for its properties and relationships, so roughly sample / total * 100% of the nodes with the given label are checked.
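To make the batch arithmetic concrete, the query below plugs hypothetical numbers (50,000 nodes for one label and the default sample of 1000) into the formulas above. It only reproduces the arithmetic in plain Cypher and does not call APOC's internal sampling code.

```cypher
// Hypothetical values: 50,000 nodes with a given label, default sample of 1000.
WITH 50000 AS total, 1000 AS sample
RETURN total / sample        AS batchSize,      // integer division: nominal batch size
       total / sample / 10   AS maxRand,        // upper bound of the random ± offset
       100.0 * sample / total AS percentChecked; // share of the label's nodes checked
```

This returns batchSize 50, maxRand 5 and percentChecked 2.0: batches of 45 to 55 nodes, with about 2% of that label's nodes checked.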

Table 2. Deprecated parameters
name      type          default  description
labels    LIST<STRING>  []       deprecated, use includeLabels
rels      LIST<STRING>  []       deprecated, use includeRels
excludes  LIST<STRING>  []       deprecated, use excludeLabels
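Migrating from the deprecated names is a one-to-one rename of the map keys. A minimal sketch, using the labels and relationship type from the usage example below:

```cypher
// Deprecated form:
//   CALL apoc.meta.subGraph({labels: ["Person", "Movie"], rels: ["ACTED_IN"]})
// Equivalent call with the current parameter names
// (labels -> includeLabels, rels -> includeRels, excludes -> excludeLabels):
CALL apoc.meta.subGraph({includeLabels: ["Person", "Movie"], includeRels: ["ACTED_IN"]})
YIELD nodes, relationships
RETURN nodes, relationships;
```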

Output parameters

Name           Type
nodes          LIST<NODE>
relationships  LIST<RELATIONSHIP>
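Because the returned entities are virtual, a tabular summary is safest to build with list functions and APOC helpers rather than by matching against the database. The sketch below assumes the APOC functions apoc.node.labels and apoc.rel.type, which read labels and types from virtual nodes and relationships, are available in your APOC version.

```cypher
CALL apoc.meta.subGraph({includeLabels: ["Person", "Movie"]})
YIELD nodes, relationships
// size() works on any list; the apoc.* calls read virtual labels and types.
RETURN size(nodes) AS nodeCount,
       size(relationships) AS relCount,
       [n IN nodes | apoc.node.labels(n)] AS metaLabels,
       [r IN relationships | apoc.rel.type(r)] AS metaRelTypes;
```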

Sampling

This procedure works by using the database statistics. A new node is returned for each label, and its connecting relationships are calculated from the pairing combinations of [:R]->(:N) and (:M)-[:R]. For example, for the graph (:A)-[:R]->(:B)-[:R]->(:C), the path (:B)-[:R]->(:B) is calculated from the combination of [:R]->(:B) and (:B)-[:R], even though no such path exists in the data.

By default, the procedure post-processes the result to remove all such non-existing relationships. It does this by scanning the nodes and their relationships: if a relationship is not found, it is removed from the final result. This slows down the procedure, but produces an accurate schema.
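The false-positive case described above can be reproduced on a small scratch graph. In this sketch the labels A, B, C and the type R are the illustrative names from the explanation, and includeLabels isolates them from any other data in the database. With the default post-processing, the combination (:B)-[:R]->(:B) is expected to be dropped, leaving only the patterns that actually occur.

```cypher
// Scratch graph from the explanation above (illustrative labels only).
CREATE (:A)-[:R]->(:B)-[:R]->(:C);

// Post-processing runs by default and should filter out (:B)-[:R]->(:B).
CALL apoc.meta.subGraph({includeLabels: ["A", "B", "C"]})
YIELD nodes, relationships
RETURN size(nodes) AS nodeCount, size(relationships) AS relCount;
```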

See apoc.meta.graphSample to avoid performing any post-processing.
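One way to see what the post-processing changes is to run both procedures over the same database and compare the sizes of their results. This is a sketch only; the counts depend entirely on your data, and graphSample may report combinations that subGraph filters out.

```cypher
// graphSample: statistics only, no verification against the data.
CALL apoc.meta.graphSample({}) YIELD relationships AS sampled
// subGraph: statistics plus verification by scanning nodes and relationships.
CALL apoc.meta.subGraph({}) YIELD relationships AS verified
RETURN size(sampled) AS sampledRelCount, size(verified) AS verifiedRelCount;
```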

You can also limit how many nodes and relationships are scanned. The sample config parameter sets the skip count, and maxRels sets the maximum number of relationships checked per node. For example, if sample is set to 100, every 100th node of each label is checked, and if maxRels is set to 100, only the first 100 relationships of each checked node are read. If a relationship is not found within these limits, it is assumed not to exist, which can result in false negatives.
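A minimal sketch of passing both limits in one call. The values 100 and 10 are arbitrary illustrations, not recommendations; a low maxRels reads fewer relationships per checked node and therefore raises the risk of false negatives described above.

```cypher
// Illustrative limits: adjust sample and maxRels to your graph's size and shape.
CALL apoc.meta.subGraph({sample: 100, maxRels: 10})
YIELD nodes, relationships
RETURN size(nodes) AS nodeCount, size(relationships) AS relCount;
```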

A sample value higher than the number of nodes for that label will result in one node being checked.

Usage Examples

The examples in this section are based on the following sample graph:

CREATE (Keanu:Person {name:'Keanu Reeves', born:1964})
CREATE (TomH:Person {name:'Tom Hanks', born:1956})
CREATE (LillyW:Person {name:'Lilly Wachowski', born:1967})

CREATE (TheMatrix:Movie {title:'The Matrix', released:1999, tagline:'Welcome to the Real World'})
CREATE (TheMatrixReloaded:Movie {title:'The Matrix Reloaded', released:2003, tagline:'Free your mind'})
CREATE (TheMatrixRevolutions:Movie {title:'The Matrix Revolutions', released:2003, tagline:'Everything that has a beginning has an end'})
CREATE (SomethingsGottaGive:Movie {title:"Something's Gotta Give", released:2003})
CREATE (TheDevilsAdvocate:Movie {title:"The Devil's Advocate", released:1997, tagline:'Evil has its winning ways'})

CREATE (YouveGotMail:Movie {title:"You've Got Mail", released:1998, tagline:'At odds in life... in love on-line.'})
CREATE (SleeplessInSeattle:Movie {title:'Sleepless in Seattle', released:1993, tagline:'What if someone you never met, someone you never saw, someone you never knew was the only someone for you?'})
CREATE (ThatThingYouDo:Movie {title:'That Thing You Do', released:1996, tagline:'In every life there comes a time when that thing you dream becomes that thing you do'})
CREATE (CloudAtlas:Movie {title:'Cloud Atlas', released:2012, tagline:'Everything is connected'})

CREATE (Keanu)-[:ACTED_IN {roles:['Neo']}]->(TheMatrix)
CREATE (Keanu)-[:ACTED_IN {roles:['Neo']}]->(TheMatrixReloaded)
CREATE (Keanu)-[:ACTED_IN {roles:['Neo']}]->(TheMatrixRevolutions)
CREATE (Keanu)-[:ACTED_IN {roles:['Julian Mercer']}]->(SomethingsGottaGive)
CREATE (Keanu)-[:ACTED_IN {roles:['Kevin Lomax']}]->(TheDevilsAdvocate)

CREATE (TomH)-[:ACTED_IN {roles:['Joe Fox']}]->(YouveGotMail)
CREATE (TomH)-[:ACTED_IN {roles:['Sam Baldwin']}]->(SleeplessInSeattle)
CREATE (TomH)-[:ACTED_IN {roles:['Mr. White']}]->(ThatThingYouDo)
CREATE (TomH)-[:ACTED_IN {roles:['Zachry', 'Dr. Henry Goose', 'Isaac Sachs', 'Dermot Hoggins']}]->(CloudAtlas)

CREATE (LillyW)-[:DIRECTED]->(TheMatrix);
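The following query returns the meta-graph restricted to the Person and Movie labels and the DIRECTED relationship type:

CALL apoc.meta.subGraph({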
  includeLabels: ["Person", "Movie"],
  includeRels: ["DIRECTED"]
});
Figure 1. Meta Sub Graph