Cypher and Graph Queries
Learn Cypher, the intuitive query language for graph databases. Discover how its arrow notation makes graph traversal natural, and see why it outshines SQL for relationship-heavy queries.
A Language Built for Graphs
Imagine trying to describe directions to a friend. You wouldn't say "Take the road with ID 47, then the road with ID 103, then..." You'd say "Go down Main Street, turn right at the coffee shop, keep going until you hit the park." Natural, intuitive, visual.
Cypher is a query language that lets you talk to graph databases the same way. Instead of thinking in tables and joins, you draw patterns using arrows and nodes. It was created for Neo4j, one of the most widely used graph databases, and its design has influenced graph query languages across the industry. (Fun fact: it's named after a character from The Matrix, not cryptographic ciphers.)
The Arrow Notation: Drawing Your Data
The core idea of Cypher is beautifully simple: you describe the pattern you're looking for using ASCII art. Nodes are wrapped in parentheses, and relationships are arrows connecting them.
(person)-[:KNOWS]->(friend)
This reads exactly like it looks: a person who KNOWS a friend. The arrow shows direction: person knows friend, not necessarily the other way around. The square brackets contain the relationship type.
Let's build the example from our previous discussion: Lucy was born in Idaho, which is within the United States, which is within North America.
CREATE
(NAmerica:Location {name:'North America', type:'continent'}),
(USA:Location {name:'United States', type:'country'}),
(Idaho:Location {name:'Idaho', type:'state'}),
(Lucy:Person {name:'Lucy'}),
(Idaho) -[:WITHIN]-> (USA) -[:WITHIN]-> (NAmerica),
(Lucy) -[:BORN_IN]-> (Idaho)
Look how naturally this reads. Each line either creates a node with properties, or draws a relationship between nodes. The chain (Idaho) -[:WITHIN]-> (USA) -[:WITHIN]-> (NAmerica) creates two edges in one flowing expression.
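To make the structure behind the CREATE statement concrete, here is a minimal Python sketch of the same data as a plain in-memory property graph. The dictionary layout and the add_chain helper are assumptions made for illustration; they are not how Neo4j stores data.

```python
# Nodes: variable name -> (label, properties), mirroring the CREATE statement.
nodes = {
    "NAmerica": ("Location", {"name": "North America", "type": "continent"}),
    "USA": ("Location", {"name": "United States", "type": "country"}),
    "Idaho": ("Location", {"name": "Idaho", "type": "state"}),
    "Lucy": ("Person", {"name": "Lucy"}),
}

# Edges: (start node, relationship type, end node).
edges = []


def add_chain(*parts):
    """Hypothetical helper: add one edge per hop of a node/type/node/... chain."""
    for i in range(0, len(parts) - 2, 2):
        edges.append((parts[i], parts[i + 1], parts[i + 2]))


# One chained expression produces two WITHIN edges, as in the Cypher version.
add_chain("Idaho", "WITHIN", "USA", "WITHIN", "NAmerica")
add_chain("Lucy", "BORN_IN", "Idaho")

for edge in edges:
    print(edge)
```

The chain call adds two edges from a single expression, which is the same economy the Cypher arrow chain gives you.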
Asking Questions with MATCH
Creating data is nice, but the real power of Cypher is in querying. The MATCH clause uses the same arrow notation to find patterns in your graph.
Let's start simple. Find all people and where they were born:
MATCH (person:Person)-[:BORN_IN]->(place)
RETURN person.name, place.name
This says: find any node labeled Person that has a BORN_IN relationship pointing to another node. Return both names.
The pattern acts like a template. Cypher finds all subgraphs that match the shape you've drawn, binds the matching nodes to your variable names, and returns what you ask for.
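The idea of a pattern as a template that binds variables can be sketched in a few lines of Python. This is only an illustration over a hand-built edge list with made-up numeric ids; a real database uses indexes and a query planner rather than a linear scan.

```python
# Assumed toy data: node id -> properties, plus directed (start, type, end) edges.
nodes = {
    1: {"label": "Person", "name": "Lucy"},
    2: {"label": "Location", "name": "Idaho"},
    3: {"label": "Location", "name": "United States"},
}
edges = [(1, "BORN_IN", 2), (2, "WITHIN", 3)]


def match_born_in(nodes, edges):
    """Yield one binding per subgraph shaped like (person:Person)-[:BORN_IN]->(place)."""
    for start, rel_type, end in edges:
        # The edge must have the right type and start at a Person node.
        if rel_type == "BORN_IN" and nodes[start]["label"] == "Person":
            yield {"person": nodes[start], "place": nodes[end]}


# Equivalent of: RETURN person.name, place.name
rows = [(b["person"]["name"], b["place"]["name"]) for b in match_born_in(nodes, edges)]
print(rows)
```

Each yielded dictionary plays the role of one result row: the keys are the variable names from the pattern and the values are the nodes that matched.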
The Emigration Query: Following Chains
Now let's tackle something more interesting. We want to find all people who emigrated from the United States to Europe. In graph terms, we need people who:
- Have a BORN_IN edge to a location that is WITHIN the United States (possibly through several levels)
- Have a LIVES_IN edge to a location that is WITHIN Europe (possibly through several levels)
Here's the challenge: a person might be born in a city, which is within a state, which is within a country, which is within a continent. We don't know how many WITHIN hops we'll need to traverse.
Cypher handles this elegantly with the * operator:
MATCH
(person)-[:BORN_IN]->()-[:WITHIN*0..]->(us:Location {name:'United States'}),
(person)-[:LIVES_IN]->()-[:WITHIN*0..]->(eu:Location {name:'Europe'})
RETURN person.name
Let's break this down piece by piece.
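The magic is in [:WITHIN*0..]. This means "follow zero or more WITHIN edges." It works like the * in regular expressions: match any number of times. So whether someone was born in Idaho (one WITHIN hop from the United States) or in New York City (two WITHIN hops, through the state and then the country), the pattern matches. The lower bound of zero also covers a BORN_IN edge that points directly at the United States node.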
The empty parentheses () mean "some node, I don't care which one." We only need to bind the endpoints: the person and the final location.
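The zero-or-more behavior of [:WITHIN*0..] amounts to a reachability search. The sketch below assumes a simple dictionary mapping each place to the places it is directly within; the function name and the sample places are illustrative.

```python
def within_closure(start, within):
    """Return every node reachable from start via zero or more WITHIN edges."""
    reached = {start}  # zero hops: the start node itself already matches
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for parent in within.get(current, []):
            if parent not in reached:
                reached.add(parent)
                frontier.append(parent)
    return reached


# Assumed sample data: place -> places it is directly WITHIN.
within = {
    "New York City": ["New York"],
    "New York": ["United States"],
    "Idaho": ["United States"],
    "United States": ["North America"],
}

print("United States" in within_closure("Idaho", within))          # one hop
print("United States" in within_closure("New York City", within))  # two hops
print("United States" in within_closure("United States", within))  # zero hops
```

All three checks succeed, which is why a single pattern covers birthplaces at any depth of the location hierarchy.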
How Cypher Executes Your Query
One of Cypher's strengths is that it's declarative. You describe the pattern you want; the database figures out how to find it efficiently.
For our emigration query, there are multiple valid strategies:
Strategy A: Start with People
- Scan all person nodes
- For each person, follow BORN_IN edges
- From the birthplace, follow WITHIN edges to see if we reach USA
- Also check if LIVES_IN leads to Europe
- Return matches
Strategy B: Start with Locations
- Find the "United States" and "Europe" nodes (using an index)
- Follow all incoming WITHIN edges to find contained locations
- Follow incoming BORN_IN edges to find people born in those places
- Similarly find people living in European locations
- Intersect the two sets
The query optimizer chooses based on indexes, data statistics, and heuristics. You don't specify any of this; you just write the pattern.
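The two strategies can be sketched side by side to show that they produce the same answer by different routes. This is a toy model under stated assumptions: every place has at most one parent, and the people and places (Alain, London, and so on) are invented sample data, not part of the earlier example.

```python
from collections import defaultdict

# Assumed sample data: child place -> parent place, person -> place.
within = {"Idaho": "United States", "United States": "North America",
          "London": "England", "England": "United Kingdom",
          "United Kingdom": "Europe"}
born_in = {"Lucy": "Idaho", "Alain": "London"}
lives_in = {"Lucy": "London", "Alain": "London"}


def is_within(place, region):
    """Walk WITHIN edges upward (zero or more hops) looking for region."""
    while place is not None:
        if place == region:
            return True
        place = within.get(place)
    return False


def strategy_a(origin, destination):
    """Start with people; check each birthplace and residence."""
    return {p for p in born_in
            if is_within(born_in[p], origin)
            and is_within(lives_in.get(p), destination)}


# Reverse index so Strategy B can follow incoming WITHIN edges.
children = defaultdict(list)
for child, parent in within.items():
    children[parent].append(child)


def contained_in(region):
    """The region plus everything nested inside it."""
    found, stack = {region}, [region]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def strategy_b(origin, destination):
    """Start with the two regions; collect people, then intersect."""
    origins, destinations = contained_in(origin), contained_in(destination)
    born = {p for p, place in born_in.items() if place in origins}
    living = {p for p, place in lives_in.items() if place in destinations}
    return born & living


print(strategy_a("United States", "Europe"))
print(strategy_b("United States", "Europe"))
```

Which strategy is cheaper depends on the data: with few people and huge regions Strategy A does less work, and with many people and small regions Strategy B does. That is the kind of decision a declarative language leaves to the optimizer.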
The Same Query in SQL: A Cautionary Tale
Can you do graph queries in SQL? Technically, yes. Should you? Let's find out.
The challenge is that SQL expects you to know how many joins you need upfront. But in graph traversal, you might follow one edge, or five, or twenty. You don't know until you explore the data.
SQL:1999 introduced recursive common table expressions (CTEs) to handle this. Here's our emigration query in PostgreSQL:
WITH RECURSIVE
-- Find all locations within the United States
in_usa(vertex_id) AS (
-- Start with the US itself
SELECT vertex_id
FROM vertices
WHERE properties->>'name' = 'United States'
UNION
-- Recursively add anything WITHIN those locations
SELECT edges.tail_vertex
FROM edges
JOIN in_usa ON edges.head_vertex = in_usa.vertex_id
WHERE edges.label = 'within'
),
-- Find all locations within Europe
in_europe(vertex_id) AS (
SELECT vertex_id
FROM vertices
WHERE properties->>'name' = 'Europe'
UNION
SELECT edges.tail_vertex
FROM edges
JOIN in_europe ON edges.head_vertex = in_europe.vertex_id
WHERE edges.label = 'within'
),
-- Find all people born in US locations
born_in_usa(vertex_id) AS (
SELECT edges.tail_vertex
FROM edges
JOIN in_usa ON edges.head_vertex = in_usa.vertex_id
WHERE edges.label = 'born_in'
),
-- Find all people living in European locations
lives_in_europe(vertex_id) AS (
SELECT edges.tail_vertex
FROM edges
JOIN in_europe ON edges.head_vertex = in_europe.vertex_id
WHERE edges.label = 'lives_in'
)
-- Find people who appear in BOTH sets
SELECT vertices.properties->>'name'
FROM vertices
JOIN born_in_usa ON vertices.vertex_id = born_in_usa.vertex_id
JOIN lives_in_europe ON vertices.vertex_id = lives_in_europe.vertex_id;
That's 37 lines of SQL, not counting comments, to express what Cypher does in 4 lines.
Understanding the SQL Version Step by Step
The SQL version is verbose, but understanding it helps clarify what graph traversal really involves.
Step 1: Build the set of US locations recursively
in_usa(vertex_id) AS (
-- Base case: start with "United States"
SELECT vertex_id FROM vertices
WHERE properties->>'name' = 'United States'
UNION
-- Recursive case: add anything WITHIN a location we've already found
SELECT edges.tail_vertex FROM edges
JOIN in_usa ON edges.head_vertex = in_usa.vertex_id
WHERE edges.label = 'within'
)
This keeps expanding until an iteration adds no new rows, at which point it has found all states, cities, and neighborhoods within the US.
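How a recursive CTE with UNION reaches its result can be sketched as a fixed-point loop: run the base case once, then keep applying the recursive case to the newest rows until nothing new appears. The function below is an illustration of that idea, not PostgreSQL's implementation, and the sample rows (including Boise) are made up, with names standing in for vertex ids.

```python
def recursive_union(base_rows, step):
    """Evaluate `base UNION recursive` to a fixed point.

    step(new_rows) returns the rows derived from the most recently added rows.
    """
    result = set(base_rows)
    frontier = set(base_rows)
    while frontier:
        # UNION discards rows that are already in the result.
        frontier = step(frontier) - result
        result |= frontier
    return result


# Assumed sample rows: (tail_vertex, head_vertex, label); tail is WITHIN head.
edges = [
    ("Idaho", "United States", "within"),
    ("Boise", "Idaho", "within"),
    ("United States", "North America", "within"),
    ("Lucy", "Boise", "born_in"),
]


def within_step(found):
    """Recursive case: anything with a 'within' edge into an already-found row."""
    return {tail for tail, head, label in edges
            if label == "within" and head in found}


in_usa = recursive_union({"United States"}, within_step)
print(sorted(in_usa))
```

The loop stops because each pass can only add rows that were not there before, and the set of vertices is finite.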
Step 2: Do the same for Europe
Same logic, different starting point.
Step 3: Find people born in US locations
born_in_usa(vertex_id) AS (
SELECT edges.tail_vertex FROM edges
JOIN in_usa ON edges.head_vertex = in_usa.vertex_id
WHERE edges.label = 'born_in'
)
This finds all people with a BORN_IN edge pointing to any location in our in_usa set.
Step 4: Find people living in European locations
Same pattern for LIVES_IN edges to European locations.
Step 5: Intersect the sets
SELECT vertices.properties->>'name'
FROM vertices
JOIN born_in_usa ON vertices.vertex_id = born_in_usa.vertex_id
JOIN lives_in_europe ON vertices.vertex_id = lives_in_europe.vertex_id;
The double JOIN ensures we only get people who appear in both sets.
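To try the five steps end to end without a PostgreSQL server, the query can be adapted to SQLite through Python's built-in sqlite3 module. This sketch makes two assumptions that differ from the version above: the name is stored in a plain column rather than in a JSON properties field, and the rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vertices (vertex_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE edges (tail_vertex INTEGER, head_vertex INTEGER, label TEXT);
INSERT INTO vertices VALUES
  (1, 'United States'), (2, 'Idaho'), (3, 'Europe'),
  (4, 'United Kingdom'), (5, 'London'), (6, 'Lucy'), (7, 'Alain');
INSERT INTO edges VALUES
  (2, 1, 'within'), (4, 3, 'within'), (5, 4, 'within'),
  (6, 2, 'born_in'), (6, 5, 'lives_in'),
  (7, 5, 'born_in'), (7, 5, 'lives_in');
""")

query = """
WITH RECURSIVE
  in_usa(vertex_id) AS (
    SELECT vertex_id FROM vertices WHERE name = 'United States'
    UNION
    SELECT edges.tail_vertex FROM edges
      JOIN in_usa ON edges.head_vertex = in_usa.vertex_id
      WHERE edges.label = 'within'
  ),
  in_europe(vertex_id) AS (
    SELECT vertex_id FROM vertices WHERE name = 'Europe'
    UNION
    SELECT edges.tail_vertex FROM edges
      JOIN in_europe ON edges.head_vertex = in_europe.vertex_id
      WHERE edges.label = 'within'
  ),
  born_in_usa(vertex_id) AS (
    SELECT edges.tail_vertex FROM edges
      JOIN in_usa ON edges.head_vertex = in_usa.vertex_id
      WHERE edges.label = 'born_in'
  ),
  lives_in_europe(vertex_id) AS (
    SELECT edges.tail_vertex FROM edges
      JOIN in_europe ON edges.head_vertex = in_europe.vertex_id
      WHERE edges.label = 'lives_in'
  )
SELECT vertices.name FROM vertices
  JOIN born_in_usa ON vertices.vertex_id = born_in_usa.vertex_id
  JOIN lives_in_europe ON vertices.vertex_id = lives_in_europe.vertex_id
"""

# Lucy was born in Idaho and lives in London; Alain was born in London.
emigrants = [row[0] for row in conn.execute(query)]
print(emigrants)
```

Only Lucy appears in both intermediate sets, so she is the only row returned.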
A Python Implementation for Clarity
Sometimes code is clearer than queries. Here's how you might implement the same logic in Python:
from collections import deque

def find_emigrants(graph, from_region, to_region):
    """
    Find all people who were born in from_region
    and now live in to_region.
    """
    # Step 1: Find all locations within a region
    def get_locations_within(region_name):
        """Collect the region itself and every location nested within it."""
        locations = set()
        # Find the starting region
        region = graph.find_node_by_name(region_name)
        if not region:
            return locations
        # BFS over incoming WITHIN edges, starting from the region itself
        to_visit = deque([region])
        while to_visit:
            current = to_visit.popleft()
            locations.add(current.id)
            # Find nodes that are WITHIN current
            for edge in graph.get_incoming_edges(current, label='WITHIN'):
                if edge.tail.id not in locations:
                    to_visit.append(edge.tail)
        return locations

    # Get all locations in each region
    from_locations = get_locations_within(from_region)
    to_locations = get_locations_within(to_region)
    print(f"Locations in {from_region}: {len(from_locations)}")
    print(f"Locations in {to_region}: {len(to_locations)}")

    # Step 2: Find people born in from_locations
    born_in_from = set()
    for loc_id in from_locations:
        loc = graph.get_node(loc_id)
        for edge in graph.get_incoming_edges(loc, label='BORN_IN'):
            born_in_from.add(edge.tail.id)

    # Step 3: Find people living in to_locations
    lives_in_to = set()
    for loc_id in to_locations:
        loc = graph.get_node(loc_id)
        for edge in graph.get_incoming_edges(loc, label='LIVES_IN'):
            lives_in_to.add(edge.tail.id)

    # Step 4: Intersect
    emigrants = born_in_from & lives_in_to

    # Return names
    return [graph.get_node(pid).properties['name'] for pid in emigrants]

# Usage
emigrants = find_emigrants(graph, 'United States', 'Europe')
print(f"Emigrants: {emigrants}")
This Python code follows the same steps as the SQL, but it's easier to follow because you can see the loops and set operations explicitly. Here graph stands for any object that offers find_node_by_name, get_node, and get_incoming_edges; these are illustrative method names, not a specific library's API. Cypher abstracts all of this into pattern matching.
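The graph object used above is never defined, so here is one minimal way it could look. Everything in this sketch is an assumption: the Node and Edge tuples, the add_node and add_edge helpers, and the sample data are hypothetical, chosen only so that the calls made by find_emigrants would work against it.

```python
from collections import namedtuple

Node = namedtuple("Node", ["id", "label", "properties"])
Edge = namedtuple("Edge", ["tail", "head", "label"])  # tail -[label]-> head


class Graph:
    """Toy in-memory graph exposing the methods find_emigrants relies on."""

    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node_id, label, **properties):
        self.nodes[node_id] = Node(node_id, label, properties)
        return self.nodes[node_id]

    def add_edge(self, tail_id, label, head_id):
        self.edges.append(Edge(self.nodes[tail_id], self.nodes[head_id], label))

    def get_node(self, node_id):
        return self.nodes[node_id]

    def find_node_by_name(self, name):
        # Linear scan; a real store would use an index on the name property.
        for node in self.nodes.values():
            if node.properties.get("name") == name:
                return node
        return None

    def get_incoming_edges(self, node, label):
        return [e for e in self.edges
                if e.head.id == node.id and e.label == label]


graph = Graph()
graph.add_node(1, "Location", name="United States")
graph.add_node(2, "Location", name="Idaho")
graph.add_node(3, "Location", name="Europe")
graph.add_node(4, "Location", name="London")
graph.add_node(5, "Person", name="Lucy")
graph.add_edge(2, "WITHIN", 1)
graph.add_edge(4, "WITHIN", 3)
graph.add_edge(5, "BORN_IN", 2)
graph.add_edge(5, "LIVES_IN", 4)

usa = graph.find_node_by_name("United States")
print([e.tail.properties["name"] for e in graph.get_incoming_edges(usa, label="WITHIN")])
```

With this object in scope, calling find_emigrants(graph, 'United States', 'Europe') from the listing above should report Lucy as the only emigrant in the sample data.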
Variable-Length Paths: The Key Insight
The fundamental difference between Cypher and SQL for graph queries is how they handle paths of unknown length.
In Cypher, [:WITHIN*0..] naturally expresses "follow this edge type zero or more times." It's built into the language.
In SQL, you need recursive CTEs, which:
- Require you to define a base case and recursive case separately
- Build up sets iteratively
- Are verbose and harder to read
- Are often less well optimized for deep traversals than the path expansion in a native graph engine
More Cypher Examples
Let's look at a few more patterns to build your intuition:
// Find friends of friends (2 hops)
MATCH (me:Person {name:'Alice'})-[:KNOWS]->()-[:KNOWS]->(foaf)
WHERE foaf <> me
RETURN DISTINCT foaf.name

// Find the shortest path between two people
MATCH path = shortestPath(
(alice:Person {name:'Alice'})-[:KNOWS*]-(bob:Person {name:'Bob'})
)
RETURN path

// Find all people within 3 hops of Alice, with the shortest distance to each
MATCH path = (alice:Person {name:'Alice'})-[:KNOWS*1..3]-(connected)
RETURN connected.name, min(length(path)) AS distance

// Find people who live in the same city
MATCH (p1:Person)-[:LIVES_IN]->(city)<-[:LIVES_IN]-(p2:Person)
WHERE p1 <> p2
RETURN p1.name, p2.name, city.name
Each query draws a visual pattern that maps directly to what you're looking for in the graph.
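For intuition about what the "within 3 hops" query computes, here is a breadth-first search sketch in Python. It treats KNOWS as undirected, since that pattern has no arrowhead, and it ignores the corner case where a cycle leads back to Alice herself. The function name and the sample friendships are made up for illustration.

```python
from collections import deque


def within_hops(start, knows, max_hops):
    """Return {person: shortest hop count} for everyone within max_hops of start."""
    neighbors = {}
    for a, b in knows:  # undirected: record the edge in both directions
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distance[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for other in neighbors.get(current, ()):
            if other not in distance:
                distance[other] = distance[current] + 1
                queue.append(other)

    del distance[start]  # report only other people
    return distance


# Assumed sample data: a simple chain of acquaintances.
knows = [("Alice", "Bob"), ("Bob", "Carol"), ("Carol", "Dave"), ("Dave", "Erin")]
print(within_hops("Alice", knows, 3))
```

Erin is four hops away in this sample, so she falls outside the limit, just as she would fall outside the 1..3 range in the Cypher pattern.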
When Cypher Shines (and When It Doesn't)
Cypher excels at:
- Traversing relationships (friends of friends, paths through networks)
- Pattern matching (find subgraphs matching a shape)
- Variable-length paths (ancestors, shortest routes)
- Exploring connected data
Cypher struggles with:
- Heavy aggregations and analytics (SQL is more mature here)
- Bulk data loading (dedicated import tools are better)
- Non-graph queries (if you're not using relationships, why use a graph?)
Key Takeaways
The contrast between Cypher and SQL for graph queries teaches us an important lesson: different data models need different query languages.
The Cypher approach:
- Draws patterns using intuitive arrow notation
- Handles variable-length paths naturally with *
- Lets you focus on the shape of data, not the mechanics of finding it
- Is declarative: the database optimizes execution
The SQL approach:
- Can represent graph data in tables
- Can query it using recursive CTEs
- But the result is verbose and less intuitive
- SQL wasn't designed for graph traversal
The bottom line: If your data is fundamentally about relationships and you'll be doing lots of traversal queries, a graph database with Cypher (or a similar language) will make your life much easier. You'll write less code, it'll be more readable, and the database can optimize graph-specific operations.
If you have some graph-like data but it's not central to your application, storing it in a relational database and using occasional recursive CTEs might be good enough.
Choose the tool that matches your problem.