Cypher and Graph Queries

A Language Built for Graphs

Imagine trying to describe directions to a friend. You wouldn't say "Take the road with ID 47, then the road with ID 103, then..." You'd say "Go down Main Street, turn right at the coffee shop, keep going until you hit the park." Natural, intuitive, visual.

Cypher is a query language that lets you talk to graph databases the same way. Instead of thinking in tables and joins, you draw patterns using arrows and nodes. It was created for Neo4j, the most popular graph database, and its elegance has influenced graph query languages across the industry. (Fun fact: it's named after a character from The Matrix, not cryptographic ciphers.)

The Arrow Notation: Drawing Your Data

The core idea of Cypher is beautifully simple: you describe the pattern you're looking for using ASCII art. Nodes are wrapped in parentheses, and relationships are arrows connecting them.

(person)-[:KNOWS]->(friend)

This reads exactly like it looks: a person who KNOWS a friend. The arrow shows direction: person knows friend, not necessarily the other way around. The square brackets contain the relationship type.

Let's build the example from our previous discussion: Lucy was born in Idaho, which is within the United States, which is within North America.

CREATE
  (NAmerica:Location {name:'North America', type:'continent'}),
  (USA:Location      {name:'United States', type:'country'}),
  (Idaho:Location    {name:'Idaho',         type:'state'}),
  (Lucy:Person       {name:'Lucy'}),
  (Idaho) -[:WITHIN]-> (USA) -[:WITHIN]-> (NAmerica),
  (Lucy)  -[:BORN_IN]-> (Idaho)

Look how naturally this reads. Each line either creates a node with properties, or draws a relationship between nodes. The chain (Idaho) -[:WITHIN]-> (USA) -[:WITHIN]-> (NAmerica) creates two edges in one flowing expression.
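To make the node and edge structure concrete, here is a minimal Python sketch of the same four nodes and three relationships held in plain dictionaries and tuples. The names nodes, edges, and outgoing are invented for illustration; this is not how Neo4j stores data, only a way to see what the CREATE statement describes.

```python
# Hypothetical in-memory stand-in for the CREATE statement above.
nodes = {
    "NAmerica": {"label": "Location", "name": "North America", "type": "continent"},
    "USA": {"label": "Location", "name": "United States", "type": "country"},
    "Idaho": {"label": "Location", "name": "Idaho", "type": "state"},
    "Lucy": {"label": "Person", "name": "Lucy"},
}

# Each edge is (tail, relationship type, head): the arrow points tail -> head.
edges = [
    ("Idaho", "WITHIN", "USA"),
    ("USA", "WITHIN", "NAmerica"),
    ("Lucy", "BORN_IN", "Idaho"),
]


def outgoing(node_key, rel_type):
    """Return the keys of nodes reached by following rel_type edges out of node_key."""
    return [head for tail, rel, head in edges if tail == node_key and rel == rel_type]


birthplace = outgoing("Lucy", "BORN_IN")[0]
print(nodes[birthplace]["name"])  # prints Idaho
```

Note that the chained expression in the CREATE statement becomes two separate tuples here: one for Idaho to USA and one for USA to NAmerica.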

Asking Questions with MATCH

Creating data is nice, but the real power of Cypher is in querying. The MATCH clause uses the same arrow notation to find patterns in your graph.

Let's start simple. Find all people and where they were born:

MATCH (person:Person)-[:BORN_IN]->(place)
RETURN person.name, place.name

This says: find any node labeled Person that has a BORN_IN relationship pointing to another node. Return both names.

The pattern acts like a template. Cypher finds all subgraphs that match the shape you've drawn, binds the matching nodes to your variable names, and returns what you ask for.
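As a rough illustration of "pattern as template", the sketch below hand-codes the single pattern (person:Person)-[:BORN_IN]->(place) over a tiny in-memory graph. The data layout and the function name match_born_in are assumptions made for this example; a real Cypher engine matches arbitrary patterns and plans the search itself.

```python
# Toy data, invented for illustration.
nodes = {
    1: {"label": "Person", "name": "Lucy"},
    2: {"label": "Location", "name": "Idaho"},
    3: {"label": "Location", "name": "United States"},
}
edges = [
    (1, "BORN_IN", 2),  # (tail id, relationship type, head id)
    (2, "WITHIN", 3),
]


def match_born_in(nodes, edges):
    """Yield (person.name, place.name) for each (person:Person)-[:BORN_IN]->(place)."""
    for tail, rel, head in edges:
        # The edge type and the label on the tail node must both fit the template.
        if rel == "BORN_IN" and nodes[tail]["label"] == "Person":
            yield nodes[tail]["name"], nodes[head]["name"]


results = list(match_born_in(nodes, edges))
print(results)  # prints [('Lucy', 'Idaho')]
```

Each yielded tuple corresponds to one row that the RETURN clause would produce.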

The Emigration Query: Following Chains

Now let's tackle something more interesting. We want to find all people who emigrated from the United States to Europe. In graph terms, we need people who:

Were BORN_IN a location that is WITHIN the United States (possibly through several levels)
Now LIVES_IN a location that is WITHIN Europe (possibly through several levels)

Here's the challenge: a person might be born in a city, which is within a state, which is within a country, which is within a continent. We don't know how many WITHIN hops we'll need to traverse.

Cypher handles this elegantly with the * operator:

MATCH
  (person)-[:BORN_IN]->()-[:WITHIN*0..]->(us:Location {name:'United States'}),
  (person)-[:LIVES_IN]->()-[:WITHIN*0..]->(eu:Location {name:'Europe'})
RETURN person.name

Let's break this down piece by piece.

The magic is in [:WITHIN*0..]. This means "follow zero or more WITHIN edges." It's like the * in regular expressions: match any number of times. So whether someone's BORN_IN edge points directly at the United States (zero WITHIN hops), at "Idaho" (one hop), or at "New York City" (two hops, through the state), the pattern matches.

The empty parentheses () mean "some node, I don't care which one." We only need to bind the endpoints: the person and the final location.
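One way to think about [:WITHIN*0..] is as a reachability check: start at the birthplace and keep following WITHIN edges until there is nowhere left to go. The sketch below assumes a hypothetical dictionary, within, mapping each location to the locations it is directly within; the place names other than Idaho and the United States are invented test data.

```python
# Hypothetical adjacency map: location -> locations it is directly WITHIN.
within = {
    "Boise": ["Idaho"],
    "Idaho": ["United States"],
    "United States": ["North America"],
    "London": ["England"],
    "England": ["United Kingdom"],
    "United Kingdom": ["Europe"],
}


def reachable_via_within(start, within):
    """Return every location reachable from start by zero or more WITHIN hops."""
    seen = {start}  # zero hops: the start location itself counts as a match
    stack = [start]
    while stack:
        current = stack.pop()
        for parent in within.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print("United States" in reachable_via_within("Boise", within))  # prints True
print("Europe" in reachable_via_within("Boise", within))  # prints False
```

The zero-hop case matters: because the start location is in the result, a person whose BORN_IN edge points straight at the United States node still matches.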

How Cypher Executes Your Query

One of Cypher's strengths is that it's declarative. You describe the pattern you want; the database figures out how to find it efficiently.

For our emigration query, there are multiple valid strategies:

Strategy A: Start with People

Scan all person nodes
For each person, follow BORN_IN edges
From the birthplace, follow WITHIN edges to see if we reach USA
Also check if LIVES_IN leads to Europe
Return matches

Strategy B: Start with Locations

Find the "United States" and "Europe" nodes (using an index)
Follow all incoming WITHIN edges to find contained locations
Follow incoming BORN_IN edges to find people born in those places
Similarly find people living in European locations
Intersect the two sets

The query optimizer chooses based on indexes, data statistics, and heuristics. You don't specify any of this; you just write the pattern.
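The two strategies can be sketched in Python to show that they return the same answer and differ only in where the search starts. Everything here is an assumption made for illustration: the toy data, the dictionary layout, the simplification that each location is within exactly one parent, and the function names. It is not a description of how any real planner works.

```python
# Toy data, invented for illustration. Each location has at most one parent.
WITHIN = {
    "Idaho": "United States",
    "United States": "North America",
    "London": "England",
    "England": "United Kingdom",
    "United Kingdom": "Europe",
    "Paris": "France",
    "France": "Europe",
}
BORN_IN = {"Lucy": "Idaho", "Alain": "Paris", "Dana": "Idaho"}
LIVES_IN = {"Lucy": "London", "Alain": "London", "Dana": "Idaho"}


def ancestors(location):
    """The location itself plus everything it is WITHIN (zero or more hops up)."""
    found = {location}
    while location in WITHIN:
        location = WITHIN[location]
        found.add(location)
    return found


def descendants(region):
    """The region itself plus every location WITHIN it (zero or more hops down)."""
    found = {region}
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for child, parent in WITHIN.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def strategy_a(src, dst):
    """Start with people: climb upward from each birthplace and residence."""
    return {
        person
        for person in BORN_IN
        if person in LIVES_IN
        and src in ancestors(BORN_IN[person])
        and dst in ancestors(LIVES_IN[person])
    }


def strategy_b(src, dst):
    """Start with locations: expand both regions downward, then intersect."""
    src_locations, dst_locations = descendants(src), descendants(dst)
    born = {p for p, loc in BORN_IN.items() if loc in src_locations}
    lives = {p for p, loc in LIVES_IN.items() if loc in dst_locations}
    return born & lives


print(strategy_a("United States", "Europe"))  # prints {'Lucy'}
print(strategy_b("United States", "Europe"))  # prints {'Lucy'}
```

Which strategy is cheaper depends on the data: with few people and many locations, starting from people touches less of the graph, and the reverse holds when the regions are small.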

The Same Query in SQL: A Cautionary Tale

Can you do graph queries in SQL? Technically, yes. Should you? Let's find out.

The challenge is that SQL expects you to know how many joins you need upfront. But in graph traversal, you might follow one edge, or five, or twenty; you don't know until you explore the data.

SQL:1999 introduced recursive common table expressions (CTEs) to handle this. Here's our emigration query in PostgreSQL:

WITH RECURSIVE
  -- Find all locations within the United States
  in_usa(vertex_id) AS (
    -- Start with the US itself
    SELECT vertex_id 
    FROM vertices 
    WHERE properties->>'name' = 'United States'
    
    UNION
    
    -- Recursively add anything WITHIN those locations
    SELECT edges.tail_vertex 
    FROM edges
    JOIN in_usa ON edges.head_vertex = in_usa.vertex_id
    WHERE edges.label = 'within'
  ),
  
  -- Find all locations within Europe
  in_europe(vertex_id) AS (
    SELECT vertex_id 
    FROM vertices 
    WHERE properties->>'name' = 'Europe'
    
    UNION
    
    SELECT edges.tail_vertex 
    FROM edges
    JOIN in_europe ON edges.head_vertex = in_europe.vertex_id
    WHERE edges.label = 'within'
  ),
  
  -- Find all people born in US locations
  born_in_usa(vertex_id) AS (
    SELECT edges.tail_vertex 
    FROM edges
    JOIN in_usa ON edges.head_vertex = in_usa.vertex_id
    WHERE edges.label = 'born_in'
  ),
  
  -- Find all people living in European locations
  lives_in_europe(vertex_id) AS (
    SELECT edges.tail_vertex 
    FROM edges
    JOIN in_europe ON edges.head_vertex = in_europe.vertex_id
    WHERE edges.label = 'lives_in'
  )
 
-- Find people who appear in BOTH sets
SELECT vertices.properties->>'name'
FROM vertices
JOIN born_in_usa ON vertices.vertex_id = born_in_usa.vertex_id
JOIN lives_in_europe ON vertices.vertex_id = lives_in_europe.vertex_id;

That's roughly 50 lines of SQL, comments and blank lines included, to express what Cypher does in 4.

Understanding the SQL Version Step by Step

The SQL version is verbose, but understanding it helps clarify what graph traversal really involves.

Step 1: Build the set of US locations recursively

in_usa(vertex_id) AS (
    -- Base case: start with "United States"
    SELECT vertex_id FROM vertices 
    WHERE properties->>'name' = 'United States'
    
    UNION
    
    -- Recursive case: add anything WITHIN a location we've already found
    SELECT edges.tail_vertex FROM edges
    JOIN in_usa ON edges.head_vertex = in_usa.vertex_id
    WHERE edges.label = 'within'
)

This keeps expanding until it finds all states, cities, and neighborhoods within the US.
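To watch the recursive step expand on real rows, the following sketch runs the same CTE against an in-memory SQLite database using Python's built-in sqlite3 module. It makes two simplifying assumptions that differ from the PostgreSQL version: SQLite stands in for PostgreSQL, and a plain name column replaces the JSON properties column. The four rows of test data are invented (Boise is not in the original example).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vertices (vertex_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE edges (tail_vertex INTEGER, head_vertex INTEGER, label TEXT);

    INSERT INTO vertices VALUES
        (1, 'North America'), (2, 'United States'), (3, 'Idaho'), (4, 'Boise');

    -- tail is WITHIN head
    INSERT INTO edges VALUES
        (2, 1, 'within'), (3, 2, 'within'), (4, 3, 'within');
""")

rows = conn.execute("""
    WITH RECURSIVE in_usa(vertex_id) AS (
        -- Base case: the United States vertex itself
        SELECT vertex_id FROM vertices WHERE name = 'United States'
        UNION
        -- Recursive case: anything WITHIN a vertex already in the set
        SELECT edges.tail_vertex FROM edges
        JOIN in_usa ON edges.head_vertex = in_usa.vertex_id
        WHERE edges.label = 'within'
    )
    SELECT vertices.name FROM vertices
    JOIN in_usa ON vertices.vertex_id = in_usa.vertex_id
    ORDER BY vertices.name
""").fetchall()

us_locations = [name for (name,) in rows]
print(us_locations)  # prints ['Boise', 'Idaho', 'United States']
```

North America is absent from the result because the recursion only follows within edges toward contained locations, never outward to the containing one.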

Step 2: Do the same for Europe

Same logic, different starting point.

Step 3: Find people born in US locations

born_in_usa(vertex_id) AS (
    SELECT edges.tail_vertex FROM edges
    JOIN in_usa ON edges.head_vertex = in_usa.vertex_id
    WHERE edges.label = 'born_in'
)

This finds all people with a BORN_IN edge pointing to any location in our in_usa set.

Step 4: Find people living in European locations

Same pattern for LIVES_IN edges to European locations.

Step 5: Intersect the sets

SELECT vertices.properties->>'name'
FROM vertices
JOIN born_in_usa ON vertices.vertex_id = born_in_usa.vertex_id
JOIN lives_in_europe ON vertices.vertex_id = lives_in_europe.vertex_id;

The double JOIN ensures we only get people who appear in both sets.

A Python Implementation for Clarity

Sometimes code is clearer than queries. Here's how you might implement the same logic in Python:

def find_emigrants(graph, from_region, to_region):
    """
    Find all people who were born in from_region 
    and now live in to_region.
    """
    
    # Step 1: Find all locations within from_region
    def get_locations_within(region_name):
        """Recursively find all locations within a region."""
        locations = set()
        
        # Find the starting region
        region = graph.find_node_by_name(region_name)
        if not region:
            return locations
        
        # Traverse WITHIN edges backwards to find every contained location
        # (depth-first, because pop() takes from the end of the list)
        to_visit = [region]
        while to_visit:
            current = to_visit.pop()
            locations.add(current.id)
            
            # Find nodes that are WITHIN current
            for edge in graph.get_incoming_edges(current, label='WITHIN'):
                if edge.tail.id not in locations:
                    to_visit.append(edge.tail)
        
        return locations
    
    # Get all locations in each region
    from_locations = get_locations_within(from_region)
    to_locations = get_locations_within(to_region)
    
    print(f"Locations in {from_region}: {len(from_locations)}")
    print(f"Locations in {to_region}: {len(to_locations)}")
    
    # Step 2: Find people born in from_locations
    born_in_from = set()
    for loc_id in from_locations:
        loc = graph.get_node(loc_id)
        for edge in graph.get_incoming_edges(loc, label='BORN_IN'):
            born_in_from.add(edge.tail.id)
    
    # Step 3: Find people living in to_locations
    lives_in_to = set()
    for loc_id in to_locations:
        loc = graph.get_node(loc_id)
        for edge in graph.get_incoming_edges(loc, label='LIVES_IN'):
            lives_in_to.add(edge.tail.id)
    
    # Step 4: Intersect
    emigrants = born_in_from & lives_in_to
    
    # Return names
    return [graph.get_node(pid).properties['name'] for pid in emigrants]
 
 
# Usage
emigrants = find_emigrants(graph, 'United States', 'Europe')
print(f"Emigrants: {emigrants}")

This Python code does exactly what the SQL does, but it's easier to follow because you can see the loops and set operations explicitly. Cypher abstracts all of this into pattern matching.
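The function above relies on a graph object that is never defined in the text. For readers who want to run it, here is a minimal in-memory sketch that provides the three methods it calls (find_node_by_name, get_node, get_incoming_edges). The Node, Edge, and Graph classes and the add_node and add_edge helpers are hypothetical, written only to satisfy that interface; they do not correspond to any particular graph library.

```python
from dataclasses import dataclass


@dataclass
class Node:
    id: int
    properties: dict


@dataclass
class Edge:
    tail: Node  # the node the arrow starts from
    head: Node  # the node the arrow points to
    label: str


class Graph:
    """Tiny in-memory property graph, sufficient for the traversal shown above."""

    def __init__(self):
        self._nodes = {}
        self._edges = []

    def add_node(self, name):
        node = Node(id=len(self._nodes) + 1, properties={"name": name})
        self._nodes[node.id] = node
        return node

    def add_edge(self, tail, label, head):
        self._edges.append(Edge(tail=tail, head=head, label=label))

    def get_node(self, node_id):
        return self._nodes[node_id]

    def find_node_by_name(self, name):
        for node in self._nodes.values():
            if node.properties["name"] == name:
                return node
        return None

    def get_incoming_edges(self, node, label):
        # Linear scan; a real store would index edges by head vertex.
        return [e for e in self._edges if e.head.id == node.id and e.label == label]


graph = Graph()
usa = graph.add_node("United States")
idaho = graph.add_node("Idaho")
lucy = graph.add_node("Lucy")
graph.add_edge(idaho, "WITHIN", usa)
graph.add_edge(lucy, "BORN_IN", idaho)

born_in_idaho = [e.tail.properties["name"] for e in graph.get_incoming_edges(idaho, label="BORN_IN")]
print(born_in_idaho)  # prints ['Lucy']
```

With a graph built this way, plus LIVES_IN edges and a Europe hierarchy, the find_emigrants function should run unchanged.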

Variable-Length Paths: The Key Insight

The fundamental difference between Cypher and SQL for graph queries is how they handle paths of unknown length.

In Cypher, [:WITHIN*0..] naturally expresses "follow this edge type zero or more times." It's built into the language.

In SQL, you need recursive CTEs, which:

Require you to define a base case and recursive case separately
Build up sets iteratively
Are verbose and harder to read
Are often less well optimized for deep traversals than a graph database's native path expansion

More Cypher Examples

Let's look at a few more patterns to build your intuition:

// Find friends of friends (2 hops)
MATCH (me:Person {name:'Alice'})-[:KNOWS]->()-[:KNOWS]->(foaf)
WHERE foaf <> me
RETURN DISTINCT foaf.name

// Find the shortest path between two people
MATCH path = shortestPath(
  (alice:Person {name:'Alice'})-[:KNOWS*]-(bob:Person {name:'Bob'})
)
RETURN path

// Find all people within 3 hops of Alice, with the shortest distance to each
MATCH path = (alice:Person {name:'Alice'})-[:KNOWS*1..3]-(connected)
WHERE connected <> alice
RETURN connected.name, min(length(path)) AS distance

// Find people who live in the same city
MATCH (p1:Person)-[:LIVES_IN]->(city)<-[:LIVES_IN]-(p2:Person)
WHERE p1 <> p2
RETURN p1.name, p2.name, city.name

Each query draws a visual pattern that maps directly to what you're looking for in the graph.
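For intuition about what a shortest-path query has to compute, here is a textbook breadth-first search over an undirected KNOWS graph. This is a sketch of the general technique, not a description of Neo4j's internal implementation; the KNOWS list is invented test data and shortest_path is a hypothetical helper name.

```python
from collections import deque

# Toy data, invented for illustration. Each pair is an undirected KNOWS edge.
KNOWS = [("Alice", "Carol"), ("Carol", "Dave"), ("Dave", "Bob"), ("Alice", "Eve")]


def shortest_path(start, goal, pairs):
    """Return one shortest path from start to goal as a list of names, or None."""
    neighbors = {}
    for a, b in pairs:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    previous = {start: None}  # records how each person was first reached
    queue = deque([start])
    while queue:
        current = queue.popleft()  # FIFO order visits people in order of hop count
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for nxt in neighbors.get(current, []):
            if nxt not in previous:
                previous[nxt] = current
                queue.append(nxt)
    return None


print(shortest_path("Alice", "Bob", KNOWS))  # prints ['Alice', 'Carol', 'Dave', 'Bob']
```

Because the queue is first-in, first-out, the first time the goal is dequeued it has been reached by the fewest possible hops.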

When Cypher Shines (and When It Doesn't)

Cypher excels at:

Traversing relationships (friends of friends, paths through networks)
Pattern matching (find subgraphs matching a shape)
Variable-length paths (ancestors, shortest routes)
Exploring connected data

Cypher struggles with:

Heavy aggregations and analytics (SQL is more mature here)
Bulk data loading (dedicated import tools are better)
Non-graph queries (if you're not using relationships, why use a graph?)

Key Takeaways

The contrast between Cypher and SQL for graph queries teaches us an important lesson: different data models need different query languages.

The Cypher approach:

Draws patterns using intuitive arrow notation
Handles variable-length paths naturally with *
Lets you focus on the shape of data, not the mechanics of finding it
Is declarative: the database optimizes execution

The SQL approach:

Can represent graph data in tables
Can query it using recursive CTEs
But the result is verbose and less intuitive
SQL wasn't designed for graph traversal

The bottom line: If your data is fundamentally about relationships and you'll be doing lots of traversal queries, a graph database with Cypher (or a similar language) will make your life much easier. You'll write less code, it'll be more readable, and the database can optimize graph-specific operations.

If you have some graph-like data but it's not central to your application, storing it in a relational database and using occasional recursive CTEs might be good enough.

Choose the tool that matches your problem.