Network Analysis Queries

True graph algorithms that are impossible with SQL

πŸ’‘ Why These Queries Are Different

Previous Queries (Graph-Shaped SQL)

  • β€’ Filter nodes by properties
  • β€’ Aggregate across relationships
  • β€’ Join tables via foreign keys
  • β€’ Fixed-depth traversals
  • β†’ Could be done with SQL JOINs (just uglier)

Network Queries (True Graph Analysis)

  • β€’ Variable-length path finding
  • β€’ Centrality & influence measures
  • β€’ Connectivity & resilience analysis
  • β€’ Community detection
  • β€’ Propagation & cascade modeling
  • β†’ Impossible or impractical in SQL
πŸ›€οΈ

Path Analysis

Find routes through the network. Shortest paths, all paths, weighted paths.

🎯

Centrality

Identify important nodes. Most connected, most influential, most critical.

πŸ”—

Connectivity

Network resilience. Single points of failure, redundancy, bridges.

🌊

Propagation

How effects ripple. Disruption cascades, influence spread.

🏘️

Communities

Natural clusters. Products that group, users that cluster.

πŸ”€

Projections

Derived networks. User-user similarity via products, etc.

πŸ›€οΈ Section 1: Path Analysis

Paths are the fundamental unit of graph analysis. SQL can do fixed JOINs, but variable-length path finding is a graph-native operation.

1.1 Shortest Path: User to Origin

"What's the shortest supply chain path from a customer to a manufacturing origin?"

πŸ’‘ Why this matters: Reveals the minimum number of business relationships between customer and source.

MATCH path = shortestPath(
  (u:User {userId: 'USR-001'})-[*]-(o:Origin {countryCode: 'JP'})
)
RETURN path, length(path) as pathLength
1.3 Variable Length: N-Hop Neighborhood Key concept

"What can I reach within 3 hops from this supplier?"

πŸ’‘ Why this matters: The [*1..3] syntax is impossible in SQL - it's recursive traversal with variable depth.

MATCH path = (s:Supplier {name: 'Saigon Kitchenware Co'})-[*1..3]-(connected)
RETURN DISTINCT labels(connected)[0] as NodeType, 
       COUNT(DISTINCT connected) as Count

🎯 Section 2: Centrality Analysis

Centrality measures identify the most important nodes. Different metrics answer different questions about "importance".

Types of Centrality:

  • Degree: Most connections (hub nodes)
  • Betweenness: Most paths pass through (bridges)
  • Closeness: Shortest avg distance to all others
  • PageRank: Connected to other important nodes
2.2 Supplier Criticality Score Business critical

"If this supplier fails, how many downstream entities are affected?"

πŸ’‘ Why this matters: Combines graph traversal with business metrics to quantify network importance.

MATCH (s:Supplier)-[:SUPPLIES]->(p:Product)
OPTIONAL MATCH (p)<-[:PURCHASED]-(u:User)
OPTIONAL MATCH (p)-[:STOCKED_AT]->(w:Warehouse)
WITH s,
     COUNT(DISTINCT p) as products,
     COUNT(DISTINCT u) as affectedUsers,
     COUNT(DISTINCT w) as affectedWarehouses,
     SUM(p.annualRevenue) as revenueAtRisk
RETURN s.name as Supplier,
       products as ProductCount,
       affectedUsers as UsersAffected,
       affectedWarehouses as WarehousesAffected,
       revenueAtRisk as RevenueAtRisk,
       products + affectedUsers + affectedWarehouses as CriticalityScore
ORDER BY CriticalityScore DESC
2.4 Origin Network Reach

"How far does each origin's influence reach through the network?"

MATCH (o:Origin)<-[:LOCATED_IN]-(s:Supplier)-[:SUPPLIES]->(p:Product)
OPTIONAL MATCH (p)<-[:PURCHASED]-(u:User)
OPTIONAL MATCH (p)-[:STOCKED_AT]->(w:Warehouse)-[:FULFILLS]->(c:Channel)
WITH o,
     COUNT(DISTINCT s) as suppliers,
     COUNT(DISTINCT p) as products,
     COUNT(DISTINCT u) as customers,
     COUNT(DISTINCT w) as warehouses,
     COUNT(DISTINCT c) as channels
RETURN o.countryName as Origin,
       suppliers + products + customers + warehouses + channels as NetworkReach,
       suppliers, products, customers, warehouses, channels
ORDER BY NetworkReach DESC

πŸ”— Section 3: Connectivity Analysis

Understanding how the network holds together and where it might break.

3.1 Articulation Points: Single Points of Failure Risk analysis

"Which suppliers are single points of failure? (Products with only ONE supplier)"

πŸ’‘ Network concept: Articulation points are nodes whose removal disconnects the graph.

MATCH (p:Product)<-[:SUPPLIES]-(s:Supplier)
WITH p, COLLECT(s) as suppliers, COUNT(s) as supplierCount
WHERE supplierCount = 1
WITH suppliers[0] as singleSupplier, COLLECT(p) as dependentProducts
RETURN singleSupplier.name as CriticalSupplier,
       SIZE(dependentProducts) as ProductsAtRisk,
       [prod in dependentProducts | prod.sku][0..10] as SampleSKUs,
       REDUCE(total = 0, prod in dependentProducts | total + prod.annualRevenue) as RevenueAtRisk
ORDER BY SIZE(dependentProducts) DESC
3.4 Redundancy Check: Multiple Paths

"Which products have redundant supply chains?"

MATCH (p:Product)<-[:SUPPLIES]-(s:Supplier)
WITH p, COUNT(DISTINCT s) as supplierCount, COLLECT(DISTINCT s.name) as suppliers
MATCH (p)-[:STOCKED_AT]->(w:Warehouse)
WITH p, supplierCount, suppliers, COUNT(DISTINCT w) as warehouseCount
RETURN p.sku as SKU, p.name as Product,
       supplierCount as SupplierRedundancy,
       warehouseCount as WarehouseRedundancy,
       supplierCount * warehouseCount as TotalPathRedundancy
ORDER BY TotalPathRedundancy DESC

🌊 Section 4: Propagation Analysis

How do effects (disruptions, influence) spread through the network? This is TRUE network modeling.

4.1 Disruption Cascade: Vietnam Goes Offline The Vietnam Question!

"Trace the full impact cascade of Vietnam supply disruption through all network layers"

πŸ’‘ Why this is network analysis: We're following the ripple effect across 4 levels of the network - this IS propagation modeling.

// 4.1 DISRUPTION CASCADE: Vietnam goes offline
// "Trace the full impact cascade of Vietnam supply disruption"

// Level 0: Origin
MATCH (o:Origin {countryCode: 'VN'})

// Level 1: Suppliers affected
MATCH (o)<-[:LOCATED_IN]-(s:Supplier)
WITH o, COLLECT(s) as affectedSuppliers

// Level 2: Products affected
UNWIND affectedSuppliers as s
MATCH (s)-[:SUPPLIES]->(p:Product)
WITH o, affectedSuppliers, COLLECT(DISTINCT p) as affectedProducts

// Level 3: Warehouses affected
UNWIND affectedProducts as p
MATCH (p)-[:STOCKED_AT]->(w:Warehouse)
WITH o, affectedSuppliers, affectedProducts, COLLECT(DISTINCT w) as affectedWarehouses

// Level 4: Users affected
UNWIND affectedProducts as p
MATCH (p)<-[:PURCHASED]-(u:User)
WITH o, affectedSuppliers, affectedProducts, affectedWarehouses, COLLECT(DISTINCT u) as affectedUsers

RETURN 'Vietnam Disruption Cascade' as Scenario,
       SIZE(affectedSuppliers) as L1_Suppliers,
       SIZE(affectedProducts) as L2_Products, 
       SIZE(affectedWarehouses) as L3_Warehouses,
       SIZE(affectedUsers) as L4_Customers,
       SIZE(affectedSuppliers) + SIZE(affectedProducts) + SIZE(affectedWarehouses) + SIZE(affectedUsers) as TotalImpact
4.4 Influence Spread: Similar Users via Products

"From one user's purchases, find others with similar patterns"

MATCH (u:User {userId: 'USR-001'})-[:PURCHASED]->(p:Product)
WITH u, COLLECT(p) as userProducts

UNWIND userProducts as prod
MATCH (prod)<-[:PURCHASED]-(other:User)
WHERE other <> u
WITH u, other, COUNT(DISTINCT prod) as sharedProducts, COLLECT(DISTINCT prod.name) as commonProducts

RETURN other.userId as SimilarUser,
       other.segmentName as Segment,
       sharedProducts as SharedPurchases,
       commonProducts as CommonProducts
ORDER BY sharedProducts DESC
LIMIT 10

πŸ”€ Section 5: Bipartite Projections

Creating derived networks by projecting through shared connections. This is fundamental to recommendation systems and similarity analysis.

What is a Bipartite Projection?

A bipartite graph has two types of nodes (e.g., Users and Products). A projection creates a new graph with only one type, where connections are based on shared connections to the other type.

User₁ β†’ Product ← Userβ‚‚
becomes:
User₁ ←→ Userβ‚‚ (connected via shared product)
6.1 User-User Similarity Network Collaborative filtering

"Create implicit connections between users based on purchase overlap (Jaccard similarity)"

MATCH (u1:User)-[:PURCHASED]->(p:Product)<-[:PURCHASED]-(u2:User)
WHERE u1.userId < u2.userId
WITH u1, u2, COUNT(DISTINCT p) as sharedProducts
     
MATCH (u1)-[:PURCHASED]->(p1:Product)
WITH u1, u2, sharedProducts, COUNT(DISTINCT p1) as u1Products
MATCH (u2)-[:PURCHASED]->(p2:Product)
WITH u1, u2, sharedProducts, u1Products, COUNT(DISTINCT p2) as u2Products

// Jaccard = intersection / union
WITH u1, u2, sharedProducts,
     toFloat(sharedProducts) / (u1Products + u2Products - sharedProducts) as jaccardSimilarity
WHERE jaccardSimilarity > 0.2

RETURN u1.userId as User1, u2.userId as User2,
       round(jaccardSimilarity * 100) / 100 as Similarity,
       sharedProducts as SharedProducts
ORDER BY jaccardSimilarity DESC
6.3 Supplier-Supplier via Shared Customers

"Which suppliers serve similar customer bases? (2-hop projection)"

MATCH (s1:Supplier)-[:SUPPLIES]->(p1:Product)<-[:PURCHASED]-(u:User)-[:PURCHASED]->(p2:Product)<-[:SUPPLIES]-(s2:Supplier)
WHERE s1.supplierId < s2.supplierId
WITH s1, s2, COUNT(DISTINCT u) as sharedCustomers
WHERE sharedCustomers >= 2
RETURN s1.name as Supplier1, s2.name as Supplier2,
       sharedCustomers as SharedCustomerBase
ORDER BY sharedCustomers DESC

πŸ‘οΈ Visual Network Queries

These queries return graph structures that visualize beautifully in Neo4j Browser.

VIS 1 Full Supply Chain from Origin Great for demos
MATCH path = (o:Origin {countryCode: 'VN'})<-[:LOCATED_IN]-(s:Supplier)-[:SUPPLIES]->(p:Product)
OPTIONAL MATCH userPath = (p)<-[:PURCHASED]-(u:User)
OPTIONAL MATCH warehousePath = (p)-[:STOCKED_AT]->(w:Warehouse)
RETURN path, userPath, warehousePath
VIS 2 User's Complete Network Neighborhood
MATCH (u:User {userId: 'USR-007'})-[r]-(connected)
OPTIONAL MATCH (connected)-[r2]-(secondLevel)
WHERE NOT secondLevel:User
RETURN u, r, connected, r2, secondLevel

🎯 The Key Takeaway

These queries demonstrate that our graph isn't just a "different way to store data" - it enables analysis patterns that are fundamentally impossible in relational databases:

πŸ“š Going Further: Neo4j Graph Data Science

For production network analysis, Neo4j's GDS library provides optimized algorithms:

These queries demonstrate the concepts; GDS provides the scale.