Entity resolution (ER) lies at the heart of many mission-critical applications—fraud detection, master data management, customer 360 platforms, compliance workflows, and beyond. At scale, however, ER systems often collapse under the weight of their own complexity. Traditional models either discard valuable match-level metadata or generate massive volumes of redundant data.

Tilores, a leading ER platform, is tackling this head-on by adopting an innovative and mathematically elegant solution: Clique-Based Graph Compression (CBGC). In a recent technical breakdown, Stefan Berkner from Tilores explains how CBGC delivers up to 99.7% reduction in edge storage, dramatically improving both performance and data traceability.


The Graph Model Behind Tilores

Unlike many ER or MDM platforms that treat entities as opaque objects, Tilores represents each entity as a graph, where:

  • Nodes = Records
  • Edges = Match relationships between records, annotated with rule IDs

For instance, if record A matches record B via rule R1, an edge “A:B:R1” is stored. If they also match under rule R2, a second edge “A:B:R2” is stored.

This design provides:

  • Traceability: Users can inspect precisely why records were grouped.
  • Auditability: Match rules and data similarity metrics are fully visible.
  • Maintainability: Edges are required for recalculation after deletions or rule updates.

But there’s a catch.


The Scaling Nightmare: Quadratic Edge Growth

As records accumulate, so do the match relationships. In dense graphs—especially those where many records match each other—a single entity can generate an explosion of edges.

The maximum number of pairwise connections in a complete entity of n records is:

n × (n – 1) / 2

This means:

  • 3 records = 3 edges
  • 100 records = 4,950 edges
  • 1,000 records = 499,500 edges

Such quadratic growth isn’t just inefficient—it’s unsustainable for large-scale deployments.


Enter Clique-Based Graph Compression (CBGC)

A clique is a complete subgraph: a group of nodes where every node is connected to every other. Rather than storing each of those n(n-1)/2 edges, Tilores compresses them into a single clique object, storing only:

  • The list of nodes
  • The match rule(s) associated with that clique

Example:

A triangle (3-node clique) that would normally require 3 edges is instead represented as:
{nodes: [A, B, C], rule: R1}

At larger scales, the savings are exponential. One real-world graph with 38 edges was compressed into three cliques and 10 individual edges.


Storage Savings and Real-World Impact

In a production system, Tilores applied CBGC to dense entity graphs and achieved:

  • 99.7% edge storage reduction
  • Replacing hundreds of thousands of edges with just hundreds of clique definitions

This not only slashes storage costs but also reduces in-memory graph complexity, enabling faster response times and lower operational overhead.


Performance Boosts Beyond Storage

CBGC isn’t just about saving disk space. It transforms graph operations like:

  • Record deletion: When nodes are removed (e.g. GDPR “right to be forgotten”), recalculating the graph becomes much faster.
  • Entity splitting: When a critical edge is deleted, the system can re-evaluate graph components faster by treating cliques as atomic blocks.
  • Connected components detection: Normally an O(E) operation, now significantly accelerated due to fewer explicit edges.

CBGC enables transitive inference, where not all edges need to be traversed or recalculated—just enough to reestablish reachability between nodes.


But Isn’t Clique Detection Expensive?

Yes, and this is the trade-off. Finding maximum cliques is NP-hard, meaning computationally infeasible for large graphs. But Tilores sidesteps this with:

  • Greedy approximations
  • Selective recalculation: CBGC is only applied when an entity surpasses a configurable edge threshold
  • Hybrid architecture: Combining uncompressed edges with cliques for graphs that aren’t fully compressible

This pragmatic approach ensures optimal storage efficiency without paralyzing the system.


Beyond Cliques: Other Compression Patterns

Tilores envisions expanding CBGC by identifying and leveraging other structural patterns in entity graphs:

  • Stars: One central node connected to many outer nodes
  • Paths: Sequential chains of records
  • Communities: Dense subgraphs with some missing edges

Such representations could further boost compression and even improve explainability, helping analysts understand how entities are structured at a glance.


Why This Matters

Modern ER systems are expected to handle:

  • Millions of records
  • Dynamic rule updates
  • Regulatory data deletions
  • Real-time matching and resolution

Storing and processing complete graphs naïvely is no longer viable. Clique-based compression delivers not just storage savings, but real-time operational advantages that translate into better user experience, regulatory compliance, and system scalability.


Final Thoughts

Clique-Based Graph Compression is a powerful application of graph theory to solve real-world data problems. Tilores shows that by understanding the structure of relationships—not just their content—we can build smarter, faster, and more maintainable systems.

For any organization dealing with identity resolution, fraud networks, or master data unification at scale, CBGC offers a breakthrough in how entity data can be stored and analyzed.

📘 Read the full article by Stefan Berkner on: Tilores.io
🔍 Explore more on entity graphs and compression techniques in Tilores’ developer docs.

Scroll to Top