Revolutionizing Entity Resolution: How Tilores Uses Clique-Based Graph Compression to Solve Storage and Scalability Challenges

Published 05/15/2025

X (Twitter) Facebook Pinterest LinkedIn Email WhatsApp

Entity resolution (ER) lies at the heart of many mission-critical applications—fraud detection, master data management, customer 360 platforms, compliance workflows, and beyond. At scale, however, ER systems often collapse under the weight of their own complexity. Traditional models either discard valuable match-level metadata or generate massive volumes of redundant data.

Tilores, a leading ER platform, is tackling this head-on by adopting an innovative and mathematically elegant solution: Clique-Based Graph Compression (CBGC). In a recent technical breakdown, Stefan Berkner from Tilores explains how CBGC delivers up to 99.7% reduction in edge storage, dramatically improving both performance and data traceability.

The Graph Model Behind Tilores

Unlike many ER or MDM platforms that treat entities as opaque objects, Tilores represents each entity as a graph, where:

Nodes = Records
Edges = Match relationships between records, annotated with rule IDs

For instance, if record A matches record B via rule R1, an edge “A:B:R1” is stored. If they also match under rule R2, a second edge “A:B:R2” is stored.

This design provides:

Traceability: Users can inspect precisely why records were grouped.
Auditability: Match rules and data similarity metrics are fully visible.
Maintainability: Edges are required for recalculation after deletions or rule updates.

But there’s a catch.

The Scaling Nightmare: Quadratic Edge Growth

As records accumulate, so do the match relationships. In dense graphs—especially those where many records match each other—a single entity can generate an explosion of edges.

The maximum number of pairwise connections in a complete entity of n records is:

n × (n – 1) / 2

This means:

3 records = 3 edges
100 records = 4,950 edges
1,000 records = 499,500 edges

Such quadratic growth isn’t just inefficient—it’s unsustainable for large-scale deployments.

Enter Clique-Based Graph Compression (CBGC)

A clique is a complete subgraph: a group of nodes where every node is connected to every other. Rather than storing each of those n(n-1)/2 edges, Tilores compresses them into a single clique object, storing only:

The list of nodes
The match rule(s) associated with that clique

Example:

A triangle (3-node clique) that would normally require 3 edges is instead represented as:
{nodes: [A, B, C], rule: R1}

At larger scales, the savings are exponential. One real-world graph with 38 edges was compressed into three cliques and 10 individual edges.

Storage Savings and Real-World Impact

In a production system, Tilores applied CBGC to dense entity graphs and achieved:

99.7% edge storage reduction
Replacing hundreds of thousands of edges with just hundreds of clique definitions

This not only slashes storage costs but also reduces in-memory graph complexity, enabling faster response times and lower operational overhead.

Performance Boosts Beyond Storage

CBGC isn’t just about saving disk space. It transforms graph operations like:

Record deletion: When nodes are removed (e.g. GDPR “right to be forgotten”), recalculating the graph becomes much faster.
Entity splitting: When a critical edge is deleted, the system can re-evaluate graph components faster by treating cliques as atomic blocks.
Connected components detection: Normally an O(E) operation, now significantly accelerated due to fewer explicit edges.

CBGC enables transitive inference, where not all edges need to be traversed or recalculated—just enough to reestablish reachability between nodes.

But Isn’t Clique Detection Expensive?

Yes, and this is the trade-off. Finding maximum cliques is NP-hard, meaning computationally infeasible for large graphs. But Tilores sidesteps this with:

Greedy approximations
Selective recalculation: CBGC is only applied when an entity surpasses a configurable edge threshold
Hybrid architecture: Combining uncompressed edges with cliques for graphs that aren’t fully compressible

This pragmatic approach ensures optimal storage efficiency without paralyzing the system.

Beyond Cliques: Other Compression Patterns

Tilores envisions expanding CBGC by identifying and leveraging other structural patterns in entity graphs:

Stars: One central node connected to many outer nodes
Paths: Sequential chains of records
Communities: Dense subgraphs with some missing edges

Such representations could further boost compression and even improve explainability, helping analysts understand how entities are structured at a glance.

Why This Matters

Modern ER systems are expected to handle:

Millions of records
Dynamic rule updates
Regulatory data deletions
Real-time matching and resolution

Storing and processing complete graphs naïvely is no longer viable. Clique-based compression delivers not just storage savings, but real-time operational advantages that translate into better user experience, regulatory compliance, and system scalability.

Final Thoughts

Clique-Based Graph Compression is a powerful application of graph theory to solve real-world data problems. Tilores shows that by understanding the structure of relationships—not just their content—we can build smarter, faster, and more maintainable systems.

For any organization dealing with identity resolution, fraud networks, or master data unification at scale, CBGC offers a breakthrough in how entity data can be stored and analyzed.

📘 Read the full article by Stefan Berkner on: Tilores.io
🔍 Explore more on entity graphs and compression techniques in Tilores’ developer docs.

X (Twitter) Facebook Pinterest LinkedIn Email WhatsApp

Revolutionizing Entity Resolution: How Tilores Uses Clique-Based Graph Compression to Solve Storage and Scalability Challenges

The Graph Model Behind Tilores

The Scaling Nightmare: Quadratic Edge Growth

Enter Clique-Based Graph Compression (CBGC)

Example:

Storage Savings and Real-World Impact

Performance Boosts Beyond Storage

But Isn’t Clique Detection Expensive?

Beyond Cliques: Other Compression Patterns

Why This Matters

Final Thoughts

Related articles

GitHub Copilot in Visual Studio Code: A Comprehensive Guide to Boost Your Productivity

Alternatives to Raspberry Pi: Is the VIM1S the Solution for Self-Hosting?

T-Pot: The All-In-One Multi-Honeypot Platform for Cybersecurity

Proxmox vs VMware ESXi: Performance Battle on NVMe/TCP Shows Clear Winner

How to Automatically Rotate WordPress SALT Keys and Enhance Site Security

Wine 9.22: New Release Enhances Display Virtualization and Wayland Support

Ultimate Web Server Benchmark: Apache, NGINX, LiteSpeed, OpenLiteSpeed, Caddy, and Lighttpd Compared

The Most Reliable Hard Drives of 2024 According to Backblaze: Analysis of Over 300,000 Units

RustDesk Server: A Secure, Self-Hosted Alternative for Remote Desktop Access

Data Centers Prove Their Resilience During the April 28 Power Outage Across the Iberian Peninsula