For years, plenty of architecture discussions have leaned on a convenient idea: that SQL databases “don’t scale,” and that beyond a certain size the natural move is to abandon the relational model. In practice, what breaks first is rarely SQL itself. What breaks is the operational discipline around the database: schema design, query hygiene, connection limits, replica management, backup-and-restore posture, and the ability to absorb traffic spikes without turning them into incidents.

For systems administrators, the most useful reference cases aren’t the ones that brag about novelty. They’re the ones that demonstrate repeatable patterns. Shopify, for example, has described how it keeps its “distributed monolith” on MySQL, using a sharded architecture designed to scale under peaks like Black Friday / Cyber Monday: hundreds of shards, one writer per shard, and multiple replicas for reads and resiliency—deployed across thousands of virtual machines to sustain growth and campaigns.
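As a rough illustration of that shape (not Shopify's actual code), the routing idea fits in a few lines: a business key such as a shop ID maps deterministically to one shard, and each shard exposes a single writer plus a pool of read replicas. The Python sketch below uses hypothetical hostnames and a naive modulo mapping; real deployments typically use a lookup table or consistent hashing so that resharding doesn't move every key.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class Shard:
    """One shard: a single writer plus any number of read replicas (hypothetical DSNs)."""
    writer_dsn: str
    replica_dsns: list[str] = field(default_factory=list)

# Hypothetical topology; in practice this comes from configuration or service discovery.
SHARDS = [
    Shard("mysql://writer-00.internal/app", ["mysql://replica-00a.internal/app"]),
    Shard("mysql://writer-01.internal/app", ["mysql://replica-01a.internal/app"]),
    # ...hundreds more at large-platform scale
]

def shard_for(business_key: str) -> Shard:
    """Map a business key (e.g. a shop ID) deterministically to one shard."""
    digest = hashlib.sha256(business_key.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]  # naive: a real router uses a lookup table

def dsn_for(business_key: str, readonly: bool = False) -> str:
    """Writes always target the shard's writer; reads may go to a replica."""
    shard = shard_for(business_key)
    if readonly and shard.replica_dsns:
        return shard.replica_dsns[0]  # a real router load-balances and health-checks replicas
    return shard.writer_dsn

if __name__ == "__main__":
    print(dsn_for("shop-12345"))                 # writer for that shop's shard
    print(dsn_for("shop-12345", readonly=True))  # replica on the same shard
```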

In parallel, Meta has long taken a pragmatic approach: MySQL at the core, with selective changes where it matters. Its public fork includes MyRocks, a RocksDB-based storage engine intended to improve efficiency (especially under write-heavy workloads or when storage pressure is a major factor).

And for anyone who equates “scaling” with “distribution,” the ecosystem also offers SQL-first paths: Vitess, now under the CNCF umbrella, was created to scale MySQL horizontally (sharding, routing, and large-scale operations) and became widely adopted because it addressed that exact pain at high volume.
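One practical consequence for operators: vtgate, the Vitess query router, speaks the MySQL wire protocol, so applications typically keep their existing MySQL client and simply point it at the router. A minimal sketch, assuming the third-party `pymysql` driver and placeholder host, credentials, and keyspace:

```python
import pymysql  # third-party driver: pip install pymysql

# Placeholder endpoint: point this at your own vtgate instance.
conn = pymysql.connect(
    host="vtgate.internal",
    port=3306,
    user="app",
    password="secret",
    database="commerce",  # in Vitess terms, the keyspace
)

try:
    with conn.cursor() as cur:
        # vtgate decides which shard(s) serve this query; the application just issues SQL.
        cur.execute("SELECT order_id, total FROM orders WHERE customer_id = %s", (42,))
        for row in cur.fetchall():
            print(row)
finally:
    conn.close()
```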

Even outside the marquee platforms—yet still operating under brutal production requirements—the lesson holds. Wikimedia documents its use of MariaDB in production, along with the operational tooling and practices that come with it.

For a sysadmin-focused outlet, the conclusion is less philosophical and more actionable: SQL scales if you operate it well. And “operate it well” isn’t a slogan—it’s a checklist of controls and decisions you can audit.

What Usually Scales First: Habits, Not Technology

In production, the jump from “it’s fine” to “it’s down” is rarely caused by the SQL engine alone. The usual triggers are familiar:

  • Connection explosions (poorly sized pools, aggressive timeouts, retry storms).
  • Unbounded query growth (missing indexes, expensive joins, N+1 patterns).
  • Hotspots (a central table or a small set of rows concentrating writes).
  • Backups that don’t restore (copies exist, but recovery is untested).
  • Replication without visibility (hidden lag, manual promotions under pressure); see the lag check sketched just below.
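
None of these are exotic failures, and most are cheap to detect before they page you. Replication lag, for instance, takes one query per replica to expose. A minimal sketch with the third-party `pymysql` driver (host, credentials, and the alert threshold are placeholders); it tries the newer `SHOW REPLICA STATUS` first and falls back to the legacy `SHOW SLAVE STATUS`:

```python
import pymysql  # third-party driver: pip install pymysql

LAG_WARN_SECONDS = 30  # example threshold; set it from your own SLOs

def replication_lag_seconds(host: str) -> int | None:
    """Return the replica's reported lag in seconds, or None if it is unknown/stopped."""
    conn = pymysql.connect(host=host, user="monitor", password="secret",
                           cursorclass=pymysql.cursors.DictCursor)
    try:
        with conn.cursor() as cur:
            # MySQL 8.0.22+ uses REPLICA terminology; older servers only accept SLAVE.
            for stmt in ("SHOW REPLICA STATUS", "SHOW SLAVE STATUS"):
                try:
                    cur.execute(stmt)
                except pymysql.err.Error:
                    continue  # statement not supported on this server version
                row = cur.fetchone()
                if row:
                    return row.get("Seconds_Behind_Source",
                                   row.get("Seconds_Behind_Master"))
        return None
    finally:
        conn.close()

if __name__ == "__main__":
    lag = replication_lag_seconds("db-replica-01.internal")  # placeholder host
    if lag is None or lag > LAG_WARN_SECONDS:
        print(f"WARNING: replication lag is {lag!r}")
    else:
        print(f"replication lag OK: {lag}s")
```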

That’s why in “SQL vs. NoSQL” debates, sysadmins benefit from reframing the question: which pattern do I need to meet my SLOs and fit my operational model?

Comparison Table: SQL Patterns That Actually Work Under Real Load

| Operational need | Recommended pattern | Typical fit | Sysadmin advantages | Risks / trade-offs |
|---|---|---|---|---|
| More reads without changing the app | Read replicas + RW/RO separation | Light analytics, catalogs, feeds | Reduces primary load; incremental scaling | Replication lag; inconsistent reads if not designed carefully |
| Predictable spikes (campaigns) | Capacity planning + temporary scaling + caches | E-commerce, media, SaaS | Risk control; clear runbooks | Infra cost if overprovisioned |
| Sustained data/IO growth | Partitioning + archiving | Logs, events, long-term history | Keeps “hot” tables small; easier maintenance | Query complexity; partition ops overhead |
| Write ceiling on a single primary | Sharding by a business key | Multi-tenant, large catalogs | Parallelizes writes; reduces hotspots | Complexity for cross-shard transactions and global reporting |
| Large-scale ops (many shards) | Orchestration layer (e.g., Vitess) | High-volume platforms | Standardizes routing/ops; automates recurring work | Learning curve; more critical components |
| Cost/licensing + MySQL compatibility | MariaDB (case-dependent) | MySQL-compatible environments | Broad ecosystem; clustering options | Compatibility drift; internal standardization decisions |
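
The first row is usually where teams start, and the read/write split has to be explicit somewhere: in the application, a proxy, or an ORM router. As a rough application-side sketch (placeholder hosts and credentials, third-party `pymysql` driver), note that anything which must read its own just-committed write stays on the primary, because replicas can lag:

```python
import itertools
import pymysql  # third-party driver: pip install pymysql

# Placeholder endpoints: one writer and a couple of read replicas.
PRIMARY = {"host": "db-primary.internal", "user": "app", "password": "secret", "database": "app"}
REPLICAS = [
    {"host": "db-replica-01.internal", "user": "app", "password": "secret", "database": "app"},
    {"host": "db-replica-02.internal", "user": "app", "password": "secret", "database": "app"},
]
_replica_cycle = itertools.cycle(REPLICAS)

def connection(readonly: bool = False) -> pymysql.connections.Connection:
    """Writes (and read-your-own-write paths) use the primary; other reads rotate over replicas."""
    params = next(_replica_cycle) if readonly and REPLICAS else PRIMARY
    return pymysql.connect(**params)

# Example: a lag-tolerant read goes to a replica.
conn = connection(readonly=True)
try:
    with conn.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM products")
        print(cur.fetchone())
finally:
    conn.close()
```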

What to Actually Copy from the Big Players

1) Separate performance from heroics.
Shopify doesn’t treat its database as a sacred artifact. It treats it as a system made of repeatable parts: shards, clear roles (writer/replicas), and a playbook that gets pressure-tested during major campaigns. In production terms, that means runbooks, rehearsals, and automation—so scalability doesn’t depend on “the one person who knows everything.”

2) Change only what you must.
MyRocks is a good example of pragmatism: it’s not “reinvent the database,” it’s “adjust the storage engine to relieve a specific bottleneck,” while keeping the SQL model.
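
If you're weighing a similar move, start by auditing which engines you actually run and where the bytes live. The sketch below (placeholder connection details, `pymysql` again) reads `information_schema` on stock MySQL/MariaDB; on a MyRocks-enabled build, those tables report their engine as ROCKSDB, and the sizes are the server's own estimates:

```python
import pymysql  # third-party driver: pip install pymysql

conn = pymysql.connect(host="db-primary.internal", user="monitor", password="secret")
try:
    with conn.cursor() as cur:
        # Tables and approximate size per (schema, engine), system schemas excluded.
        cur.execute("""
            SELECT table_schema, engine, COUNT(*) AS tables,
                   ROUND(SUM(data_length + index_length) / 1024 / 1024) AS approx_mb
            FROM information_schema.tables
            WHERE table_type = 'BASE TABLE'
              AND table_schema NOT IN ('mysql', 'sys', 'performance_schema', 'information_schema')
            GROUP BY table_schema, engine
            ORDER BY approx_mb DESC
        """)
        for schema, engine, tables, approx_mb in cur.fetchall():
            print(f"{schema:<24} {engine:<10} {tables:>6} tables  ~{approx_mb} MB")
finally:
    conn.close()
```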

3) Operate for the worst day, not the average day.
Architecture value isn’t proven on a quiet Tuesday; it’s proven the day traffic multiplies and the business won’t accept degradation. In that scenario, sysadmins need three things: observability (lag, QPS, latency, locks), admission control (rate limiting, queues, circuit breakers), and restore drills that are practiced—not theoretical.
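
Observability and restore drills are mostly process; admission control is the piece that needs a little code, and it is worth having before the spike rather than during it. The sketch below shows one illustrative mechanism, a token-bucket limiter wrapped around an expensive class of queries; the rate, burst, and names are hypothetical:

```python
import threading
import time

class TokenBucket:
    """Token-bucket limiter: allow a sustained rate plus a bounded burst, refuse the rest."""

    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.updated = time.monotonic()
        self.lock = threading.Lock()

    def allow(self) -> bool:
        with self.lock:
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
            self.updated = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

# Hypothetical budget: at most ~200 expensive report queries per second, with a burst of 50.
report_budget = TokenBucket(rate_per_sec=200, burst=50)

def run_report_query(execute):
    """Shed load explicitly instead of letting it pile up as invisible queueing in the database."""
    if not report_budget.allow():
        raise RuntimeError("report backend saturated, retry later")  # map to a 429 at the edge
    return execute()
```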

A Note on “Magic Numbers”

You’ll often see claims like “millions of queries per second” attached to large platforms. They’re useful as a rough signal, but dangerous as an objective. The number depends on what counts as a query, the read/write mix, caching, and how measurement is instrumented. In real-world operations, the metrics that matter are different: p95/p99 latency under load, recovery time, and stability during change (migrations, failovers, reindexing).
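
If you want numbers to watch, derive them from your own latency samples rather than someone else's headline figure. A standard-library sketch (the sample values below are made up):

```python
import statistics

# Made-up request latencies in milliseconds, e.g. exported from client-side metrics or slow logs.
latencies_ms = [12, 15, 14, 18, 22, 35, 16, 13, 240, 19, 17, 21, 480, 14, 16, 18, 25, 30, 15, 17]

# statistics.quantiles returns 99 cut points for n=100; indexes 94 and 98 approximate p95 and p99.
cuts = statistics.quantiles(latencies_ms, n=100)
p95, p99 = cuts[94], cuts[98]

print(f"p50={statistics.median(latencies_ms):.0f}ms  p95={p95:.0f}ms  p99={p99:.0f}ms")
```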

Frequently Asked Questions

What are the clearest signs a SQL database is hitting operational limits?

Sustained increases in p95/p99 latency, CPU or IO saturation, rising replication lag, frequent locking, connection queueing, and maintenance windows that start causing noticeable performance degradation (backups, reindexing).

When does sharding make sense for MySQL or MariaDB?

When writes to a single primary become a structural bottleneck, or when hotspots can’t be solved with indexing, partitioning, caching, and tuning. It often fits multi-tenant architectures with a natural sharding key.

Does Vitess replace MySQL?

No. Vitess acts as an operational and scaling layer for MySQL (sharding, routing, large-scale operations), while MySQL remains the underlying storage engine.

Is MariaDB a drop-in replacement for MySQL 8 in production?

It depends on your feature set (SQL features, storage engines, replication behavior, and tooling). It can be viable in many scenarios, but compatibility and operational behavior should be validated in practice; Wikimedia’s production documentation is a useful reference for real-world patterns and tooling.

via: Large-Scale SQL
