When “the cloud” is just a single data center with a fancy name: sysadmin takeaways from Korea’s G-Drive outage (and how to truly harden your data)

Published 10/11/2025


A fire at NIRS (the National Information Resources Service, in Daejeon, Korea) destroyed the government’s G-Drive, the “cloud” where ~750,000 civil servants have stored their work since 2018, and crippled 96 additional critical systems. Because of the platform’s large-capacity/low-performance storage design, no off-site backups existed, so user files are gone (except whatever can be reconstructed from other systems like OnNara). Translated for admins: the “cloud” was single-site, had no 3-2-1 backup scheme, and sat in the same failure domain as other critical services.

In Spain, the ENS (National Security Scheme, RD 311/2022) at High level requires a DRP and appropriate backups, but policy ≠ execution. Treat this incident as a prompt to review architecture, backups, and procedures.

The design failure (and how to avoid it)

What went wrong

Physical monolith: a “cloud” in one data center.
No off-site: the storage architecture couldn’t replicate externally.
Coupling: 96 other critical systems impacted by the same physical event.
Exclusive use: some ministries mandated G-Drive as the only source of truth.

Minimum antidote for any “enterprise” platform

3-2-1 (better: 3-2-1-1-0)
- 3 copies, 2 different media, 1 off-site; add 1 immutable/air-gapped copy (S3 Object Lock, WORM, tape) and 0 errors verified by test restores.
Geographic redundancy
- Active-active across two zones/regions (RTO≈0 if capacity allows).
- Active-passive with orchestrated failover and a defined RTO.
Separate failure domains
- Power, cooling, network, racks, uplinks, and (when feasible) different providers.
Backups out of band
- Repositories protected by segregated identities, MFA, and least privilege, not dependent on the same IAM/AD as production.
Immutability
- WORM (S3 Object Lock compliance mode), true air-gap with tape, or “sealed” repositories.
Realistic RPO/RTO
- Define per service. Document dependencies (DNS, IAM, PKI, queues, feature flags).
DR drills
- Timed failover and restore exercises at least 1–2/year, and after major changes.
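As a rough illustration of the 3-2-1-1-0 rule above, the following sketch checks an inventory of copies against each of the five conditions. The `BackupCopy` fields, the `check_32110` function, and the site names are hypothetical and chosen only for this example; a real inventory would come from your backup tooling.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BackupCopy:
    """One copy of a dataset; the production copy counts as a copy too."""
    name: str
    media: str                            # e.g. "disk", "object", "tape"
    site: str                             # failure domain, e.g. "dc-a"
    immutable: bool = False               # WORM / Object Lock / air-gapped
    is_production: bool = False
    restore_errors: Optional[int] = None  # None = never test-restored


def check_32110(copies: List[BackupCopy]) -> List[str]:
    """Return the list of 3-2-1-1-0 violations (empty list = compliant)."""
    problems = []
    prod_sites = {c.site for c in copies if c.is_production}
    backups = [c for c in copies if not c.is_production]

    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies, need 3")
    if len({c.media for c in copies}) < 2:
        problems.append("fewer than 2 media types")
    if not any(c.site not in prod_sites for c in backups):
        problems.append("no off-site copy")
    if not any(c.immutable for c in copies):
        problems.append("no immutable/air-gapped copy")
    # "0 errors": every backup must have a clean, recent test restore.
    for c in backups:
        if c.restore_errors != 0:
            problems.append(f"test restore missing or failed: {c.name}")
    return problems


# A single-site platform with no backups fails four of the five conditions.
single_site = [BackupCopy("g-drive", "disk", "daejeon", is_production=True)]
for problem in check_32110(single_site):
    print(problem)
```

A check like this could run on a schedule and feed the same alerting channel as the backup jobs themselves, so that drift from the policy is noticed before an incident rather than during one.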

Reference architectures (fast wins with real impact)

1) Private/colo “cloud”: active-active + immutable off-site backup

[DC A]  <—sync/async replication—>  [DC B]
      \                               /
       \—(backup jobs→WORM repo)—>  [DC C]

Production: replicate metadata and data (block/object) between A and B.
Backups: immutable daily/hourly copies to C (different region/provider).
Runbook: fail over to B (A down) or restore from C (catastrophe).
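The runbook line above boils down to a small decision table. The sketch below is one hedged way to express it, together with a rough worst-case data-loss estimate per path; the names `RecoveryPath`, `choose_path`, and `worst_case_data_loss` are illustrative assumptions, not part of any product.

```python
from datetime import timedelta
from enum import Enum


class RecoveryPath(Enum):
    NORMAL = "serve from A and B"
    FAILOVER_TO_B = "fail over to B"
    SERVE_FROM_A = "serve from A only, rebuild B"
    RESTORE_FROM_C = "restore from C"


def choose_path(a_healthy: bool, b_healthy: bool) -> RecoveryPath:
    """Map site health to the recovery path described in the runbook."""
    if a_healthy and b_healthy:
        return RecoveryPath.NORMAL
    if b_healthy:
        return RecoveryPath.FAILOVER_TO_B
    if a_healthy:
        return RecoveryPath.SERVE_FROM_A
    return RecoveryPath.RESTORE_FROM_C


def worst_case_data_loss(path: RecoveryPath,
                         replication_lag: timedelta,
                         backup_age: timedelta) -> timedelta:
    """Rough upper bound on lost data for the chosen path."""
    if path is RecoveryPath.NORMAL:
        return timedelta(0)
    if path is RecoveryPath.RESTORE_FROM_C:
        return backup_age       # everything since the last immutable copy
    return replication_lag      # whatever had not yet reached the survivor


path = choose_path(a_healthy=False, b_healthy=True)
loss = worst_case_data_loss(path, timedelta(seconds=30), timedelta(hours=1))
print(path.value, loss)
```

The point of writing it down this way is that the catastrophic path (restore from C) has a data-loss bound set by the backup interval, not by replication lag, which is why hourly copies to C matter even when A and B replicate synchronously.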

2) Hybrid/SaaS: shared responsibility

SaaS ≠ your backup. Contract for:
- Locations (zonal/regional) and provider RPO/RTO.
- Periodic export into your WORM.
- DR evidence or right to test recovery.

Tools and patterns that work (and you can deploy now)

Backups
- Proxmox Backup Server: dedupe, encryption, restore verification, retention policies; immutable if the store backend supports it.
- Veeam: SOBR + Object Lock, anomaly detection, SureBackup (verified restores), DR Orchestrator.
- LTO tape: cost-effective air-gap for long retention and compliance.
Data
- S3-compatible object with versioning + Object Lock (MinIO/ceph-rgw, public cloud).
- ZFS snapshots + send/receive between DCs, with retention and scheduled scrub.
DR orchestration
- IaC (Terraform/Ansible) + automated runbooks.
- DNS/Anycast/GLB for traffic shifting; config-as-code (Consul/etcd).
Security
- Separate-domain IAM/LAPS for backup repos.
- MFA everywhere; secrets vault (HashiCorp Vault; HSM where needed).
- SIEM/SOAR alerts for mass deletions, policy changes, or encryption on backup repos.
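To make the last item concrete, here is a minimal sketch of the kind of rule a SIEM could apply to a backup repository's audit log: alert on any policy change, and alert once when deletions inside a sliding window exceed a threshold. The event shape, the action names (`"delete"`, `"policy_change"`), and the default thresholds are assumptions for illustration; real audit logs will look different.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Deque, Iterable, List


@dataclass(frozen=True)
class AuditEvent:
    timestamp: datetime
    actor: str
    action: str  # hypothetical values: "delete", "policy_change", "write"


def find_alerts(events: Iterable[AuditEvent],
                window: timedelta = timedelta(minutes=10),
                max_deletes: int = 50) -> List[str]:
    """Flag every policy change and each burst of deletions in `window`."""
    alerts: List[str] = []
    recent: Deque[datetime] = deque()  # timestamps of deletes in the window
    in_burst = False
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.action == "policy_change":
            alerts.append(f"{ev.timestamp.isoformat()} policy change by {ev.actor}")
        elif ev.action == "delete":
            recent.append(ev.timestamp)
            # Drop deletes that have fallen out of the sliding window.
            while recent and ev.timestamp - recent[0] > window:
                recent.popleft()
            if len(recent) > max_deletes:
                if not in_burst:  # alert once per burst, not per delete
                    alerts.append(f"{ev.timestamp.isoformat()} mass deletion "
                                  f"({len(recent)} deletes in window), last actor {ev.actor}")
                in_burst = True
            else:
                in_burst = False
    return alerts
```

Thresholds need tuning against normal retention pruning, which also deletes in bulk; one option is to exclude the backup software's own service identity and alert only on deletions by any other actor.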

Mini-runbook template — total loss of DC A

Goal: recover service from B with RPO ≤ X min and RTO ≤ Y min.

DR short path (A impaired)
- Freeze changes on A (if partially alive).
- Promote B to primary (DB/object/queues).
- Switch DNS/GLB to B.
- Validate health (APM, synthetics, healthchecks).
- Stakeholder comms.
DR long path (A & B lost)
- Provision a replacement for B (new site or region) from templates (IaC).
- Restore data from C (WORM):
  - DB → point-in-time
  - Objects/files → last good version
- Validate integrity (checksums), start services, switch DNS.
Post-mortem
- Time to detect (TTD), actual RPO/RTO, blockers, root causes, actions.
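For the post-mortem, the actual figures can be derived from four timestamps. The sketch below assumes RPO is measured as the gap between the last recovered state and the failure, and RTO as the time from failure to restored service; the `IncidentTimeline` and `drill_report` names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Union


@dataclass(frozen=True)
class IncidentTimeline:
    last_good_copy: datetime    # newest state that was actually recovered
    failure: datetime           # when the site or service failed
    detected: datetime          # when monitoring or a human noticed
    service_restored: datetime  # when users could work again


def drill_report(t: IncidentTimeline,
                 rpo_target: timedelta,
                 rto_target: timedelta) -> Dict[str, Union[timedelta, bool]]:
    """Compute time to detect and actual RPO/RTO, and compare to targets."""
    actual_rpo = t.failure - t.last_good_copy
    actual_rto = t.service_restored - t.failure
    return {
        "ttd": t.detected - t.failure,
        "actual_rpo": actual_rpo,
        "actual_rto": actual_rto,
        "rpo_met": actual_rpo <= rpo_target,
        "rto_met": actual_rto <= rto_target,
    }


timeline = IncidentTimeline(
    last_good_copy=datetime(2025, 1, 1, 9, 50),
    failure=datetime(2025, 1, 1, 10, 0),
    detected=datetime(2025, 1, 1, 10, 4),
    service_restored=datetime(2025, 1, 1, 10, 50),
)
report = drill_report(timeline, timedelta(minutes=15), timedelta(minutes=60))
print(report)
```

Recording these numbers for every drill, not only for real incidents, gives a trend line that shows whether the runbook is getting faster or slowly rotting.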

Sysadmin checklist (so you don’t repeat Daejeon)

Two zones/DCs in active-active or tested failover (runbooks).
Off-site immutable backup (WORM/air-gap) in a third failure domain.
RPO/RTO per service and documented drills.
Out-of-band backups (segregated IAM, MFA, least privilege).
Anomaly monitoring on backup repos + alerts.
Data inventory (SaaS included) and defined export/retention.
ENS/ISO 27001/22301 compliance and audit evidence.
Semi-annual DR drills with timings and outcomes.

Expert comment — David Carrero (Stackscale – Grupo Aire)

“The best insurance isn’t one; it’s several. At Stackscale we run production active-active across two DCs and also keep immutable backups in a third site. For immutable/air-gap we use tools like Proxmox Backup Server or Veeam with Object Lock. The brand matters less than the design—and testing restores: no tests, no DRP.”

Operational translation: production that survives a site loss, backups that can’t be altered, and timed failover/restore drills.

3-2-1-1-0 policy snippet (YAML)

policy:
  copies:
    total: 3
    media:
      - disk (primary)
      - object-storage (WORM)
    offsite:
      enabled: true
      location: dc-c
    immutability:
      mode: compliance
      retention: 30d
  rpo: "15m"    # per service
  rto: "60m"
  verification:
    schedule: weekly
    method: restore-test + checksum
    target: isolated network
  access:
    iam: separate-domain
    mfa: required
    roles:
      - backup-admin (no prod)
      - restore-operator (break-glass)
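The `verification` block in the policy names a restore test plus checksums. One minimal way to sketch the checksum half, assuming a manifest of SHA-256 digests is recorded at backup time and compared after restoring into an isolated location, is shown below. The function names are hypothetical, and backup products such as PBS or Veeam ship their own verification features that would normally be used instead.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_restore(manifest: Dict[str, str], restored_root: Path) -> List[str]:
    """Return a list of problems; an empty list is the '0 errors' result."""
    errors: List[str] = []
    for rel, expected in manifest.items():
        candidate = restored_root / rel
        if not candidate.is_file():
            errors.append(f"missing: {rel}")
        elif sha256_file(candidate) != expected:
            errors.append(f"checksum mismatch: {rel}")
    return errors
```

Typical use would be `manifest = build_manifest(Path(source_dir))` at backup time, stored alongside the backup, then `verify_restore(manifest, Path(restore_dir))` after each scheduled test restore. The manifest itself should live in the immutable repository, otherwise an attacker who alters the data can alter the digests too.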

SaaS: the uncomfortable reminder

It’s still your responsibility to know where your data lives and how to recover it.
Contractually require RPO/RTO, periodic export, DR evidence, and the right to exercise recovery.
Keep your own copies (export/backup) in a WORM repository.

FAQ

How often should I test DR?
At least 1–2 times/year and after major changes—always with timings (actual RTO) and a report.

Active-active or active-passive?
It depends on RTO and budget. Active-active minimizes downtime but adds consistency complexity; active-passive lowers cost but increases RTO.

What should I use for immutability?
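S3 Object Lock (compliance mode), LTO tape for air-gap, and PBS/Veeam repositories configured in WORM. ZFS snapshots with retention help too, but they are read-only rather than immutable: anyone with sufficient privileges can destroy them, so pair them with holds (zfs hold) and a separately administered replica.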

How do I detect someone “killing” my backups?
Alerts on policy changes, mass deletions, and encryption patterns; strict IAM, universal MFA, and identity domain separation.

Bottom line

If your “cloud” fits inside one building, it’s not a cloud—it’s a single point of failure with lots of metal. Korea’s incident says it bluntly. To avoid a repeat: geo-redundancy, off-site immutable copies, segregated access, and regular DR drills. Everything else is just semantics.

Sources: Noticias Cloud and Korea JoongAng Daily
