
Backups Are Boring, Until They’re Strategy

Pavels Gurskis
October 13, 2025

In the previous post we mapped crown-jewel data, retention classes, and ownership. This post turns that map into recovery leverage. We focus on backup maturity so leaders can speak credibly during an incident and act quickly.

The day your backups become PR strategy

Key term: Backup maturity: having offline or immutable copies and a rehearsed path to restore critical services.

Picture the worst hour of your year: customers are locked out, apps are dark, and reporters are pinging your comms lead. In that moment, backups are not IT plumbing - they are the only way to say: “We are restoring now.” If you cannot say that with confidence, your disclosure clock, brand trust, and revenue runway all compress at once.

Here is the uncomfortable truth most boards miss: ransomware today is as much a public narrative as it is a technical outage. Attackers often encrypt and steal your data, select the most damaging pieces, and weaponize the story. Legal and comms can shape the message, but only operations can create a better ending. That ending starts with backup maturity, not backup spend.

Think of backup maturity as three business capabilities:

  1. Prove a clean, unchangeable copy exists outside attacker reach.
  2. Start a restore in isolation so you don’t re-infect prod.
  3. Practice the sequence with the right people, so the first real restore isn’t a cold open.

Each capability maps to an executive promise you can make externally: we have a safe copy, we can rebuild without spreading the damage, and we know how long it takes.

Why this matters now: recovery that looks fine in status reports often collapses under pressure. One 2025 survey found that data recovery through backups is at its lowest rate in six years: copies exist, but many teams cannot restore from them reliably. This is rarely a technology failure - most often it is a governance gap. Credentials are shared. “Immutable” copies are reachable from the same admin plane. Teams practice file restores more than service restores. When extortion lands, leaders discover they bought storage instead of resilience.

For CFOs and CEOs, the framing is simple: backups are option value, not just a cost center. The option you are buying is the ability to decline a ransom and still hit an acceptable recovery timeline. That option has a price and a payoff. The price is enforcing isolation and rehearsing restores when it feels inconvenient. The payoff is leverage when it counts, including with insurers and regulators who increasingly ask for proof rather than posture.

What to listen for in status meetings

You want clear ownership of crown-jewel restores by service, not by technology silo. Look for evidence of isolation - separate credentials, network separation, and a “do not auto-mount on restore” rule. Expect a drumbeat of drills that include Legal and Comms, because the first hour is as much about permission and messaging as it is about runbooks. If those elements are missing, your risk is narrative drift: you will be explaining, not restoring.

Quick move

Ask for one slide this week that answers a single question: “If production went down at noon, which three services could we start restoring safely by 3 p.m., and who owns each call?” Keep it honest and concrete. The discussion it sparks is the beginning of real backup maturity.

Design the safety net: online, offline, and immutable

Key term: Immutable backup: a copy that cannot be altered or deleted for a set period, enforced by the storage system.

A resilient backup design gives you two superpowers under stress: fast rollbacks for everyday mistakes and a vault for bad days. Think of it as a safety net that lets operations move quickly without betting the company. When an incident hits, the structure of that net determines how much leverage you have with attackers, insurers, and regulators.

The three layers executives should see

  • Primary snapshot tier - High-speed copies close to production for quick, local recovery from routine errors and small outages.
  • Immutable or offline vault - A separate store with retention locks and tightly scoped credentials. Access is rare, audited, and follows a break-glass procedure.
  • Off-region copy - A geographically separate replica to survive regional cloud issues and facility-level risks while meeting legal hold and continuity needs.

Guardrails that make or break the design

  • Credential separation - Dedicated backup admin identities, MFA on every privileged action, and no reuse of production admin accounts.
  • Network isolation - One-way replication into the vault, isolated networks or accounts, and logging that proves who touched what and when.
  • Retention you can stand behind - Object lock or WORM policies sized for typical ransomware dwell time windows, plus alerts when someone attempts early deletion.
  • Safe restore mechanics - Hydrate data inside an isolated recovery environment, scan artifacts before cutover, and block auto-mount behaviors that speed reinfection.
  • Key management discipline - Independent key custody for encrypted backups and a documented procedure for emergency access.
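To make the retention guardrail concrete, here is a minimal sketch of how an object-lock policy might be expressed, assuming an S3-style API. The helper function, retention window, and defaults are illustrative assumptions, not a prescription for your environment.

```python
# Sketch: build an S3-style Object Lock configuration sized for a
# ransomware dwell-time window. The structure mirrors the payload shape
# accepted by AWS's put_object_lock_configuration API; the 90-day
# default is an assumption, not a recommendation.

def object_lock_config(retention_days: int = 90) -> dict:
    """Return a COMPLIANCE-mode retention policy: no identity, including
    admins, can delete or alter locked objects before expiry."""
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {
            "DefaultRetention": {
                # COMPLIANCE cannot be bypassed; GOVERNANCE mode can be
                # overridden by privileged users, which weakens the vault.
                "Mode": "COMPLIANCE",
                "Days": retention_days,
            }
        },
    }

if __name__ == "__main__":
    cfg = object_lock_config(90)
    print(cfg["Rule"]["DefaultRetention"])
```

In practice you would submit a structure like this to your storage provider once per vault bucket and pair it with alerts on any early-deletion attempt.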

Cloud and SaaS coverage without assumptions

Key terms:

  • RTO: Recovery Time Objective. How quickly you must restore a service after an outage.
  • RPO: Recovery Point Objective. How much data (measured in time, such as hours) you can afford to lose without noticeable impact.

Provider snapshots handle availability inside their platforms, while business continuity depends on copies you control. For Microsoft 365, Google Workspace, CRM, and other SaaS data, define a target RPO/RTO per service and ensure coverage with cross-tenant or third-party backups that you have actually restored. Treat these restores like production work: name the service owner, record the last test date, and keep the evidence.
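As a back-of-envelope check, declared RPO/RTO targets can be compared against observed numbers from a drill. This is a minimal sketch with hypothetical service figures; the function names and the dwell assumptions are illustrative.

```python
# Sketch: check whether a service's measured recovery numbers meet its
# declared RPO/RTO targets. Worst-case data loss is approximated as the
# backup interval plus replication lag into the vault (all in hours).

def worst_case_rpo_hours(backup_interval_h: float, replication_lag_h: float = 0.0) -> float:
    """Worst case: failure hits just before the next backup completes."""
    return backup_interval_h + replication_lag_h

def meets_targets(rpo_target_h: float, rto_target_h: float,
                  backup_interval_h: float, measured_restore_h: float,
                  replication_lag_h: float = 0.0) -> tuple[bool, bool]:
    rpo_ok = worst_case_rpo_hours(backup_interval_h, replication_lag_h) <= rpo_target_h
    rto_ok = measured_restore_h <= rto_target_h
    return rpo_ok, rto_ok

if __name__ == "__main__":
    # Hypothetical CRM service: 4h RPO / 12h RTO targets,
    # backups every 6h, last drill restored the service in 9h.
    print(meets_targets(rpo_target_h=4, rto_target_h=12,
                        backup_interval_h=6, measured_restore_h=9))
    # The backup cadence misses the RPO even though restore speed is fine.
```

The point of the sketch: an RPO gap is a scheduling decision you can fix this quarter, while an RTO gap usually means more rehearsal or better tooling.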

One proof point that matters

CISA’s StopRansomware Guide calls out a simple principle that aligns every design choice above: maintain offline, encrypted backups of critical data and regularly test their availability and integrity. Use this as the policy north star for board reviews and vendor conversations.

Design decisions to lock now

  • Approve the minimum blueprint: snapshots for speed, an immutable or offline vault for safety, and an off-region copy for resilience.
  • Require evidence: last successful restore report for one crown-jewel system, plus the retention and access policy that protects its vault copy.

Practice the rescue

Key term: IRE: Isolated recovery environment. A separate, locked-down network used to restore and validate systems before returning them to production.

Restores succeed when teams rehearse under a clock. Backups provide raw material. Drills turn that material into a reliable sequence people can execute under pressure. Treat the rehearsal like a flight simulator: same roles, realistic injects, and an honest timer.

From tabletop to muscle memory

Start with a focused scenario on a single crown-jewel service. Set one outcome: reach a go/no-go decision to initiate a restore. Keep the cast tight and accountable.

  • People: service owner, platform lead, security lead, legal, communications, incident commander.
  • Agenda: incident brief, scope decision, restore point selection, IRE plan, communications draft, exit criteria.
  • Clocks to watch: time to decision, time to first validated restore task, time to business acceptance.
  • Evidence to capture: who made each call, which artifacts proved integrity, which steps created delay.

Clean-room restore - the sequence that protects you

An IRE lowers reinfection risk and gives space to validate data and services.

  1. Intake: pull known-good backup sets into the IRE using one-way transfer.
  2. Core services first: bring up identity, DNS, and logging inside the IRE.
  3. Scan and stage: inspect images and data with current signatures and tooling.
  4. Functional smoke tests: validate critical paths for the service - auth, data reads, writes, integrations.
  5. Decision gate: compare IRE performance and risk findings to your cutover criteria.
  6. Return to production: promote through a controlled path with monitoring on and rollback ready.
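The six steps above can be sketched as a gated pipeline, where each stage must pass before the next starts and a failure halts the run short of cutover. The stage names and boolean checks are illustrative stand-ins for real scanners and smoke tests.

```python
# Sketch: clean-room restore as a gated pipeline. Each stage returns
# True on success; the run halts at the first failure so a failed scan
# or smoke test can never reach the return-to-production step.

from typing import Callable

def run_ire_restore(stages: list[tuple[str, Callable[[], bool]]]) -> tuple[bool, list[str]]:
    """Execute stages in order; return (reached_cutover, completed_stage_names)."""
    completed: list[str] = []
    for name, check in stages:
        if not check():
            return False, completed  # halt: do not promote to production
        completed.append(name)
    return True, completed

if __name__ == "__main__":
    # Hypothetical checks; real ones would invoke transfer jobs,
    # malware scanners, and functional smoke tests.
    stages = [
        ("intake known-good sets", lambda: True),
        ("bring up identity, DNS, logging", lambda: True),
        ("scan and stage", lambda: True),
        ("functional smoke tests", lambda: False),  # simulated failure
    ]
    ok, done = run_ire_restore(stages)
    print(ok, done)
```

The useful property is the ordering itself: identity, DNS, and logging come up before anything is scanned, so every later stage produces evidence you can show at the decision gate.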

Pro tip: Keep the IRE up to date and available, or maintain it as VM snapshots with scripts that replicate the latest seed data from production.

Friction points to expect

  • Identity and keys: missing secrets, disabled service principals, or cryptography procedures that slow access.
  • Ownership gaps: technology towers know components while service owners understand the business path. Bridge both.
  • SaaS data: shared responsibility requires verified restores for mail, files, and CRM datasets you rely on.
  • Unsafe shortcuts: auto-mount behaviors, shared admin credentials, and copy-paste runbooks. Replace with documented steps and approvals.
  • Logging: without clean logs in the IRE, you cannot prove integrity. Enable logging early and retain outputs.

Field note

During the 2017 NotPetya event, Maersk recovered after locating a single domain controller in its Ghana office that happened to be offline. That offline copy became the anchor for rebuilding identity and services - a vivid example of how an isolated, clean starting point changes the recovery story. (WIRED)

Quick addition

For your next drill, add two visible timers: one for decision to restore and one for restore start in the IRE. Capture both in the after-action note. Those two numbers become your simplest, most actionable recovery KPIs.

Make it measurable: KPIs the board will back

Boards fund what they can see. A compact recovery scorecard turns backup spending into visible progress that leaders can review in minutes. The goal is a small set of metrics that reveals whether your environment can restore safely at the speed your business needs when ransomware pressure arrives.

Five metrics that tell the truth

  • Restore success rate (last 90 days) - The share of full restores for tier-1 services that met business acceptance criteria. Use service names, not platforms, and show trend arrows.
  • Median time to IRE - The median time from incident declaration to the first validated instance of a service in the isolated recovery environment. This is the pace-setting clock for negotiations and public updates.
  • Immutability coverage - The percentage of tier-1 data under enforced retention lock or offline protection with separate credentials. Include the oldest protected copy date for quick risk scanning.
  • Age of last full-restore test - The number of days since each crown-jewel service completed a clean-room restore with sign-off. Display the oldest item prominently to focus attention.
  • SaaS backup coverage - The percentage of critical SaaS apps with tested, restorable copies. Name the applications and list the last test date so gaps are obvious.
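Three of the five metrics can be computed directly from a simple log of drill records. This is a minimal sketch assuming one dict per drill; the field names (service, passed, hours_to_ire, days_since_test) are assumptions about how a team might record drill evidence.

```python
# Sketch: compute scorecard metrics from drill records. Each record is
# one clean-room restore drill for a tier-1 service; field names are
# illustrative, not a standard schema.

from statistics import median

def scorecard(drills: list[dict]) -> dict:
    """Return restore success rate, median time to IRE, and the age of
    the stalest full-restore test across the recorded drills."""
    passed = [d for d in drills if d["passed"]]
    return {
        "restore_success_rate": len(passed) / len(drills),
        "median_hours_to_ire": median(d["hours_to_ire"] for d in drills),
        "oldest_test_days": max(d["days_since_test"] for d in drills),
    }

if __name__ == "__main__":
    drills = [
        {"service": "customer portal", "passed": True,  "hours_to_ire": 5,  "days_since_test": 30},
        {"service": "billing",         "passed": True,  "hours_to_ire": 9,  "days_since_test": 75},
        {"service": "identity",        "passed": False, "hours_to_ire": 14, "days_since_test": 120},
    ]
    print(scorecard(drills))
```

Displaying the oldest test age rather than an average is deliberate: one stale crown-jewel service is the risk, and an average would hide it.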

How to read the dashboard

Set each metric against a business threshold, then color by outcome. For example, if the customer portal has a 12-hour RTO, show whether recent drills reached a validated service in the IRE within that window. Keep definitions tight: what counts as a “full restore,” what “validated” means, and who signs off. Require notes for any yellow or red gauge that answer three questions in plain language: what slowed us down, who owns the fix, and when the next test will prove it.

What good evidence looks like

Executives should see names, timestamps, and artifacts. A screenshot of an object-lock policy proves immutability. A drill workbook with decision time and restore start time proves operational cadence. A short memo from Legal and Comms confirms that messaging and permissions are aligned with the restore path. Each artifact connects a KPI to verifiable work, which builds trust during audits, renewals, and board reviews.

One anchor stat for stakes

Breach costs remain material for planning: IBM’s 2025 study reports a $4.44 million global average cost per breach. Use that single number to frame how faster identification, containment, and restoration change your exposure over the first day of an incident.

Short step

Ask for a one-page recovery scorecard for your top five services before the next ops review. Include the five metrics above, owner names, and links to the latest drill evidence. Keep the page under five minutes to read so it shows up in every executive packet without debate.

Bringing it together: backups as leverage

When pressure arrives, your leverage comes from a simple sequence you can execute at speed: prove a safe copy exists, restore it in isolation, and show progress your leaders can narrate in public. The hardware and software matter, but the advantage comes from rehearsal, separation of control, and evidence you can hold up in a boardroom. Treat backups as option value - the right to choose a recovery path that protects customers and brand.
