Episode 12 — Fine-Tune Safely: Epochs, Learning Rates, and Catastrophic Forgetting Risks
In this episode, we shift from stopping attacks to surviving them, because a security program that cannot keep the business running under stress is going to feel like it fails at the exact moment it is needed most. Availability is the part of security that people often notice only when it disappears, like when a service is down, a network is unreachable, or critical files are suddenly unreadable. For brand-new learners, this can be confusing because it seems like availability is more of an I T problem than a security topic, but SecurityX treats it as security because outages can be caused by accidents, by malicious activity, or by a blend of both. The uncomfortable truth is that some incidents will get past prevention, and when that happens your ability to recover becomes the difference between a bad day and a business-threatening crisis. We are going to make the language around continuity and recovery feel clear, then we will dig into why backup design matters, why tests matter, and why disconnected backups can save you when everything connected gets pulled into the blast radius.
Before we continue, a quick note: this audio course has two companion books. The first book covers the exam and explains in detail how best to pass it. The second book is a Kindle-only eBook that contains 1,000 flashcards you can use on your mobile device or Kindle. Check them both out at Cyber Author dot me, in the Bare Metal Study Guides Series.
Availability means that systems and data are accessible when authorized users need them, and it is one of those definitions that sounds simple until you picture what it looks like during a real outage. Availability is not just whether a website responds, because for many organizations availability includes internal systems, identity services, core business applications, and even the ability to communicate during a disruption. A system can be technically powered on and still be unavailable if authentication fails, if a database is corrupted, or if network paths are broken. Beginners also sometimes assume that availability is a pure reliability issue, like hardware failures, but in cybersecurity it is directly tied to threats like ransomware, denial-of-service attacks, destructive insiders, and careless changes that take systems offline. What makes availability challenging is time, because minutes matter differently depending on the business function. A streaming service might lose customers with minutes of downtime, while a payroll system might tolerate hours but not days. SecurityX questions often probe whether you understand that availability is a business outcome, not a technical checkbox, and that protecting availability requires both prevention and a plan for rapid restoration when prevention is not enough.
Business Continuity and Disaster Recovery (B C / D R) is the umbrella concept that helps organizations plan for that restoration, and the simplest way to understand it is that continuity is about keeping essential functions running while something is wrong, and recovery is about returning to normal once the situation is stabilized. People mix these terms up because both deal with disruption, but the focus is different. Continuity asks what the business must keep doing even during an incident, like taking orders, supporting patients, or processing transactions, and it often involves alternative processes, temporary workarounds, and prioritization. Recovery asks how you bring systems and data back into a trustworthy state, which can involve restoring from backups, rebuilding systems, validating integrity, and carefully reintroducing services to avoid repeating the failure. Beginners sometimes hear disaster and imagine only hurricanes or fires, but in modern environments the disaster might be ransomware that encrypts file servers, a cloud outage that disrupts multiple services, or a mistaken configuration change that knocks out authentication. B C / D R is about being ready for the impact, regardless of whether the cause is natural, accidental, or malicious, because the business experiences the disruption the same way.
A strong availability strategy starts long before any backup is restored, because you need to know what matters most and what can wait, and that requires an honest look at business impact. Even if you never use a formal term for it, you are doing impact analysis when you ask which systems are critical, which data is irreplaceable, and which processes are time-sensitive. Beginners often want a single universal priority list, but priorities vary by organization, by season, and even by time of day. A hospital’s priorities may revolve around clinical systems and patient safety, while a retailer’s priorities may revolve around checkout and inventory visibility. Impact also includes dependencies, because the system you think you need might depend on another system you forget, like identity services that every application relies on for login. SecurityX scenarios often include hidden dependencies that make a recovery plan fail in practice, and the test is checking whether you can think beyond the obvious system and consider what enables it. When you can identify critical functions, map the supporting systems, and acknowledge dependencies, you can plan recovery in a way that prevents the common mistake of restoring something that cannot actually be used yet.
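One way to make hidden dependencies visible is to write them down as a graph and let a topological sort produce a restore order. The sketch below is only an illustration: the system names and the dependency map are hypothetical, and a real environment would have many more entries.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each system lists what must be running first.
DEPENDS_ON = {
    "identity": set(),
    "database": {"identity"},
    "checkout_app": {"identity", "database"},
    "reporting": {"database"},
}

def restore_order(depends_on):
    """Return a restore sequence in which every system follows its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())

order = restore_order(DEPENDS_ON)
print(order)  # identity comes first because everything else relies on it
```

If the map ever contains a circular dependency, `TopologicalSorter` raises `graphlib.CycleError`, which is itself a useful planning signal because it means no safe restore order exists yet.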
The next step in protecting availability is translating priorities into recovery targets, which is where many programs become vague and then suffer during real incidents. Two common recovery targets are Recovery Time Objective (R T O) and Recovery Point Objective (R P O), and you can think of them as time-to-restore and data-loss tolerance. R T O is how quickly a system must be restored to meet business needs, and R P O is how much data loss is acceptable in time terms, meaning how far back you can go and still operate. Beginners sometimes treat these as technical numbers chosen by I T, but they are business decisions because they represent tradeoffs. A shorter R T O often costs more because you need more redundancy, more automation, and more prepared recovery processes. A smaller R P O often costs more because you need more frequent backups or replication, and you need careful controls to ensure data is consistent. SecurityX expects you to understand that these targets guide design, testing, and investment, and that if you do not define them, you will discover them during the outage, usually in the form of unhappy stakeholders and rushed decisions.
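To make the two targets concrete, here is a minimal sketch that compares an incident timeline against an R P O and an R T O. The function names, timestamps, and target values are all hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta

def meets_rpo(last_backup, incident_time, rpo):
    """True if the data lost since the last backup fits within the RPO."""
    return incident_time - last_backup <= rpo

def meets_rto(outage_start, service_restored, rto):
    """True if the service was restored within the RTO."""
    return service_restored - outage_start <= rto

# Hypothetical targets agreed with the business.
rpo = timedelta(hours=4)
rto = timedelta(hours=8)

# Hypothetical incident timeline.
incident = datetime(2024, 1, 10, 14, 0)
last_backup = datetime(2024, 1, 10, 8, 0)   # 6 hours before the incident
restored = datetime(2024, 1, 10, 20, 0)     # 6 hours after the incident

rpo_ok = meets_rpo(last_backup, incident, rpo)
rto_ok = meets_rto(incident, restored, rto)
print(rpo_ok)  # False: 6 hours of data loss exceeds the 4-hour RPO
print(rto_ok)  # True: 6 hours of downtime is within the 8-hour RTO
```

In this example the team restored fast enough but lost more data than the business agreed to tolerate, which points toward more frequent backups rather than faster restoration.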
Backups are at the center of availability recovery because they provide a path back to known good data, but a backup strategy is more than copying files somewhere. A backup has to be usable, which means it must be complete enough to restore what you need, recent enough to meet R P O expectations, and protected enough that it survives the incident that caused the need for restoration. Beginners often assume that having backups means you are safe, but many organizations learn the hard way that they had backups that could not be restored, backups that were too old to be useful, or backups that were encrypted right alongside the production data during ransomware. A mature backup approach also considers not only data but also the systems and configurations that make the data useful, because restoring a database without the application context can still leave you stuck. In practical terms, you want backups that cover critical data, backups that cover the ability to rebuild services, and backups that are validated through testing, because a backup you have never restored is a hope, not a plan.
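One small piece of backup validation can be sketched as a checksum comparison: record a hash when the backup is taken, then confirm that a test restore produces the same hash. This is an assumption-laden illustration, not a full restore test. The file names are hypothetical, and a matching hash shows only that the bytes survived, not that the application can actually use them.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path):
    """Hash a file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def restore_matches(recorded_hash, restored_path):
    """True if the restored file matches the hash recorded at backup time."""
    return sha256_of(restored_path) == recorded_hash

# Demonstration with throwaway files standing in for real backup data.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.db"
    good = Path(tmp) / "restored_good.db"
    bad = Path(tmp) / "restored_bad.db"
    source.write_bytes(b"customer records")
    good.write_bytes(b"customer records")
    bad.write_bytes(b"customer rec")  # simulates a truncated restore

    recorded = sha256_of(source)      # stored alongside the backup
    good_ok = restore_matches(recorded, good)
    bad_ok = restore_matches(recorded, bad)

print(good_ok)  # True
print(bad_ok)   # False
```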
Connected backups are backups that are reachable through normal network paths or normal administrative access, and they are convenient because they are easy to run and easy to manage. Convenience, however, is exactly what makes connected backups risky during certain security events. If ransomware reaches the network and gains access to backup locations, it can encrypt or delete backups as part of the attack, turning your safety net into another victim. If an attacker compromises privileged credentials, they may be able to alter retention settings, delete recovery points, or poison backup data so that restorations bring back compromised states. Even without attackers, a misconfiguration or a mistaken script can delete connected backups at scale because everything is reachable and automated. This does not mean connected backups are bad, because they often support fast recovery and frequent backup cycles, which helps R T O and R P O. It means you must treat them as part of the attack surface and protect them with strong access controls, separation of duties, logging, and careful administration. SecurityX questions will often hint at this by describing an incident where both primary data and backups were impacted, and the lesson is that connectivity can spread damage.
Disconnected backups exist to solve that exact problem, because the safest backup is one that an attacker cannot easily reach even if they compromise your environment. Disconnected can mean physically separated storage, offline media, or a backup copy that is logically isolated so it is not continuously accessible through the same credentials and network paths as production. The key idea is isolation, because isolation breaks the attacker’s ability to automate destruction. Beginners sometimes worry that disconnected backups are old-fashioned, but the reality is that isolation is a security control, and it becomes more valuable as attackers become more capable of attacking backup systems. Disconnected backups can be slower to restore from, and they can require more deliberate handling, but that slowness is the tradeoff you accept for survivability. The goal is not to replace connected backups entirely, but to have a recovery option that still exists after the worst day, when connected systems are compromised or untrusted. On SecurityX, you are often being tested on whether you understand that recovery depends on having at least one copy that remains clean and accessible when everything else is on fire.
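A simple way to reason about isolation is to list your backup copies and ask which ones would still exist if production credentials and network paths were compromised. The sketch below assumes a hypothetical inventory, and the two isolation attributes are a simplification of the idea described above.

```python
from dataclasses import dataclass

@dataclass
class BackupCopy:
    name: str
    reachable_from_production: bool      # uses the same network paths as production
    shares_production_credentials: bool  # can be administered with production accounts

def is_isolated(copy):
    """A copy counts as isolated only if neither path leads back to production."""
    return not (copy.reachable_from_production or copy.shares_production_credentials)

def surviving_copies(copies):
    """Names of copies expected to survive a compromise of production."""
    return [copy.name for copy in copies if is_isolated(copy)]

# Hypothetical inventory.
inventory = [
    BackupCopy("nas_snapshot", True, True),
    BackupCopy("cloud_replica", False, True),
    BackupCopy("offline_tape", False, False),
]

survivors = surviving_copies(inventory)
print(survivors)  # ['offline_tape']
```

Note that the cloud replica is off the production network but still fails the check, because shared credentials give an attacker a path to it.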
A related idea that often appears in availability discussions is immutability, which means making backup data resistant to modification or deletion during a retention period. You do not need to know a specific product feature to understand the security logic. If backups cannot be altered easily, an attacker who gains access cannot simply erase recovery points or quietly change them. Immutability is not magic, because attackers might still try to destroy the systems that store backups, or they might try to wait out retention periods, but it raises the cost and complexity of attacks that target recovery. For beginners, the most important concept is that you are defending your ability to restore, not just defending your production environment. That includes protecting the backup process, protecting the storage, protecting the administrative paths, and protecting the integrity of the backup data itself. SecurityX questions often reward answers that treat backup systems as high-value assets with their own security requirements. If the scenario suggests that backup administrators share credentials with system administrators, or that backup storage is broadly accessible, the right move is to strengthen separation and protection so recovery remains possible.
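The logic of a retention lock can be shown with a toy in-memory model. This is not any product's interface. The class name, the exception, and the 30-day retention period are all hypothetical, and real immutability is enforced by the storage platform rather than by application code like this.

```python
from datetime import datetime, timedelta

class RetentionLockError(Exception):
    """Raised when a change is attempted on a backup that is still locked."""

class ImmutableBackupStore:
    """Toy store: backups cannot be overwritten or deleted during retention."""

    def __init__(self, retention):
        self.retention = retention
        self._items = {}  # name -> (data, locked_until)

    def write(self, name, data, now):
        if name in self._items and now < self._items[name][1]:
            raise RetentionLockError(f"{name} is locked against overwrite")
        self._items[name] = (data, now + self.retention)

    def delete(self, name, now):
        locked_until = self._items[name][1]
        if now < locked_until:
            raise RetentionLockError(f"{name} is locked until {locked_until}")
        del self._items[name]

    def names(self):
        return sorted(self._items)

store = ImmutableBackupStore(retention=timedelta(days=30))
created = datetime(2024, 1, 1)
store.write("db-2024-01-01", b"snapshot", now=created)

# An attacker with access tries to delete the recovery point on day 5.
try:
    store.delete("db-2024-01-01", now=created + timedelta(days=5))
    early_delete_blocked = False
except RetentionLockError:
    early_delete_blocked = True

print(early_delete_blocked)  # True
print(store.names())         # ['db-2024-01-01']

# After retention expires the same request succeeds, which is why
# waiting out the retention period remains a possible attack.
store.delete("db-2024-01-01", now=created + timedelta(days=31))
print(store.names())         # []
```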
Testing is the part of B C / D R that separates confidence from fiction, because you can write a beautiful plan and still fail if you never practice it. Testing does not have to mean a dramatic full shutdown of production, especially for beginners thinking at a high level. Testing can include walkthroughs where teams simulate decision-making, restore drills where critical data is recovered in a controlled environment, and validation checks that confirm backups are usable and that recovery steps are accurate. The purpose of testing is to discover gaps while the stakes are low, because during a real incident the stakes are high and the time is short. Tests reveal practical issues like missing permissions, outdated contact lists, unclear responsibilities, and recovery steps that depend on systems that might be down. They also reveal human factors, like whether teams understand the sequence of decisions and whether communication channels work under stress. SecurityX commonly tests the idea that plans must be exercised and refined, because organizations that never test are often surprised by how long recovery really takes and how many dependencies were overlooked.
Recovery itself is not a single action, because restoring data is only one part of returning to a safe and stable state. A mature recovery process includes rebuilding or restoring systems, validating that data is consistent, confirming that security controls are functioning, and then bringing services back online in a controlled order. Beginners sometimes assume that once you restore from backup, you are finished, but you also need to ensure you are not restoring the problem, like restoring malware, restoring compromised configurations, or restoring corrupted data that will fail again. That is why verification and validation matter, and why recovery often includes a step where you scan or inspect restored systems before reconnecting them to production networks. Recovery also includes communication, because users and leadership need clear expectations about what is available, what is not, and what workarounds exist. On SecurityX, you may see a scenario where a team restores quickly but then gets reinfected or fails again, and the point is that recovery must be safe, not just fast. Safe recovery is a balance between urgency and control, because speed without validation can turn one incident into a recurring incident.
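The idea of validating before reconnecting can be sketched as a simple gate: a restored system rejoins production only when every check passes. The check names and results below are hypothetical, and in practice each result would come from a real scan or inspection rather than a hard-coded value.

```python
def reconnect_decision(checks):
    """Return (approved, failed_checks). Approval requires every check to pass."""
    failed = [name for name, passed in checks.items() if not passed]
    return (not failed, failed)

# Hypothetical validation results for one restored server.
restored_server = {
    "data_consistency_verified": True,
    "malware_scan_clean": False,
    "security_controls_functioning": True,
}

approved, failed = reconnect_decision(restored_server)
print(approved)  # False
print(failed)    # ['malware_scan_clean']
```

Returning the list of failed checks, rather than only a yes or no, gives the team something concrete to fix and to communicate while the system stays isolated.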
A key part of controlling recovery is documentation that is actually usable during stress, because the moment of an outage is not the moment to rely on tribal knowledge. Recovery documentation should identify who does what, where the necessary resources are, and what the sequence of actions should be when certain systems are down. Even if you do not use the word runbook, the concept is that you need a reliable set of instructions and decision points that teams can follow without guessing. Beginners often think documentation is mainly for compliance, but in recovery it is a survival tool because it reduces hesitation and prevents repeated mistakes. Good documentation also includes assumptions, like which credentials are needed, where backups are stored, and how to access isolated recovery copies, because those assumptions can fail during an incident if they are not actively maintained. This is where configuration management and change control connect back in, because if recovery steps depend on a system that has changed, the recovery plan can become outdated without anyone noticing. SecurityX questions often point to outdated plans, untested procedures, or unclear responsibilities, and the best answers usually emphasize updating, exercising, and maintaining recovery documentation as part of ongoing program discipline.
People and communication are the part of availability protection that many beginners overlook, because they assume recovery is mostly technical work. During a disruption, however, decisions must be made quickly, priorities must be set, and information must be shared without causing panic or confusion. That means roles and communication channels need to be established ahead of time, including who coordinates recovery, who approves major actions, who communicates with leadership, and who communicates with affected users or customers. Even in a small organization, someone must be accountable for steering the response rather than having everyone work separately. Communication also protects security, because during outages attackers sometimes exploit confusion by sending fake messages or social engineering requests, hoping stressed teams will bypass controls. A disciplined communication plan reduces that risk by defining trusted channels and verification habits. On the exam, scenarios may describe chaotic response efforts, conflicting messages, or delays because nobody knows who is in charge, and the best responses often involve clarifying roles, establishing communication procedures, and practicing them in tests so they work when needed.
Another availability lesson that matters for SecurityX is that recovery planning must include the dependencies that make recovery possible, not just the systems you intend to restore. If your identity system is down, you might not be able to log into recovery tools. If your network management is down, you might not be able to reconfigure routes needed for failover. If your monitoring is down, you might bring services back without knowing whether they are stable or compromised. A beginner-friendly way to think about this is that recovery has its own supply chain inside your organization, and you must protect that supply chain as carefully as you protect production. That often includes protecting administrator access, protecting key management systems, and ensuring you have out-of-band ways to coordinate if primary communication tools fail. SecurityX questions sometimes describe a situation where recovery is slowed because essential supporting systems were not included in the plan, and the correct approach is to expand planning to cover these dependencies and to test under realistic constraints. When you plan for dependencies, you reduce the chance of being locked out of your own recovery process.
As we wrap up, protecting availability is not just about hoping systems never fail, but about building a recovery posture that survives the failures you cannot prevent. B C / D R provides the program-level framework for keeping critical functions running and restoring normal operations when disruption occurs, and recovery targets like R T O and R P O turn vague hopes into clear priorities and design requirements. Connected backups can enable fast recovery but can also be attacked because they live inside the normal connectivity path, while disconnected backups provide isolation that can preserve a clean recovery option when everything else is compromised. Testing turns plans into reality by revealing missing steps, broken assumptions, and human coordination challenges before the real crisis arrives. Safe recovery emphasizes validation and controlled restoration, because restoring quickly is not enough if you restore the compromise or bring services back in a fragile state. When you understand availability as a security outcome that depends on planning, isolation, evidence, and practice, you will be able to reason through SecurityX scenarios with confidence and choose answers that keep organizations functional under pressure.