Business Continuity and Disaster Recovery Planning

Contingency Planning

  • Information systems contingency planning refers to a coordinated strategy involving plans, procedures, and technical measures that enable the recovery of information systems, operations, and data after a disruption
  • Resilience is the ability of an organization to quickly adapt to and recover from any known or unknown changes to the environment
  • BCP and DRP are types of contingency planning
  • BCP and DRP help minimize financial impact during serious incidents by protecting tangible and intangible assets

Source: NIST Special Publication 800-34 Rev. 1: Contingency Planning Guide for Federal Information Systems

BCP / DRP

  • Business Continuity Planning (BCP)
  • Preservation of business in the face of disruptions
  • Focuses on sustaining an organization’s mission/business processes during and after a disruption
  • BCP may be created for a single business unit or for the entire organization’s processes; may also be scoped for only functions deemed to be priorities
  • BCP is the responsibility of the security team since it provides availability
  • Disaster Recovery Planning (DRP)
  • DRP is concerned with restoring operability of disrupted IT systems, whereas BCP is concerned with keeping business processes available
  • DRP applies to major (usually physical) disruptions to service that deny access to the primary facility infrastructure for an extended period
  • DRP only addresses information system disruptions that require relocation to infrastructure at an alternate site

Source: NIST Special Publication 800-34 Rev. 1: Contingency Planning Guide for Federal Information Systems

The Need for BCP

  • Natural disasters
  • Social unrest or terrorist attacks
  • BCP may often be triggered by an audit
  • Legislative/regulatory requirements
  • Equipment failure (such as disk crash)
  • Disruption of power supply or telecommunication
  • Application failure or corruption of database
  • Human error, sabotage or strike
  • Malicious Software (Viruses, Worms, Trojan horses) attacks
  • Hacking or other Internet attacks
  • Fire

Source: Introduction to Business Continuity Planning; SANS Institute InfoSec Reading Room

Standards

  • NFPA 1600
  • National Standard on Preparedness by the National Fire Protection Association
  • ISO 17799
  • Defense Security Service (DSS)
  • A division of the DoD
  • NIST
  • Standard of due care / best practice / good business practice

Enterprise wide Continuity Planning

Critical Success Factors for BCP Implementation

• Management support

  • Ensures that management will allocate resources for the project
  • It is the key driver of organizational change
  • Management awareness will steer the program and set priorities

• Accountability and responsibility

  • All departments/individuals know their role in incorporating BCM
  • A BCM team lead should oversee the overall process development and report to management on obstacles faced

• Integral part of information assurance management program

  • BCM is not separate from the organization’s overall IT management
  • Needs and allows continuous monitoring and improvement
  • BCM should be integrated into the total change management process

Source: Information Assurance Handbook: Effective Computer Security and Risk Management Strategies by Corey Schou and Steven Hernandez

BCP Process

Source: CISSP CBK

A. Project Initiation

  • The BCP and DRP must be based on a clearly defined policy, which states:
  • Organization’s overall contingency objectives
  • Organizational framework
  • Resource requirements
  • Roles and responsibilities
  • Scope as it applies to common platform types and organization functions
  • Training requirements
  • Exercise and testing schedules
  • Plan maintenance schedule
  • Minimum frequency of backups and storage of backup media

Source: NIST Special Publication 800-34 Rev. 1: Contingency Planning Guide for Federal Information Systems

Project Initiation

  • Project scope development and planning
  • BCP vs. DRP
  • Crisis management planning
  • Continuous availability
  • Incident Command System (ICS)
  • Executive Management Support - CIO must support the contingency program and be included in the process to develop the program policy
  • Project scope and authorization
  • Continuity Planning Project Team formation

B. Current State Assessment

  • Understand Enterprise Strategy, Goals and Objectives
  • Business Impact Analysis
  • Threat analysis
  • Identify critical business functions
  • 3rd party relationships
  • Assessment of current continuity planning process
  • Benchmarking or peer review

Business Impact Analysis (BIA)

  • BIA correlates system with critical mission/business processes and services provided to characterize the consequences of a disruption
  • Three steps are typically involved in accomplishing the BIA:
  • Determine mission/business processes and recovery criticality
  • Identify resource requirements: realistic recovery efforts require a thorough evaluation of the resources required to resume mission/business processes as quickly as possible
  • Identify recovery priorities for system resources: system resources can be linked to the critical mission/business processes and functions they support, and priority levels can then be established for sequencing recovery activities and resources

Critical Business Functions

  • Impacts on business functions are analyzed in terms of availability, integrity and confidentiality
  • Availability (Time Sensitivity)
  • Recovery Time Objective (RTO) - the maximum amount of time that a system resource can remain unavailable before there is an unacceptable impact on other system resources, supported mission/business processes, and the MTD
  • Plan of Action and Milestone for mitigation should be initiated if RTO is not feasible
  • Maximum Tolerable Downtime (MTD) - the total amount of time the system owner is willing to accept for a mission/business process outage or disruption and includes all impact considerations
  • Max Allowable Downtime (MAD) – the total amount of time that the system can be unavailable before significant organizational impact will result.
  • Data Integrity
  • Recovery Point Objective (RPO) - the point in time, prior to a disruption, to which data can be recovered after an outage
  • Critical business functions should be classified based on the determined impact
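
As a rough illustration of how these objectives relate to one another, the sketch below checks that a system's RTO fits within the MTD of the process it supports and that the backup schedule can satisfy the RPO. The class, function, field names, and sample figures are hypothetical and are not taken from NIST SP 800-34.

```python
from dataclasses import dataclass


@dataclass
class RecoveryObjectives:
    """Hypothetical record of BIA objectives for one system, all in hours."""
    name: str
    mtd_hours: float  # Maximum Tolerable Downtime of the supported process
    rto_hours: float  # Recovery Time Objective of the system resource
    rpo_hours: float  # Recovery Point Objective (tolerable data-loss window)


def find_gaps(obj: RecoveryObjectives, backup_interval_hours: float) -> list[str]:
    """Return findings as text; an empty list means no gap was found."""
    gaps = []
    # An RTO longer than the MTD means recovery would finish too late.
    if obj.rto_hours > obj.mtd_hours:
        gaps.append("RTO exceeds MTD")
    # Worst-case data loss is roughly the time between two backups.
    if backup_interval_hours > obj.rpo_hours:
        gaps.append("backup interval exceeds RPO")
    return gaps


payroll = RecoveryObjectives("payroll", mtd_hours=72, rto_hours=48, rpo_hours=12)
print(find_gaps(payroll, backup_interval_hours=24))
```

With these sample figures the RTO fits inside the MTD, but a daily backup cannot meet a 12-hour RPO, so the backup frequency would need to change or the gap would need to be documented.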

Sample Business Impact Analysis (BIA)

Cost Balancing

  • The longer a disruption is allowed to continue, the more costly it can become to the organization
  • Conversely, the shorter the RTO, the more expensive the recovery solutions are to implement
  • Plotting the cost balance points will show an optimal point between disruption and recovery costs
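
The balance point can be found numerically by adding the two cost curves and taking the minimum. The sketch below does this for a handful of candidate RTOs. Both cost functions and all figures are invented for illustration; a real analysis would use cost estimates from the BIA.

```python
def disruption_cost(hours: float) -> float:
    """Assumed: losses grow linearly with outage length (1,000 per hour)."""
    return 1_000 * hours


def recovery_cost(rto_hours: float) -> float:
    """Assumed: a shorter RTO needs a costlier solution (inverse relation)."""
    return 64_000 / rto_hours


def balance_point(candidate_rtos: list[int]) -> int:
    """Return the candidate RTO (in hours) with the lowest combined cost."""
    return min(candidate_rtos, key=lambda h: disruption_cost(h) + recovery_cost(h))


candidates = [1, 2, 4, 8, 12, 24, 48, 72]
best = balance_point(candidates)
print(best, disruption_cost(best) + recovery_cost(best))
```

Under these assumed curves the combined cost is lowest at an 8-hour RTO, where the disruption cost and the recovery cost are equal.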

Critical Business Functions

  • Identification of critical business functions
  • Operational impact
  • Financial impact
  • Reputation or public image impact
  • Dependencies
  • BIA enables characterization of the system components, supported business processes, and interdependencies
  • Possible business impacts due to the unavailability of systems can be determined (RTO, MTD, etc.)
  • Then sequencing recovery of information system components can be finalized which will form the basis for developing contingency solutions

Source: NIST Special Publication 800-34 Rev. 1: Contingency Planning Guide for Federal Information Systems
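
One way to turn interdependencies into a recovery sequence is a topological sort, which places every component after the components it depends on. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later). The component names and the dependency map are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each component lists what must be restored first.
dependencies = {
    "network": set(),
    "database": {"network"},
    "authentication": {"network", "database"},
    "order_processing": {"database", "authentication"},
}

# static_order() yields each component only after all of its prerequisites.
recovery_sequence = list(TopologicalSorter(dependencies).static_order())
print(recovery_sequence)
```

A circular dependency makes `static_order()` raise `graphlib.CycleError`, which is itself a useful finding during the BIA because it shows that no valid recovery order exists as modeled.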

Third Party Relationships

  • Downstream liabilities
  • Who will be impacted if your business is interrupted?
  • Upstream impacts
  • What happens if a partner’s business is interrupted?
  • Enforce SLAs

Identify Preventive Controls

  • Some outage impacts identified in BIA may be mitigated or eliminated through preventive measures that deter, detect, and/or reduce impacts to the system
  • Where feasible and cost-effective, preventive methods are preferable to recovery methods. For example:
  • Appropriately sized uninterruptible power supplies (UPS)
  • Gasoline- or diesel-powered generators to provide long-term backup power
  • Air-conditioning systems with adequate excess capacity to prevent failure of certain components, such as a compressor
  • Fire detection and suppression systems
  • Heat-resistant and waterproof containers for backup media and vital non-electronic records
  • Offsite storage of backup media, non-electronic records, and system documentation
  • Frequent scheduled backups, including where the backups are stored (onsite or offsite) and how often they are recirculated and moved to storage

C. Development Phase

  • Develop and design recovery strategies
  • IT recovery
  • Business process recovery
  • Facilities recovery

BCP/DRP Development

Activation and Notification Phase

  • Defines initial actions taken once a system disruption or outage has been detected or appears to be imminent
  • Activation Criteria and Procedure - BC or DR plan should be activated if one or more of the activation criteria are met. Criteria may be based on:
  • Extent of any damage to the system
  • Criticality of the system to the organization’s mission
  • Expected duration of the outage lasting longer than the RTO
  • Notification Procedures - Describe the methods used to notify recovery personnel during business and non-business hours. Notification methods can be:
  • Manual
  • Automatic
  • Outage Assessment - Assess the nature and extent of the disruption
  • Assessment should be completed as quickly as the given conditions permit
  • Outage Assessment Team is the first team notified of the disruption.

Source: NIST Special Publication 800-34 Rev. 1: Contingency Planning Guide for Federal Information Systems
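
The activation rule (activate when one or more criteria are met) can be sketched as a simple check over the outage assessment. The class, field names, and the way each criterion is reduced to a yes/no value are assumptions made for illustration; how an organization defines "severe damage" or "critical" is a policy decision.

```python
from dataclasses import dataclass


@dataclass
class OutageAssessment:
    """Hypothetical summary produced by the outage assessment."""
    severe_damage: bool           # extent of damage to the system
    mission_critical: bool        # criticality to the organization's mission
    expected_outage_hours: float  # estimated duration of the outage
    rto_hours: float              # Recovery Time Objective from the BIA


def met_criteria(a: OutageAssessment) -> list[str]:
    """List the activation criteria that this assessment satisfies."""
    met = []
    if a.severe_damage:
        met.append("extent of damage")
    if a.mission_critical:
        met.append("system criticality")
    if a.expected_outage_hours > a.rto_hours:
        met.append("outage expected to exceed RTO")
    return met


def should_activate(a: OutageAssessment) -> bool:
    """Activate the plan when one or more criteria are met."""
    return len(met_criteria(a)) > 0


report = OutageAssessment(severe_damage=False, mission_critical=False,
                          expected_outage_hours=36, rto_hours=24)
print(should_activate(report), met_criteria(report))
```

Returning the list of satisfied criteria alongside the decision gives the notification step a ready-made explanation of why the plan was activated.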

Recovery Phase

  • Focuses on implementing recovery strategies to restore system capabilities, repair damage, and resume operational capabilities at the original or new alternate location
  • Sequence of Recovery Activities
  • Should reflect the system’s MTD to avoid significant impacts to related systems
  • Recovery Procedures
  • Should provide detailed procedures to restore the information system or components to a known state
  • Recovery procedures should be written in a straightforward, step-by-step style
  • Recovery Escalation and Notification
  • Effective escalation and notification procedures should define and describe the events, thresholds, or other types of triggers that are necessary for additional action
  • At the completion of the Recovery Phase, the information system will be functional and capable of performing the functions identified in the plan

Reconstitution Phase

  • Defines the actions taken to test and validate system capability and functionality
  • Concurrent Processing - running two systems concurrently until there is a level of assurance that the recovered system is operating properly
  • Validation Data Testing - testing and validating recovered data to ensure complete and current recovery
  • Validation Functionality Testing - verifying that all system functionality has been tested, and that normal operations can resume
  • Deactivation steps for returning to normal operations include:
  • Notifications – notifying users using predefined procedures that normal operations have resumed
  • Cleanup - cleaning up work space or dismantling any temporary recovery locations, restocking supplies, returning manuals or other documentation to their original locations, and readying the system for another contingency event
  • Offsite Data Storage - If offsite data storage is used, retrieved backup should be returned to its offsite data storage location
  • Data Backup - system should be fully backed up and a new copy of the current operational system stored for future recovery efforts
  • Event Documentation - All recovery and reconstitution events should be well documented for an after-action report with lessons learned

Source: NIST Special Publication 800-34 Rev. 1: Contingency Planning Guide for Federal Information Systems

Backup and Recovery

  • Backup and recovery methods and strategies are a means to restore system operations quickly and effectively following a service disruption
  • These should be integrated into the system architecture during the Development/Acquisition phase of the SDLC
  • Considerations for developing or comparing backup and recovery methods:
  • Cost
  • Maximum downtimes
  • Security
  • Recovery priorities
  • Integration with organization-level BCM plans

Recovery Time

Recovery Tier   Recovery Timeframe   Recovery Requirement
I               0-24 hours           Resources must be available in advance and implemented first
II              1-3 days             Resources must be available in advance
III             3-5 days             Resources must be identified and quickly available
IV              Other                Resources must be identified

Method to Prioritize Business Processes or IT Infrastructure Components
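
The recovery tiers can be applied mechanically once each process or component has an agreed recovery timeframe. The sketch below maps a timeframe in hours to a tier. The function name is hypothetical, and treating each upper bound as inclusive is an assumption, since the tiers as listed share their boundary values (24 hours and 3 days).

```python
def recovery_tier(recovery_hours: float) -> str:
    """Map a recovery timeframe in hours to a tier (upper bounds inclusive)."""
    if recovery_hours <= 24:       # Tier I: 0-24 hours
        return "I"
    if recovery_hours <= 3 * 24:   # Tier II: 1-3 days
        return "II"
    if recovery_hours <= 5 * 24:   # Tier III: 3-5 days
        return "III"
    return "IV"                    # Tier IV: other


# Hypothetical components and their recovery timeframes in hours.
for component, hours in [("email", 4), ("payroll", 48), ("intranet wiki", 200)]:
    print(component, recovery_tier(hours))
```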

High Availability (HA) Processes

  • HA is a process where redundancy and failover processes are built into a system to maximize its uptime and availability
  • Goal of HA is to achieve an uptime of 99.999% or higher
  • HA can be expensive and is not a viable option for many systems; it should be considered only for systems that cannot tolerate downtime
  • HA systems cannot be a replacement for a solid backup strategy
  • HA processes need to be extended to an alternate location
  • Mechanisms such as block mirroring to an alternate site should be considered to provide redundancy and backup of system data outside of the system facility.
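
To see what an uptime goal such as 99.999% ("five nines") means in practice, it helps to convert the percentage into allowed downtime per year. The sketch below is simple arithmetic; the function name is hypothetical and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(uptime_percent: float) -> float:
    """Minutes per year a system may be down while still meeting the target."""
    return MINUTES_PER_YEAR * (1 - uptime_percent / 100)


for target in (99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

At 99.999% the budget is a little over five minutes per year, which helps explain why HA is costly and is usually reserved for systems that cannot tolerate downtime.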

IT Recovery Strategies

  • Multiple Processing Sites
  • Mirrored Sites
  • Fully redundant with identical data and equipment as well as communication capabilities
  • Highest level of availability at highest cost
  • It ensures virtually 100% availability
  • Configuration management is a challenge
  • RTO of minutes to hours

IT Recovery Strategies

  • Mobile site/trailer
  • Self-contained unit with IT and communications
  • RTO of 3-5 days
  • Hot site
  • Fully equipped data center and communications
  • RTO of a few minutes to hours
  • Warm site
  • Has some level of IT capabilities, but will have to be further equipped to take over IT operations
  • RTO of 5+ days
  • Cold site
  • A location capable of supporting IT operations, but with no equipment
  • RTO of at least 1-2 weeks

Alternate IT Recovery Strategies

  • Virtual business partners
  • Similar to multiple sites, but alternate sites are hosted by business partners
  • Reciprocal or mutual aid agreements with an internal or external entity
  • Dedicated site owned or operated by the organization
  • Commercially leased facility

Backup Approaches

  • Electronic vaulting
  • Sending data directly to an alternate facility
  • Can be stored on disk or tape depending on RTO requirements
  • Remote journaling
  • Replicated data transactions in real-time or near real-time at a secondary processing site
  • Offsite storage
  • Storage Area Network
  • Database shadowing and mirroring

Backup Methods

  • Data integrity involves keeping data safe and accurate on the system’s primary storage devices
  • There are three common methods for performing system backups:
  • Full Backup - captures all files on the disk or within the folder selected for backup
  • Locating a particular file or group of files is simple
  • Time required to perform a full backup can be lengthy; might also lead to excessive, unnecessary media storage requirements
  • Differential Backup - stores files that were created or modified since the last full backup
  • Takes less time to complete than a full backup
  • Incremental Backup - captures files that were created or changed since the last backup
  • Affords more efficient use of storage media; backup times are reduced
  • Media from different backup operations may be required to recover a system from an incremental backup
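
The practical difference between the three methods shows up at restore time. The sketch below works out which backups are needed to restore a system to its latest state. The `Backup` class and `restore_chain` function are hypothetical, and the logic assumes that at least one full backup exists and that an incremental captures changes since the most recent backup of any type.

```python
from dataclasses import dataclass


@dataclass
class Backup:
    day: int
    kind: str  # "full", "differential", or "incremental"


def restore_chain(backups: list[Backup]) -> list[Backup]:
    """Return the backups needed, in order, to restore to the latest state."""
    ordered = sorted(backups, key=lambda b: b.day)
    # Start from the most recent full backup (assumes one exists).
    last_full = max(i for i, b in enumerate(ordered) if b.kind == "full")
    chain = [ordered[last_full]]
    later = ordered[last_full + 1:]
    # A differential holds every change since the last full backup,
    # so only the newest differential is needed.
    diffs = [b for b in later if b.kind == "differential"]
    if diffs:
        chain.append(diffs[-1])
        later = [b for b in later if b.day > diffs[-1].day]
    # Every incremental taken after that point must be applied in sequence.
    chain.extend(b for b in later if b.kind == "incremental")
    return chain


incremental_week = [Backup(0, "full")] + [Backup(d, "incremental") for d in (1, 2, 3)]
differential_week = [Backup(0, "full")] + [Backup(d, "differential") for d in (1, 2, 3)]
print([b.day for b in restore_chain(incremental_week)])
print([b.day for b in restore_chain(differential_week)])
```

The incremental schedule needs the full backup plus every incremental since, while the differential schedule needs only the full backup and the latest differential. This is the trade-off between smaller backups and a longer restore.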

Backup Locations

  • On-site
  • Near-site
  • Off-site

Communications

  • Emergency communication systems
  • Remote access may serve as an important contingency capability by providing access to organization-wide data for recovery teams or users from another location
  • Wireless (or WiFi) local area networks can serve as an effective contingency solution to restore network services following a wired LAN disruption
  • Business communications systems
  • Networks
  • Some of the ways to ensure communication availability are:
  • Redundant communications links
  • Redundant network service providers
  • Redundant network-connecting devices
  • Redundancy from NSP or Internet Service Provider (ISP)
  • Monitoring software can be installed to provide warning and troubleshoot network issues before users and other nodes notice problems.
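
The benefit of redundant links can be estimated with basic probability: communication is lost only if every link is down at the same time. The sketch below assumes link failures are independent, which holds only approximately and only when links do not share a provider, cable path, or device. The function name and availability figures are illustrative.

```python
def combined_availability(link_availabilities: list[float]) -> float:
    """Probability that at least one link is up, assuming independent failures."""
    all_down = 1.0
    for availability in link_availabilities:
        all_down *= 1.0 - availability  # chance that this link is also down
    return 1.0 - all_down


print(f"single link:         {combined_availability([0.99]):.4f}")
print(f"two redundant links: {combined_availability([0.99, 0.99]):.4f}")
```

Two links that are each available 99% of the time give roughly 99.99% combined availability under the independence assumption. Links that share a single point of failure gain far less, which is the reason for using separate service providers and network-connecting devices.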

D. Implementation

  • Initial walkthroughs of design
  • Implement design
  • Test
  • Monitor

Testing, Training and Exercises (TT&E)

  • Training - personnel are trained to fulfill their roles and responsibilities within the plan
  • Exercises – plans simulated to validate their content
  • Testing - systems and system components tested to ensure their operability in a disrupted environment

Testing

  • Design short and long term continuity and crisis management testing plans
  • Update plans as necessary and document
  • Test types
  • Checklist
  • Walkthrough (table top review)
  • Simulation
  • Parallel
  • Full-interruption

BCP Program Awareness and Training

  • Recovery strategy and procedures must be documented and made available
  • Recovery personnel should be familiar with their roles and be taught the skills necessary to be prepared for tests, exercises and actual outage events
  • Training should be provided at least annually, and to the extent that personnel can execute their respective recovery roles without the aid of documentation
  • Leadership training – crisis management
  • Tech teams training – procedures and logistics
  • Part of onboarding training
  • Recovery personnel should be trained on the following plan elements:
  • Purpose of the plan
  • Cross-team coordination and communication
  • Reporting procedures
  • Security requirements
  • Team-specific processes
  • Individual responsibilities

BCP Program Exercises

  • An exercise is a simulation of an emergency designed to validate the viability of one or more aspects of the Business Continuity or Disaster Recovery plans
  • Exercises are scenario-driven
  • Types of exercises are:
  • Tabletop Exercises - Discussion-based exercises in which roles during an emergency and responses to a particular emergency situation are discussed
  • Functional Exercises - Personnel validate their operational readiness for emergencies by performing their duties in a simulated operational environment

Developing BCP/DRP culture

  • Personnel across the organization must be confident and competent with the BCP/DRP program
  • BCP must be aligned with organizational business objectives
  • Organizations must establish a BCM culture and integrate it into daily business operations with the support of the CRO and senior management.
  • Three techniques are involved in developing and establishing BCM culture within an organization:
  • Design and deliver an awareness campaign to create and promote BCM awareness and develop skills, knowledge, and commitment required to ensure a successful BCM practice.
  • Ensure the awareness campaign has achieved its goals and monitor BCM awareness over the longer term.
  • Perform an assessment on the current BCM awareness level against the management-targeted level.

Emergency Operations Center

  • A physical location to coordinate emergency response efforts
  • Virtual EOC
  • Helps in the case of a pandemic or globally dispersed key employees

E. Management of BCP/DRP

  • Program oversight
  • Continuity planning manager
  • Updating and maintaining the plan - Changes in specific areas may require attention, for example, employee turnover, changes to organizational structure, changes to business processes, etc.
  • Regular practice of the plan
  • Validate plans by having everyone involved perform simulations of different scenarios
  • Frequency of exercises depends on the rate of changes made within the organization
  • Review the result of earlier exercises to ensure identified weaknesses have been addressed
  • Review BCP - An audit by internal or external auditors can highlight all key material weaknesses and issues