Progressive Disaster Recovery Testing

As a follow-on from my previous post, "Disaster Recovery is about Business Process Recovery not IT Infrastructure Recovery", I wanted to talk about the sort of DR testing required.

Many companies are very aware of the need for disaster recovery testing and engage with suppliers to perform annual testing. However, testing a set of applications without sufficient preparation can lead to an unsatisfactory annual test. Many legacy applications were not designed to be recovered in a disaster and require further adaptation before annual disaster recovery testing is performed.

I found that much is written on the different types of testing, but not on how the types of testing fit together or how testing applies to projects delivering new applications. To help guide a disaster recovery testing strategy, I created a template testing roadmap as shown below.

[Image: DR Testing]

Across the top of the roadmap are listed the different forms of testing identified in various articles (although I have seen different definitions for each of the stages):

  • Checklist testing is a review of all the individual recovery checklists to ensure
    • the checklists are current
    • people are aware of their responsibilities
    • changes in staffing are identified
  • Structured Walk-Through testing brings everyone together to walk through the plan as a table-top exercise to ensure the overall plan will work together against a specific scenario.
  • Simulation testing is similar to a structured walk-through with a scenario, but with some interruption to non-critical business activities as component testing is performed.
  • Parallel testing is actual testing of modules or a full application, including relocation of operational personnel in the same way as needed for a full disaster, but the applications are isolated with no interruption to the production service. Parallel testing is split into:
    • project testing – done in many cases ahead of annual testing to ensure an application can be recovered successfully. It can be performed where the application is new or a major change has taken place.
    • annual testing – performed on a regularly scheduled basis including multiple business processes but not necessarily all processes. Some processes might require testing on a less frequent basis where the processes are mature. Testing of applications might take place every other year to reduce the cost of testing.
  • Full-Interruption testing is where the real production applications are shut down and moved to an alternative site. This is not performed very often due to the interruption to the business.
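
The progression through these forms of testing can be sketched as an ordered list. The following Python is a minimal illustration only: the names `DrTest` and `prerequisites` are hypothetical, and the idea that every lighter form of testing precedes a heavier one is an assumption made for the sketch, not a rule taken from the roadmap.

```python
from enum import IntEnum


class DrTest(IntEnum):
    """Forms of DR testing, ordered from least to most disruptive (assumed order)."""
    CHECKLIST = 1
    STRUCTURED_WALK_THROUGH = 2
    SIMULATION = 3
    PARALLEL_PROJECT = 4
    PARALLEL_ANNUAL = 5
    FULL_INTERRUPTION = 6


def prerequisites(target: DrTest) -> list[DrTest]:
    """Return the lighter forms of testing that come before the target stage."""
    return [stage for stage in DrTest if stage < target]


if __name__ == "__main__":
    # List what would be worked through before an annual parallel test.
    for stage in prerequisites(DrTest.PARALLEL_ANNUAL):
        print(stage.name)
```

In practice not every application would pass through every stage; the list is only a starting point for deciding which earlier tests are worth running.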

The roadmap picture adds additional concepts:

  • Component testing involves a part of an application such as directory services or email services.
  • Module testing involves a collection of components that enables testing of one or more applications or business processes. As the level of testing increases, a number of modules may be combined to share the overhead of setting up the test, and for annual testing that could involve a large part of a business environment.

To decide what testing is required, many factors need to be considered, such as:

  • complexity of application – more rigorous testing is likely to be required as the complexity increases
  • risk of component failure – if some components are more likely to fail, more rigorous testing may be required
  • time critical recovery – a step in the process, such as the time to recover data, might need repeated testing to tune the timing of the process
  • business criticality – the application may have a greater impact on the continuity of the business processes if it is not recovered successfully, and may require more testing
  • appetite for business risk – the business may be more or less risk averse and willing to accept greater or lower levels of risk

The key message is that testing the recovery of business processes and applications needs to be thought through to decide what level of testing should be performed. There is not one solution for all, and it is unlikely that an annual test alone, without additional prerequisite testing, is sufficient. Annual testing normally takes place over a short time period to demonstrate a short recovery period, and there will be insufficient time to debug each application during this test.

This entry was posted in Disaster Recovery.