Technology Now

The must-read IT business eNewsletter
Are you ready for your next data center incident?

August 2016

It’s not a question of if an incident will occur, but when. It might be something spectacular, such as an earthquake, fire, or severe weather event, or it might be something as mundane as a broken pipe. Either way, the end result is an incident in your data center: an outage that takes mission-critical systems offline.

These things happen more often than you might think. Ninety-five percent of enterprises have experienced at least one unplanned data center outage in the past 24 months. That’s not just an individual system, but an entire data center offline. [1] The average financial services business experienced 1.8 complete data center outages over the past two years, while in healthcare, the average is three outages in that same period. [2]

One major airline outage story

Earlier this year, a major airline was hit by a power outage that took down its third-party data center. The outage disabled systems supporting check-in and gate operations, causing a cascade of problems, including delayed and cancelled flights, that affected thousands of travelers. It took three hours to get the airline’s reservation service back online and over eight hours to fully restore all of the airline’s online services. [3]

Could this outage have been prevented? The answer is, “Yes.” A disaster-tolerant architecture could have prevented the loss of systems and corresponding flight delays and cancellations, but that solution comes at a cost. In this case, that cost would be maintaining a failover solution at an alternate site.

Business continuity is a balancing act

Business continuity is a balancing act between the risk of downtime-related loss and the cost of providing specific recovery time and recovery point objectives. Downtime can be very expensive. IDC estimates the mean cost of downtime at roughly $2.5M per incident. [4] That figure does not include the additional costs from damaged reputation, lost customer confidence, or regulatory exposure.
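
The per-incident figure above follows from the numbers in footnote [4]. As a rough illustration, the short Python sketch below reproduces that arithmetic and then combines it with the outage frequencies cited earlier to estimate an expected annual loss. The function names are hypothetical, and pairing the IDC cost figure with the Gravic outage counts is an assumption made only for illustration, since the two sources may not measure the same kind of incident.

```python
def cost_per_incident(cost_per_hour_usd: float, duration_minutes: float) -> float:
    """Cost of one outage, given an hourly downtime cost and a duration."""
    return cost_per_hour_usd * duration_minutes / 60.0


def expected_annual_loss(outages_per_two_years: float, incident_cost_usd: float) -> float:
    """Expected yearly loss, assuming outages are spread evenly over two years."""
    return outages_per_two_years / 2.0 * incident_cost_usd


# Figures from footnote [4]: $1.7M per hour, 90-minute average incident.
mean_incident = cost_per_incident(1_700_000, 90)

# Outage counts from the article: 1.8 (financial services) and 3 (healthcare)
# complete data center outages over two years. Applying the IDC cost to these
# counts is an illustrative assumption, not a claim made by either source.
financial_services = expected_annual_loss(1.8, mean_incident)
healthcare = expected_annual_loss(3, mean_incident)

print(f"Mean cost per incident:       ${mean_incident:,.0f}")
print(f"Financial services, per year: ${financial_services:,.0f}")
print(f"Healthcare, per year:         ${healthcare:,.0f}")
```

An estimate like this gives you a starting figure to weigh against the yearly cost of a failover solution. Your own hourly downtime cost and outage history will give a more meaningful result than industry averages.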

So what do you do? You insure against the loss by investing in the right business continuity solution, one that protects not only against individual system failures but also against the failure of an entire data center or region.

From a practical perspective, business continuity requires examination (and regular re-examination) of the recovery time objective (RTO) and recovery point objective (RPO) for each critical system, that is, the length of time the system can be unavailable and the amount of data loss that is acceptable. These objectives are then weighed against the cost of designing solutions to meet them. This type of examination often reveals two things:
  1. Many systems are more vital to your business than you think. For example, you might not view your VoIP or customer service systems as mission-critical, but what will the impact be if customers can’t reach you? How long can you really afford to be without those systems before your customer calls another supplier?
  2. Your current business continuity capability is inadequate to meet recovery objectives, meaning the customer experience may be affected in ways you haven’t anticipated or planned for previously.

Remember, it’s not if, it’s when

In the airline example above, systems were offline for hours and thousands of customers were inconvenienced. We’ll probably never know if that was the result of a conscious decision to reduce costs or a failure to plan. What will your customers experience the next time one of your data centers is offline? Are you prepared?

For best practices on evaluating your business continuity needs today and the steps you should take to protect your mission-critical systems, download the guide “Delivering business continuity for vital applications” from our Data Center Modernization website.

[1] “Fingers Crossed? Or What Is Your Business Continuity Plan for the Inevitable,” Gravic, Inc., 2015
[2] “Fingers Crossed? Or What Is Your Business Continuity Plan for the Inevitable,” Gravic, Inc., 2015
[3] “Verizon datacenter failure causes JetBlue air travel delays,” David Chernicoff, 15 Jan. 2016, Datacenter Dynamics
[4] Mean cost of downtime is $1.7M/hour and the average incident is 90 minutes, with some incidents approaching $10M/hour – “High-Value Business Applications on x86: The Need for True Fault-Tolerant Systems,” Peter Rutten, IDC, May 2015