Your zombie server action plan
They’re sitting around doing absolutely nothing all day long, yet they could be costing your enterprise millions of dollars each year.
Zombie servers, as they’ve become known colloquially, are a long-standing problem in the IT industry. The term refers to servers that are physically running but not performing any useful functions. While a dormant server or two in your data center might not seem like a serious issue, a large number of them together can become a massive inefficiency—both in terms of the money lost and energy used. And they can even compromise your network’s security.
The issue of zombie servers first came to light in 2008, when the McKinsey Global Institute released a report that found up to 30 percent of the servers in data centers around the world were functionally “dead.” The Uptime Institute followed up with similar findings in 2012, and more studies since then have confirmed the prevalence and cost of unused servers.
Most notably, in 2015, Jonathan Koomey, a research fellow at Stanford University, and Jon Taylor, a partner at consulting firm Anthesis Group, estimated the aggregate cost of these comatose servers at $30 billion in unused data center capital globally (assuming an average server cost of $3,000 and ignoring infrastructure operating costs). Koomey and Taylor later found that while enterprises have made some headway in reducing the 30 percent of servers left unused, a significant number of virtual machines are also idle, which shows that the same management issues that apply to “real” servers also apply to virtual machines. Worse, according to Koomey and the Uptime Institute, the data from 2015 is still accurate despite remediation efforts.
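The $30 billion figure is, in effect, a server count multiplied by the assumed $3,000 unit cost, so dividing it back out shows the scale of the estimate: roughly 10 million comatose servers. A quick back-calculation using only the two numbers quoted above (the function name is hypothetical, for illustration only):

```python
# Figures quoted from the 2015 Koomey/Taylor estimate above.
STRANDED_CAPITAL_USD = 30e9    # $30 billion in unused data center capital
AVG_SERVER_COST_USD = 3_000    # assumed average cost per server


def implied_server_count(stranded_capital: float, cost_per_server: float) -> int:
    """Back out the server count implied by stranded capital / unit cost."""
    return round(stranded_capital / cost_per_server)


if __name__ == "__main__":
    count = implied_server_count(STRANDED_CAPITAL_USD, AVG_SERVER_COST_USD)
    print(f"Implied comatose servers worldwide: {count:,}")  # 10,000,000
```

Because the estimate ignores operating costs, this count reflects stranded capital only; power, cooling, and licensing would add to the total.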
“There has been a lot of talk about comatose servers for about a decade now, and it really comes down to a management issue,” says Scott Killian, vice president of efficient IT programs at the Uptime Institute.
Killian notes that IT managers are often overly focused on rolling out the next generation of hardware and, because IT moves so fast these days, tend to leave used servers in their wake. “They are upgrading at such a rapid rate that they buy additional capacity without cleaning up what is left behind,” he says.
Other management failures can also lead to excess servers, says Killian. One example is excessive concern about keeping enough spare capacity on hand in case a new product takes off. Another is a line of business that provisions its own capacity for a short-term need but never decommissions the servers, leaving them running but idle and difficult to identify.
Finding and eliminating these servers, however, can lead to significant capital and operational savings. Owning less hardware and other infrastructure decreases your capital expenditures, while spending less to operate those assets (the ongoing costs of facility staff, software licenses, maintenance contracts, power, and cooling) decreases your operating expenditures. Eliminating these servers also reduces the risk of unforeseen security problems, another potential cost. After all, an idle server that isn’t receiving software patches or security updates can be an ideal entry point for hackers into a company’s network.
For example, suppose a data center finds that 20 percent (say, 2,000) of its servers are comatose and decides to decommission 1,000 of them. According to research from the Uptime Institute, the power consumption of those servers and their associated infrastructure would drop by 4.8 million kilowatt-hours, cutting the data center’s power costs by $528,263 over one year, plus another $143,770 in savings from reduced hardware maintenance costs or from asset resale and scrap.
If the same data center decommissioned another 1,000 servers the following year, its cumulative savings would reach $1.34 million by the end of the second year. Five years on, with another 2,000 comatose servers decommissioned, the company’s savings would total $4.02 million. Had those servers been left running, that $4.02 million would have been a needless tax on the organization.
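The totals in this example imply a few figures the research summary doesn’t state directly. The sketch below derives them from the quoted numbers alone: an electricity price of about $0.11 per kWh, about 4,800 kWh per server per year (an average draw of roughly 548 W, including each server’s share of infrastructure), and a first-year total of $672,033. Doubling that first-year total gives about $1.34 million, consistent with the two-year figure, which suggests (though the source does not say so) that it counts one year of savings for each 1,000-server batch. Treat these as back-of-envelope checks, not Uptime Institute model outputs; the function name is hypothetical.

```python
# Inputs quoted from the Uptime Institute example above (1,000 servers, one year).
SERVERS_DECOMMISSIONED = 1_000
ENERGY_SAVED_KWH = 4.8e6          # servers plus associated infrastructure
POWER_COST_SAVED_USD = 528_263    # one-year power cost reduction
OTHER_SAVINGS_USD = 143_770       # maintenance, resale, and scrap savings
HOURS_PER_YEAR = 8_760


def implied_figures() -> dict:
    """Derive per-kWh and per-server figures implied by the quoted totals."""
    kwh_per_server = ENERGY_SAVED_KWH / SERVERS_DECOMMISSIONED
    return {
        "usd_per_kwh": POWER_COST_SAVED_USD / ENERGY_SAVED_KWH,
        "kwh_per_server_per_year": kwh_per_server,
        # Average continuous draw per server, including its infrastructure share.
        "avg_watts_per_server": kwh_per_server / HOURS_PER_YEAR * 1_000,
        "first_year_total_usd": POWER_COST_SAVED_USD + OTHER_SAVINGS_USD,
    }


if __name__ == "__main__":
    figures = implied_figures()
    for name, value in figures.items():
        print(f"{name}: {value:,.3f}")
    # Two 1,000-server batches' worth of first-year savings, in millions.
    print(f"two batches: ${2 * figures['first_year_total_usd'] / 1e6:.2f}M")  # $1.34M
```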
The financial value of rooting out deadbeat servers is clear. But the savings estimates may have another benefit, Killian notes: They can be used to build a strong case for the executive buy-in needed to start tackling the issue. Often, the line of responsibility for an idle server isn’t clear, so to successfully track down and remove these unused servers, it’s important to develop a strong action plan, he adds.
Solving the zombie server problem
One reason IT organizations have not tackled their zombie server problem, despite the clear business and financial benefits, is that they are unsure how to approach such a complicated challenge. Fortunately, The Green Grid, a nonprofit consortium working to improve the resource efficiency of data centers, has published a white paper titled “Solving the Costly Zombie Server Problem,” which provides a step-by-step guide to eradicating and preventing zombies.
John Frey, one of the authors of the paper, has extensive experience working with large corporations to drive increased efficiency into their IT environments, while taking advantage of the associated cost, cybersecurity, and sustainability benefits such efforts provide. He has found that the most successful zombie eradication initiatives involve a clear quantification of the OpEx and CapEx savings expected, as well as the active participation of the associated stakeholders, including IT, security, real estate and facilities, finance, and sustainability organizations.
Key suggestions from the Uptime Institute and The Green Grid paper include:
Win the buy-in of senior executives. Executive support is critical to an effective plan to decommission servers. You need the backing of someone at a high level to complete the task, so before you start the conversation with your CTO and CIO, Killian recommends doing your homework so that you have a strong business justification to back up your plan.
It’s also worth identifying the stakeholders who can provide the required resources (staffing and access to asset management systems) and appointing a project manager who works with internal and external customers and holds them accountable.
Put together an effective detection plan. Start with a basic audit to identify idle servers: walk through your data center and take inventory. You can check basic things, such as whether a server is connected or has a blinking light. With so many servers in a data center, finding machines that aren’t even connected is not uncommon.
You can also track the age of servers. Anything older than four years is suspect and could indicate that the server has been idle for some time (and it can probably be replaced with a more energy-efficient model). Mapping network traffic is another good way to find these servers, because it can help you determine which servers are actually being used. A physical audit is useful, but you can’t do one every day, so tracking tools are a good alternative.
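The age and traffic checks lend themselves to a simple scripted pass over an inventory export. This is a minimal sketch, assuming a hypothetical inventory that records each server’s commissioning date and an average daily traffic figure from a traffic-mapping tool; the `Server` record, the 1 MB/day cutoff, and the sample hostnames are all illustrative. Only the four-year threshold comes from the guidance above, and a flag means “investigate,” not “decommission.”

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date

MAX_AGE_YEARS = 4            # age threshold from the guidance above
MIN_DAILY_TRAFFIC_MB = 1.0   # hypothetical cutoff; tune to your environment


@dataclass
class Server:
    hostname: str
    commissioned: date
    avg_daily_traffic_mb: float  # hypothetical metric from traffic mapping


def age_years(commissioned: date, today: date) -> float:
    return (today - commissioned).days / 365.25


def flag_suspects(servers: list[Server], today: date) -> list[tuple[str, list[str]]]:
    """Return (hostname, reasons) for every server that trips a heuristic."""
    suspects = []
    for s in servers:
        reasons = []
        if age_years(s.commissioned, today) > MAX_AGE_YEARS:
            reasons.append("older than four years")
        if s.avg_daily_traffic_mb < MIN_DAILY_TRAFFIC_MB:
            reasons.append("little or no network traffic")
        if reasons:
            suspects.append((s.hostname, reasons))
    return suspects


if __name__ == "__main__":
    inventory = [
        Server("web-01", date(2016, 3, 1), 5_200.0),
        Server("batch-07", date(2020, 1, 15), 0.2),
        Server("db-02", date(2019, 9, 10), 840.0),
    ]
    for host, reasons in flag_suspects(inventory, today=date(2021, 6, 1)):
        print(host, "->", ", ".join(reasons))
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps the audit reproducible when you rerun it against an old export.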
As you remove and shut off servers, make sure you don’t disrupt business processes. One way to do this is to start from the assumption that every server has an owner. Often, though, a server has no assigned owner, because the person responsible has left the company or the assignment step was never taken. That is an opportunity to let everyone know you plan to disconnect such servers and then see who responds. Working with the executive team, you can establish a rule that if no one contacts you within two weeks, it’s OK to remove the server. If a server turns out to still be part of a business workflow, you can always migrate its tasks to a different box.
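The notice-and-wait rule can be encoded so the decommissioning team applies it the same way to every unowned server. A minimal sketch, assuming you record the date each server was announced for removal and whether anyone responded; the function name and its return labels are hypothetical, and the two-week window is the one described above:

```python
from datetime import date, timedelta

NOTICE_PERIOD = timedelta(weeks=2)  # the two-week window described above


def removal_decision(notified_on: date, owner_responded: bool, today: date) -> str:
    """Apply a 'no response within two weeks' rule to one unowned server.

    Returns "keep" if someone claimed the server (migrate or reassign it instead),
    "remove" once the notice period has elapsed with no response, else "wait".
    """
    if owner_responded:
        return "keep"
    if today - notified_on >= NOTICE_PERIOD:
        return "remove"
    return "wait"


if __name__ == "__main__":
    notice = date(2021, 6, 1)
    print(removal_decision(notice, owner_responded=False, today=date(2021, 6, 10)))  # wait
    print(removal_decision(notice, owner_responded=False, today=date(2021, 6, 15)))  # remove
    print(removal_decision(notice, owner_responded=True, today=date(2021, 6, 15)))   # keep
```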
Formulate a plan so servers don’t go dormant again. This may include documenting incoming servers to ensure that any new hardware is received and processed by the data center operations team, including identification of the owner and system administrator.
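One way to enforce that intake step is to refuse to mark new hardware as in service until its owner and system administrator are recorded. A minimal sketch, assuming a hypothetical intake record with those two fields; in practice this check would live in whatever asset management or ticketing system your operations team already uses, and the names here are illustrative:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

REQUIRED_FIELDS = ("owner", "sysadmin")  # hypothetical field names


@dataclass
class IntakeRecord:
    asset_tag: str
    owner: Optional[str] = None
    sysadmin: Optional[str] = None


def missing_fields(record: IntakeRecord) -> list[str]:
    """List required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not getattr(record, name)]


def accept_into_service(record: IntakeRecord) -> bool:
    """Only fully documented hardware is accepted by data center operations."""
    return not missing_fields(record)


if __name__ == "__main__":
    ok = IntakeRecord("DC1-00412", owner="payments-team", sysadmin="j.doe")
    incomplete = IntakeRecord("DC1-00413", sysadmin="j.doe")
    print(accept_into_service(ok))      # True
    print(missing_fields(incomplete))   # ['owner']
```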
Another good approach is to use tools to gather key data on server utilization. Most of the big manufacturers have these tools available, and they may already be part of your enterprise license agreement.
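Whatever tool collects the data, the decision logic usually comes down to thresholds over an observation window. A minimal sketch, assuming you can export CPU utilization samples per host; both thresholds are hypothetical and should be tuned to your workloads. Checking the peak as well as the mean keeps a server that runs an occasional but genuine batch job (like `etl-01` below) off the candidate list:

```python
from __future__ import annotations

from statistics import mean

AVG_CPU_THRESHOLD_PCT = 2.0    # hypothetical: mean CPU below this looks idle
PEAK_CPU_THRESHOLD_PCT = 10.0  # hypothetical: any burst above this disqualifies


def looks_idle(samples_pct: list[float]) -> bool:
    """True if both average and peak utilization stay under the thresholds."""
    if not samples_pct:
        raise ValueError("no utilization samples collected")
    return (mean(samples_pct) < AVG_CPU_THRESHOLD_PCT
            and max(samples_pct) < PEAK_CPU_THRESHOLD_PCT)


def idle_candidates(utilization: dict[str, list[float]]) -> list[str]:
    """Hostnames whose samples look idle, sorted for stable reports."""
    return sorted(host for host, samples in utilization.items() if looks_idle(samples))


if __name__ == "__main__":
    utilization = {
        "app-03": [1.2, 0.8, 1.5, 0.9],
        "etl-01": [0.5, 0.4, 62.0, 0.6],
        "web-02": [35.0, 40.0, 38.0],
    }
    print(idle_candidates(utilization))  # ['app-03']
```

CPU is only one signal; combining it with the traffic and ownership checks above gives a more defensible candidate list.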
To keep more zombie servers from appearing, it’s worth creating a new organizational chart that clearly assigns responsibility for servers, so that no new servers are added in the future without IT’s involvement.
The Uptime Institute has more guidelines for starting your own decommissioning program on its website.
Solving the costly zombie server problem (white paper)
This article/content was written by the individual writer identified and does not necessarily reflect the view of Hewlett Packard Enterprise Company.