You need strong talent to achieve infrastructure as code
The cloud mantra of VMware's CEO is "ruthlessly automate." And that's the premise behind infrastructure as code, the process of managing and provisioning data centers through machine-readable definition files, rather than physical hardware configuration or interactive configuration tools. At VMworld 2018, a panel of customers discussed how they are moving toward an automated environment. Here's what they've learned—much of it the hard way.
Hüseyin Dursun, VMware's vice president of engineering and the panel's moderator, set the stage. He pointed out that, while we all hear about automation and DevOps, the buzzwords can mean different things to different organizations. Each organization has to find the definition that fits its own requirements.
"'I spend a lot of time on a task, so I should write a program automating that task' is a good, noble goal," Dursun said. In theory, the automation works. But the reality, as an xkcd comic shows, is not a smooth curve of development followed by free time once the manual task goes away. Rather, the automation process is a series of ups and downs. Coding, debugging, rethinking, and ongoing development gobble up time, on top of the time still spent on the manual process the automation was meant to eliminate.
And, Dursun said, the much-touted philosophy of "move fast and break things" doesn't work. In an enterprise environment, the poster on the wall should say, "Move fast with a stable infrastructure."
"That's where infrastructure as code becomes crucial," Dursun said. Digital transformation is a journey—often a multiyear effort.
People, process, technology. Pick…three, actually
In any digital transformation, there are three components: people, process, and technology.
Panelist Chris Wahl is chief technologist at Rubrik, a data management and protection company, and the founding owner of the Vester Project, a community project providing configuration management of VMware environments. He said everywhere he's worked, there are hundreds of developers versus maybe 10 operations people: "IT services are traditionally seen as a bottleneck. We're looking to offer all the food groups in the data center: the servers, the compute, the storage, and all that kind of jazz." IT has to provide everything developers need to do their work, Wahl said, yet there's not a lot of communication between the two groups.
At Rubrik, Wahl tried to address the problem by teaching developers about IT operations so they could understand what IT was trying to do for them and become more integrated with ops. He then had the two groups come together as a team to decide on joint goals and "chunkify" the work into automated tasks, so that IT could deliver what developers really needed and act as a force multiplier for them.
"A lot of that has been taking a look at traditional data center operations and giving it a developer eyeball so we could build it in a much more efficient manner," Wahl said. "The human element of operations should only really be invoked when there's an exception rather than the rule."
At RingCentral, a provider of unified communications as a service, Ashu Varshney, vice president of cloud and security operations, faced different challenges. When he joined the company, it was still a startup preparing for the future. Stage one of its transition was virtualizing its legacy bare-metal infrastructure, but that wasn't enough. Next, RingCentral began automating virtual machine creation. That helped, as did bringing in a post-configuration engine (Puppet, in RingCentral's case) to drive automation at scale.
"But in spite of all this, we felt that the promise that engineers could deliver a new release every six weeks was not being fulfilled," Varshney said. "It was very clear that all the developers wanted was an Amazon kind of experience: They wanted to have a portal; they didn't want a traditional IT deployment model."
And that's what was delivered, Varshney said: a self-service portal through which developers can spin up an environment based on RingCentral's catalog of internal and third-party products. All IT has to do is support the framework of infrastructure as a service.
VMware itself had a lot of work to do to get to an automated experience. Manoj Warrier, VMware’s senior director of IT, said the company used to take four to six weeks to deploy an environment for developers. By 2015, IT could deliver a tested environment over a weekend. But that was still too slow. Today, Warrier said, with microservices, Docker, and Kubernetes, deployment needs to be done with the click of a button.
"It has to be a pipeline," he said. "If you want to be smart IT, you have to say yes to business."
A culture of automation
The technical bits aren't the only challenges. Culture also plays a big role.
"When it comes to efficiency—especially with application development—if everyone owns the efficiency, then nobody owns it," said Wahl. Rubrik created a team that works with developers and operations to make sure there's ownership of efficiency. "The purpose was to make it clear from the executive level down that we are putting a very high bounty on efficiency," he said. "If you have 300 or 500 or 1,000 developers—or even 100—and you can make them 10 percent more efficient, you get the activity of 10 to 100 different developers from that 10-percentage-point change."
Traditionally, IT operations has been a place that soaks up knowledge rather than a home for collaborators. So Wahl placed members of the operations team with developers, seating them near one another, to ensure that everyone is aware of new code and new features, has a sense of ownership, and works toward one goal.
The biggest resistance to change comes from cultural issues, Varshney said. And an added challenge, he said, is having to deal with the current environment while trying to build a new framework: "We had to build the solutions of tomorrow while taking care of the problems of today. Yet, if you don't get the infrastructure of today, tomorrow doesn't matter."
To manage this conundrum, Varshney borrowed the approach companies use when they go to market with a minimum viable product and then add features: He started building his infrastructure as a service with vCenter API integration and kept improving it with every iteration. And the impetus has to come from within, from the people who deal with the issues every day, he added.
The whole transformation starts with leadership; you have to eliminate the blame game, added Warrier. "The whole mindset had to change. Dev and ops have to come together as a group to deliver common goals. When you click a button or when you provision a VM, it directly helps the business."
Naturally, tools play a big part in automation, but Dursun cautioned against what he calls "tool mania." That condition, he said, is marked by periods of excitement, euphoria, delusions, and overactivity centered on a tool: a device or implementation meant to carry out a particular function.
Dursun asked the panelists how they deal with the syndrome.
"It's natural that each team will pick tools that will solve its problems the best," Varshney said. "It doesn't make sense to standardize." Instead, he said he looks at the outputs of the various tools—alerts, notifications, and so forth—and combines the outputs in a team messaging platform.
Warrier, too, opts for aggregation and correlation of outputs. For him, the big question is: Build or buy?
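To make the aggregation approach concrete, here is a minimal Python sketch. It assumes each tool emits events as dictionaries with its own field names; the tool names, field names, and severity levels are hypothetical, and posting the resulting digest to a real team messaging platform would go through that platform's own API, which is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """Common shape that every tool's output is normalized into."""
    source: str
    severity: str
    message: str


# Hypothetical per-tool adapters: each maps a tool's native payload onto Alert.
# The payload field names are illustrative, not any real product's schema.
def from_monitoring(payload: dict) -> Alert:
    return Alert("monitoring", payload["level"].lower(), payload["text"])


def from_ci(payload: dict) -> Alert:
    severity = "critical" if payload["status"] == "failed" else "info"
    return Alert("ci", severity, f"Build {payload['build_id']}: {payload['status']}")


ADAPTERS = {"monitoring": from_monitoring, "ci": from_ci}
SEVERITY_ORDER = {"critical": 0, "warning": 1, "info": 2}


def aggregate(raw_events: list[tuple[str, dict]]) -> list[Alert]:
    """Normalize events from every tool, then sort the most severe first."""
    alerts = [ADAPTERS[tool](payload) for tool, payload in raw_events]
    return sorted(alerts, key=lambda a: SEVERITY_ORDER.get(a.severity, len(SEVERITY_ORDER)))


def format_digest(alerts: list[Alert]) -> str:
    """Render a single message suitable for posting to a shared team channel."""
    return "\n".join(f"[{a.severity.upper()}] {a.source}: {a.message}" for a in alerts)


if __name__ == "__main__":
    events = [
        ("ci", {"build_id": 42, "status": "passed"}),
        ("monitoring", {"level": "CRITICAL", "text": "disk 95% full on db-01"}),
    ]
    print(format_digest(aggregate(events)))
```

The design choice mirrors what Varshney describes: teams keep whatever tools suit them, and only a thin adapter per tool is needed to feed a shared view.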
Wahl, on the other hand, reminded the audience of an old joke: When you have a problem in IT and you bring in a tool, now you have two problems in IT. His approach is to ask his team what the tool does and to identify the risks it introduces or the adverse effects it could have on operations. And to make sure those effects are addressed, he rotates developers into the on-call roster for the tool.
The final component in the mix is talent. "You need strong talent to achieve infrastructure as code," Dursun said. "But infrastructure, the plumbing, is not as attractive as the rest of the stack. So how do we attract talent?"
At VMware, Warrier said, there are programs to introduce new hires to all facets of the operation. Existing staff are rotated into other areas: developers go into operations for a few months and vice versa. When Warrier hires, he doesn't label the position as dev or ops; he positions it as an engineering role that includes coding.
"People tend to move because they're bored," Wahl noted. "One interesting benefit of infrastructure as code is it's hard to be bored when you're working on everything." At Rubrik, employees are encouraged to build cross-functional teams so they can learn from each other and bring the knowledge back into their own groups. As an added benefit, they may find something new that they love in the process.
Varshney's challenge is getting people closer to the product and the customer experience. Retention is primarily driven by the manager, he said. He agrees with Wahl that boredom and a lack of connection to the company's success are reasons companies lose people.
What else do you need for infrastructure as code? Dursun put that question to the panel.
"For the infrastructure-as-code transformation, you need transformation leaders; it comes top down. That's the key. My leaders told me that you try to make your job redundant," Warrier said. "You become a service owner, where you're able to give your customer self-service. As soon as you start with that thought process, you're an infrastructure-as-code company."
Added Varshney, "For infrastructure as code, in automation and this kind of extreme automation, agility is second on the list in terms of the outcomes you may enjoy. The number one is accuracy. By having infrastructure as code, the accuracy that you are able to deliver to your end customer is unmatchable."
For Wahl, the biggest benefit is transparency. "If you contrast how we traditionally built IT, if I'm asked to build a database server, you get Chris' version of that server. When you start going with infrastructure as code, everything is an artifact," he said. "You get to see exactly how I would build that database server, and it's going to be the same experience every single time. And consistency is the killer of risk."
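Wahl's point that a definition becomes an artifact can be illustrated with a minimal Python sketch. It assumes a hypothetical dictionary-based definition of a database server and simulates convergence in memory rather than calling any real provisioning tool: applying the definition to a drifted server computes only the missing changes, a second run finds nothing left to change, and a hash of the canonical form gives the definition a stable identity.

```python
import hashlib
import json

# Hypothetical declarative definition of a database server. In practice this
# would live in a version-controlled file; the keys are illustrative only.
DB_SERVER = {
    "name": "db-01",
    "cpus": 4,
    "memory_gb": 16,
    "packages": ["postgresql"],
}


def fingerprint(definition: dict) -> str:
    """Hash a canonical JSON form so identical definitions get identical IDs."""
    canonical = json.dumps(definition, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def plan(desired: dict, current: dict) -> dict:
    """Return only the settings whose current value differs from the definition."""
    return {k: v for k, v in desired.items() if current.get(k) != v}


def apply(desired: dict, current: dict) -> dict:
    """Converge the (simulated) current state toward the definition."""
    updated = dict(current)
    updated.update(plan(desired, current))
    return updated


if __name__ == "__main__":
    state = {"name": "db-01", "cpus": 2}
    print("changes:", plan(DB_SERVER, state))
    state = apply(DB_SERVER, state)
    print("changes after apply:", plan(DB_SERVER, state))  # nothing left to do
    print("artifact id:", fingerprint(DB_SERVER)[:12])
```

Because `plan` compares desired and current state instead of replaying a fixed list of steps, running `apply` repeatedly is safe. That idempotence is one way such tooling delivers "the same experience every single time," although real tools differ in how they detect and reconcile drift.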
How can someone get started? Here's a little light reading, recommended by Dursun: "The Phoenix Project," by Gene Kim, Kevin Behr, and George Spafford; "Accelerate," by Nicole Forsgren, Jez Humble, and Gene Kim; and three O'Reilly titles: "Site Reliability Engineering" and "The Site Reliability Workbook," both edited by Betsy Beyer et al., and "Infrastructure as Code," by Kief Morris.
Infrastructure as code: Lessons for leaders
- Development and IT operations must work together and understand each other's realms if they're to successfully manage infrastructure as code. It's a partnership.
- Leadership matters. Eliminate the blame game and try to make your job redundant.
- You need strong talent to achieve infrastructure as code. Nurture it.
This article/content was written by the individual writer identified and does not necessarily reflect the view of Hewlett Packard Enterprise Company.