Podcast: Google VP on impact of Kubernetes and open source communities
[Editor's note: This podcast originally aired on Dec. 5, 2017. Find more episodes on the STACK That podcast page.]
When you think about cloud computing, containers come to mind. In this episode of STACK That, Eric Brewer, vice president of infrastructure at Google and a professor at the University of California, Berkeley, joins show co-hosts Byron Reese of Gigaom and Mesosphere CEO Florian Leibert. Brewer talks about Kubernetes, why Google's donation of Kubernetes to the Cloud Native Computing Foundation was important for the industry, containers versus virtual machines, and a new product called Istio. His focus is on abstracting computing from the hardware. "If I had my druthers, people wouldn't know about the machines at all," he says. "I think there's reasons to know about them, including fault tolerance and a few other things. But really, I'd rather see developers focus on their APIs and services and not spend more than a few minutes thinking about the actual machines that underlie them."
Byron Reese: The growth of the Internet and mobile and IoT and all the rest has pushed the Internet and traditional architecture to its limits. We now ask of the Internet things that it was never designed to do. I mean, think about it. We all got really lucky with the Internet. No one said the world needed a way to connect billions of people and a hundred billion webpages and billions and billions more IoT devices and billions of mobile accounts and a huge assortment of cat videos. The people who built the Internet, they just wanted a network that could survive a nuclear attack. All of these new uses have required that we rethink architecture. We got the cloud, we got the edge cloud, we got open source, we got self-driving cars or we're getting them. Sensors are proliferating, and one interesting aspect of all of these new uses of the Internet is the need for scale. Often, things need to be ramped up and down. You may be able to get a thousand machines in the cloud in the blink of an eye to handle some surge or something like that, but how do you manage all of that?
That has required a whole new host of tools, one of which is the open source Kubernetes, developed at Google but then donated to the Cloud Native Computing Foundation. Kubernetes is an orchestration layer that manages containerized apps or, put simply, it is an operating system for the cloud. Today, we are excited to have as our guest Eric Brewer. He is the VP of infrastructure at Google and for the last 23 years has been a professor at UC Berkeley. Welcome to the show, Eric.
Eric Brewer: Glad to join you.
Florian Leibert: Hi, Eric. Super-excited to have you here on the show. You have a super-impressive background. You've built and sold a company to Yahoo; you created the CAP Theorem, one of the big influences for database design; and now you lead infrastructure at Google. Can you tell us about your role at Google and how you first became involved in Kubernetes?
Brewer: Happy to. So, my current role is essentially the senior-most designer for computing at Google. That includes all kinds of computing, but the obvious one is cloud computing. In that role, I was essentially the executive sponsor and uber-tech lead for the space that includes Kubernetes, so my projects include Kubernetes but also App Engine, VM-based computing in general, and now things like Istio and services as we move in that direction. So really my role in Kubernetes was arguing that we should focus on containers, that it's a natural way for Google to go, and that the product should be open source.
Reese: And did you get pushback from people there, or was there broad agreement on that last bit?
Brewer: There was not agreement about the open source bit at first, or even about whether we should start a whole new computing platform, because again, we already had App Engine and infrastructure as a service. Adding a third thing sounds like a big distraction, at least at first.
Reese: Well, give us a little history there. When did you first identify the need for container orchestration, and was your work in distributed databases part of that? And how did you come to that view of the world?
Brewer: Well, that's a very interesting question, actually, because in my mind, the work we did at Inktomi in the '90s and at Berkeley was implicitly container-based cloud computing. It wasn't called cloud computing at the time, and the reason I say it's container-based is because it was really based on Unix processes. At the time, it was Solaris, and we did use processes as our structuring mechanism, and we did use APIs as they're being used now in Kubernetes, and we didn't use VMs. This is before the modern reincarnation of VMs led to the founding of VMware. So, that was roughly 1998, which by the way was the same year that Google was founded, and so all of Google's internal systems don't use virtual machines either. I've argued that the move to containers is actually a return to a higher level of abstraction based on processes and APIs, and that's the cloud that I want to use and that I want to create.
Reese: And how are we doing in that regard? Is this broadly taking hold, and is this a settled question that for the foreseeable future is how cloud architecture is going to work?
Brewer: I believe it is, and I think the reason for that is although VMs are incredibly useful and they're not going to disappear, they are really about moving a legacy workload from on-prem to essentially a similar server run by somebody else, whereas the move to containers is really about modernization of the stack and thinking about your application in terms of an abstract sea of services where the machines are not really your primary focal point. If I had my druthers, people wouldn't know about the machines at all. I think there's reasons to know about them, including fault tolerance and a few other things. But really, I'd rather see developers focus on their APIs and services and not spend more than a few minutes thinking about the actual machines that underlie them.
Leibert: So in some ways, Kubernetes really means the first step towards the end of virtual machines, right?
Brewer: It does, although it's virtual machines as an abstraction that you target. I think for a while at least, they're pretty useful as a security boundary, but that security boundary doesn't need to be visible to most developers.
Leibert: And if you think about the Kubernetes project, which now is I guess almost 3 years old, how has it grown and how has the adoption of Kubernetes changed?
Brewer: Well, it's hard to know how far back to go on this, but I would say joining your question with the previous question, Google has used Linux containers for more than 10 years and because of that, all of the internal systems at Google have been container-based from the beginning, and in fact, we had to make many improvements in Linux in its support for containers to be able to get the scale, utilization, and security that we wanted, particularly for performance isolation. So when I think of what containers are, the modern containers, I actually think of the merging of two concepts: the packaging that Docker gives you, which is a way of capturing your dependencies so you get repeatable deployments, and the performance isolation, which is the containers that Google has worked on internally, and many others that are part of Linux, that give you performance isolation so you can put multiple containers on the same physical machines and have them interact reasonably.
Kubernetes basically is about orchestrating those containers into applications and services based on what we were doing internally on Borg, which by the way has roots for Mesos as well, and we really like how our internal systems at Google work and felt like it would be great if more people understood how these things work and could use them. That was some of the original purpose of Kubernetes, just to get that ease of use of container management. With that being said, Kubernetes is actually much nicer to use than Borg because Borg evolved over 10 years and has lots of legacy cruft in it. But I'd also add that there is a whole range of things that the community has added over the last three years that really make it actually better than Google could have done on its own.
Reese: In what way? That's a pretty interesting thought.
Brewer: Well, for example, I would say Google doesn't have inherently a whole lot of representative enterprise IT infrastructure. The way Google builds things is quite different than most enterprises, and because of that, I would say in many cases our internal intuition about how enterprise computing should work doesn't match how enterprises actually think about problems. But many people in the Kubernetes community, including most notably groups like Red Hat and CoreOS (but they're not alone: Cisco, many others), have actually brought to the table their view of enterprise computing and made sure that these things work in a way that fits with the general way enterprises want to work, and that's made Kubernetes much more mature and also useful for a wider range of workloads. I think that the biggest thing going on this year is it's not just stateless servers; it's now a wide range of workloads, and that's an important process.
Reese: I keep having to turn back to this comment you made at the very beginning because I'm still trying to wrap my head around the idea of it. Nineteen years ago, this is the architecture that was used for HotBot, and at the time, were you like, "Oh, yeah, this is the way to go, this is really going to take off in the future," and all of that, or is it now only in retrospect you look back and go, "Oh, you know what? That's kind of what we did back then."
Brewer: Well, again, because there were no VMs to think about at the time, it was really just how do you build distributed systems using Unix processes and RPC and TCP. The answer is, that is how Inktomi was built. In fact, there's papers that talked exactly about building large-scale stateless applications this way. We didn't use the word containers, and we didn't use the word cloud, and it worked in part because, as with Google, all the code was trusted. The cloud has a big disadvantage. With cloud computing, you have to run untrusted code, and that's why you really need the security layer of virtual machines. So again, that's still there, but it's secondary in the Kubernetes world. That's not the thing you think about when you're deploying your applications. I think that's honestly just a strictly better model because you get to focus on a higher-level abstraction and have more leverage.
Leibert: So aside from the deep technical aspect of Kubernetes, Kubernetes also was a big shift for Google because it was an open source project, and an open source project comes with a community. When you donate a project to a foundation like the Cloud Native Computing Foundation, you lose a certain amount of control. Do you think Kubernetes would be where it is today if it wasn't donated to the CNCF?
Brewer: It would definitely not be where it is today if it was not donated. Going back a little bit, I would say Google hasn't historically done a lot of systems projects that it then made open source, but it has a long history of open source contributions to Linux, and of course it has whole open ecosystems around Android and Chrome and a huge number of things around tools like the C++ compiler and its evolution. So there's always been a strong support of open source, but you're right. This one is different. This is really core technology being released as open source when it could have been used as a proprietary advantage, and I was definitely one of the key voices arguing for that.
The argument basically comes down to the following, which is this is the right way to go for the industry, not just for Google, and the only way we're going to get there is if the whole industry participates. By making it not a Google thing and a broader thing, people can actually trust it. You can use the Kubernetes code and you know that Google cannot take it away from you. Google can't say you don't get to use it anymore. You can bet on it without risk, and if you want to move to a different cloud and use it there, that is OK. We'll take our chances that we can implement containers as well or better than anyone else, we have lots of services you'd like to use, but we're not going to make Kubernetes an advantage per se by itself. It's more important that everyone move to this model.
Leibert: And by now, the Kubernetes community is huge. It's probably, along with TensorFlow, the most successful open source project out of Google, right?
Brewer: Those are the top two by far, and in recent times, Google is not even the majority of commits for Kubernetes anymore, which is a very good sign of health. There's a broad range of contributors to Kubernetes at this point, which is exactly what we needed and wanted. Also, I don't know if you've checked, but the rate of commits on Kubernetes is just crazy. I don't remember an open source project that has such a high rate of continuous commits improving the project every day.
Leibert: That's super-fascinating.
Reese: I'd like to take just a minute, actually, and do a shout-out to Hewlett Packard Enterprise. They are the people that bring you STACK That. They are HPE, the leading provider of the next-generation services and solutions that help enterprises and small businesses navigate the rapidly changing technology landscape just like we are discussing today. With the industry's most comprehensive portfolios spanning the cloud to data center to the Intelligent Edge, HPE helps customers around the world make their operations more efficient, more productive, and more secure. So to stay up to date on the latest in hybrid IT, Intelligent Edge, memory-driven computers, and more, visit HPE.com.
Eric, I do have a question for you, which is you made the point that it's great that we live in a world where you don't have to think about the hardware anymore, or maybe, like you said, you think about it as you're eating your Cheerios in the morning over breakfast but you don't give it a lot of thought and that you can really think about the API. When you look ahead a few years, what's the next step above that, where you've even abstracted a whole other layer away and gain that much more in productivity?
Brewer: Well, that's a lead-in to one of my other favorite topics and another open source project from Google called Istio, I-S-T-I-O. What Istio does that's really added to Kubernetes is that it allows you to decouple operations from the development of services. And what I mean by that is, say you want to have a lot of services. Google has, I think, more than 10,000. I can't name them all. I can't even believe there's that many, but that's what the stats say. There are a lot of cloud-native companies that have hundreds or a thousand services in what you might think of as a single application, and when you have that many, you really want a framework that hides a lot of the operational pain.
What I mean by that is, for example, the way you do authentication and access control, the way you do telemetry, or more generally, observation. Can I tell which services are talking to which other services? Can I tell whether they're big or small flows? Things like traffic management: Can I send some traffic to a new version to see how it goes and roll that out progressively? Those are all operational things that really should be the same for all your services, but historically, some of those things end up in the code written by developers. So if you look at your source code and you have calls in it to do authentication, then it's going to be very hard to change the way you do authentication down the road. That authentication stuff, the way it's done in Google, is actually in the framework and not in the source code of the applications.
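[Editor's note: The progressive rollout Brewer describes can be illustrated with a minimal Python sketch. This is not Istio's implementation or configuration; the function names, version labels, and hashing scheme are hypothetical choices made for illustration. The idea shown is that a routing layer, rather than the service code, decides what share of traffic reaches a new version.]

```python
import hashlib


def pick_version(request_id: str, canary_percent: int) -> str:
    """Return "v2" for a stable slice of requests and "v1" for the rest.

    Hypothetical helper: hashing the request ID assigns each request a fixed
    bucket from 0 to 99, so raising canary_percent only ever adds traffic to
    the new version and never moves a request back to the old one.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return "v2" if bucket < canary_percent else "v1"


def canary_share(canary_percent: int, total: int = 10_000) -> float:
    """Measure the fraction of synthetic requests routed to "v2"."""
    hits = sum(
        pick_version(f"req-{i}", canary_percent) == "v2" for i in range(total)
    )
    return hits / total


if __name__ == "__main__":
    # Roll out progressively by raising the percentage over time.
    for percent in (1, 10, 50, 100):
        print(percent, canary_share(percent))
```

[Because the split lives in the routing function, the rollout percentage can change without editing or redeploying either version of the service, which is the property the passage above is describing.]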
So when I say decouple, I mean take all the operational code, take it out of the services and put it into a framework where it can be changed without changing the services and, in fact, without even redeploying the services. The way that's done with Istio is it has a layer of Envoy proxies. It's essentially a managed distributed proxy that goes under all your services simultaneously. If you want to apply a policy about who can access the services, you can apply that policy at the proxy level and have it affect all services instantaneously, with no change to any of the code written by developers. That decoupling is where we're really going. The containers are just the first step in that. I would say containers decouple CI from CD. You do the build process, the output of that is a container, the container is then what you deploy to get a repeatable deployment. That's a certain kind of decoupling.
Take these operational things out of the source code and into a proxy layer—that's another kind of decoupling that makes it easier to operate complex large-scale services. That actually, in some ways, is at least as important as Kubernetes, and I think it will have quite a big impact. In fact, in some ways, it's broader than Kubernetes because you can use these proxies even on legacy services running on VMs. It's actually easier to deploy the proxy layer than it is to rewrite your applications to be containerized.
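[Editor's note: To make the decoupling concrete, here is a minimal Python sketch of the pattern Brewer describes. It does not use the Envoy or Istio APIs; the `Proxy` class, the policy dictionary, and the service names are hypothetical. The service contains business logic only, while access control and basic telemetry live in a proxy that wraps it.]

```python
from typing import Callable, Dict, List, Set, Tuple


def billing_service(payload: str) -> str:
    """Business logic only: no authentication or telemetry calls."""
    return f"billed:{payload}"


class Proxy:
    """Hypothetical stand-in for a proxy that sits in front of a service."""

    def __init__(
        self,
        name: str,
        service: Callable[[str], str],
        policy: Dict[str, Set[str]],
        call_log: List[Tuple[str, str]],
    ) -> None:
        self.name = name
        self.service = service
        self.policy = policy      # shared access policy, managed centrally
        self.call_log = call_log  # shared record of who called which service

    def handle(self, caller: str, payload: str) -> str:
        # Telemetry: record which caller is talking to which service.
        self.call_log.append((caller, self.name))
        # Access control: enforced here, not inside the service code.
        if caller not in self.policy.get(self.name, set()):
            return "403 denied"
        return self.service(payload)


if __name__ == "__main__":
    policy: Dict[str, Set[str]] = {"billing": {"checkout"}}
    log: List[Tuple[str, str]] = []
    proxy = Proxy("billing", billing_service, policy, log)

    print(proxy.handle("checkout", "order-1"))  # allowed by the policy
    print(proxy.handle("reports", "order-2"))   # rejected by the policy
    policy["billing"].add("reports")            # policy change, no code change
    print(proxy.handle("reports", "order-2"))   # now allowed
```

[In this sketch, granting the `reports` caller access is a change to the shared policy alone; `billing_service` is never edited or redeployed.]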
Leibert: Interesting. And if you think about Kubernetes, Istio, TensorFlow, all of these open source projects that Google has recently released, what's the enterprise adoption like right now? What are you seeing out there?
Brewer: Oh, it's pretty good, and I think for lots of reasons, but I think the main reason is historically, it's hard to move to the cloud and modernize your applications. You want to do both things, so you could move to the cloud first, and then once you're on the cloud on VMs, you could then try to modernize. Or you could modernize on-prem and be more service-oriented, and that will make it easier to move to the cloud in a cloud-native way. But what's going on with Kubernetes and Istio, which will actually accelerate this, is that you can mix and match what you modernize, and you can start modernizing on-prem using Kubernetes and Istio. All that stuff actually gives you a cloud-like experience on-prem, and you get the productivity value right now.
You don't have to wait until you move to the cloud to get some of the advantages, but at the same time if you're on Kubernetes on-prem, it's going to be easy to move those services to the cloud when you're ready. So it kind of decouples the timing of when you modernize versus when you move to the cloud. Now you can handle different applications in a fine-grained way: on-prem, in the cloud, legacy, lift and shift, or modernize first, whatever combination you like. And so we see enterprises that want to modernize using Kubernetes just doing it on-prem, and because it's open source, they can then be quite sure it will run the same way on-prem as it does on a future cloud, and that is hugely valuable.
Reese: You know, you obviously spend your time thinking, imagining next year, and kind of skating to where the puck is going to be to make sure we have tools to support it. In that introduction, where I talked about IoT growing and all of these additional things that are going to be leveraging the Internet and traffic growth and more users and all of that, what do you think is the first thing about our existing architecture that starts to crumble, that we're going to have to think, OK, what are we going to do when this happens?
Brewer: Well, depends how you define infrastructure. I still think core security has huge problems ranging from phishing attacks to DDoS from IoT devices. It's hard to build a secure system, and many people's dreams about what IoT could be are going to be limited in practice by what we know how to do safely and securely. I don't really think you can have a large-scale IoT system unless you have some notion of security built in and upgrade built in. Most devices out there aren't actually made to be upgraded, and that's a problem. That's not going to be at all easy to solve. That's probably the one I feel is pretty fundamentally hard to fix. The core things like billions of users in a service, I'm not worried about that at all.
Google has many such services and petabytes of data. All that stuff, I feel like it's not easy, but we know how to do it. We're figuring out how to do things with relatively rapid evolution, even for large, live services. That's always historically been a challenge, but we do much better when we have real servers where we can actually know what code they're running and upgrade that code as we learn about security risks, and that's much harder as you go to smaller and smaller devices that connect less and less, where you have little or no knowledge about their provenance and what they're actually intended to do.
Reese: Well, then, last question. Do you have any thoughts to share with us on how to solve that security...even a direction that people should even be thinking?
Brewer: Well, we've been using the Chrome browser as a way to explore that space. Chrome browser is very secure. It does a lot of auto-updates. It's checking certificates very directly now, and so it is a space that's ahead in general, but I think a lot of the things that it's doing we'll need all connected devices to do down the road. Really, all that stuff is worth copying.
Reese: All right, well, obviously Kubernetes has had such an impact, and I'm going to put Flo on the spot. I actually don't know the answer to this. Does DC/OS support it?
Leibert: Yeah, absolutely. We recently demonstrated it on stage at MesosCon in LA. Tim Hockin was on stage and showed a really exciting use case of how to run vanilla Kubernetes on our technology stack, so super-exciting.
Reese: It's amazing. It's everywhere and so quickly, and so I want to thank you for taking the time, Eric. This has been really fascinating to just get a glimpse into how you think about this.
Brewer: My pleasure, and I think we have quite a few more years of very exciting stuff to go through. I really want to see how it turns out.
Leibert: Thank you, Eric.
Brewer: My pleasure.