Podcast: How Airbnb is pushing the limits of machine learning
[Editor's note: This podcast originally aired on October 31, 2017. Find more episodes on the STACK That podcast page.]
Airbnb has emerged as one of the largest travel companies, transforming travel experiences through the sharing economy—and artificial intelligence. It is estimated that 16.9 percent of U.S. Internet users (or 36.8 million people) will use their Airbnb account at least once. It makes you wonder what Airbnb does with all that data.
Byron Reese, CEO of Gigaom, and Florian Leibert, co-founder and CEO of Mesosphere, talked with Airbnb VP of engineering Mike Curtis to learn more. Airbnb has been building sophisticated AI using algorithms and key metrics to develop a virtual matchmaker, introducing the right guests to the right hosts so both will have a great experience. This not only keeps customers returning for more, but also is the start of a larger project using data and learning to make travel planning easier.
Byron Reese: Hello, everyone. I am Byron Reese with Gigaom, and you are listening to STACK That, a podcast brought to you by Hewlett Packard Enterprise. During this series, we are going to explore industry trends through thought-provoking discussions with some of the best in their field, to help you leverage the latest technologies for your own benefit and that of your business.
I'm here today with my co-host, Florian Leibert, who's the co-founder and CEO of Mesosphere, which makes DC/OS, the most flexible platform for containerized data-intensive applications.
Today, we're going to talk about how Airbnb uses machine learning to transform the travel industry. The term artificial intelligence is about 60 years old, and as a science, it was slow getting going. In part this is because the initial hope had been that there would be a few simple rules that explained intelligence, the way a few simple rules explain physics or magnetism. This, of course, turned out not to be the case. But what has finally, and in a big way, vaulted artificial intelligence forward is machine learning.
It says don't try to figure out how a system should work in some top-down hierarchical fashion; instead, just analyze the data. That simple fact, combined with faster machines and more data, really is the entire story of artificial intelligence through today. Because machine learning is so young, we don't really know what its limits are. But one person who is pushing those limits of machine learning is Mike Curtis. He is the VP of engineering at Airbnb. Welcome to the show, Mike.
Mike Curtis: Thanks very much for having me.
Florian Leibert: Welcome, Mike.
Can you give us a brief update on Airbnb and how you use machine learning to revolutionize the travel industry?
Curtis: Sure, well, that's a very big question. Thanks again to both of you for having me. You know Airbnb is at an exciting time right now. The business is doing really well, and we are getting the opportunity to really apply technology in new and exciting ways to making travel a better experience for people around the world. The topic is on machine learning, so let me talk a bit about our history with machine learning and what we're doing with it now.
Airbnb has been using machine learning in our products for years, for everything from how we recommend prices to our hosts [and] understanding the value of their space to how we do search ranking. And that really comes down to matching between guests and hosts, as well as all the things we do to detect fraud and keep our community safe. So machine learning is something that you can find in all aspects of our product experience and technology stack. It's very central to what we do, and it's an area where we hope to build in a lot more capability and capacity going forward.
Reese: You know, it's interesting. I was kicking around the site and just searched for machine learning, and there's content all over the place about it. I mean, you're really open about sharing your experiences and what you're doing with the technology. I'm curious: Do you have a good way, do you have good ways, to kind of measure the impact on your business, or do you just kind of look at it holistically and say, "This is better"?
Curtis: Oh, no. Absolutely. We do measure the impact of pretty much everything that we do. At any given time, we have hundreds or possibly even thousands of experiments running on the product experience, and that could be everything from a UI treatment to a different machine learning model, say, for example, for search ranking. And the way that we'll do it is, as you'd predict, some number of our visitors are exposed to our newest machine learning model. In the example of search, that might give them a different search ranking, and then we'll evaluate: How does that treatment group perform?
Are they able to be more successful with what they're trying to do on Airbnb [compared with] folks who were not exposed to the new model? And then we look at that in terms of its translation into our business metrics, looking both at volume and the number of bookings created, but also even being able to look downstream at the quality of the experience that people are getting.
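[Editor's note: The treatment-versus-control comparison Curtis describes can be sketched roughly as below. This is a minimal illustration, not Airbnb's implementation. The function names `assign_group` and `conversion_lift`, the hash-based bucketing, and the two-proportion z-test are all assumptions chosen for the example.]

```python
import hashlib
from math import sqrt


def assign_group(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically bucket a visitor so they always see the same variant.

    Hypothetical helper: hashing the experiment name together with the user id
    keeps assignments stable across visits and independent across experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform value in [0, 1)
    return "treatment" if bucket < treatment_share else "control"


def conversion_lift(control_bookings: int, control_visitors: int,
                    treatment_bookings: int, treatment_visitors: int):
    """Return (absolute lift, z-score) from a two-proportion z-test."""
    p_control = control_bookings / control_visitors
    p_treatment = treatment_bookings / treatment_visitors
    pooled = (control_bookings + treatment_bookings) / (control_visitors + treatment_visitors)
    std_err = sqrt(pooled * (1 - pooled) * (1 / control_visitors + 1 / treatment_visitors))
    lift = p_treatment - p_control
    return lift, lift / std_err
```

[A booking rate is only one possible success metric; the same comparison could be run on downstream measures such as review ratings.]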
One of the things I'm very excited about with what we're doing with search now is trying, even all the way through to a review of a stay, to tie back into the model that we use to determine the search ranking, such that we can come up with a predictive model that looks at how likely it is that this person looking at this listing is going to give a good star rating if they actually go stay in it. So, there are some really exciting things where we are able to bridge a little bit of the offline experience to the digital experience.
Reese: So, you mean like the metric. So, somebody says, "I want a four bedroom in this part and this and this and this," and then they are shown some searches, they click on some, they make a reservation, they go, they stay, they have their vacation, they come back, they leave their review, and then you're tying back that review to that original query they did?
Curtis: That's right. We're able to start fully closing the loop. So that way we can have models that are starting to be trained on prediction of, again, how likely is it that this is going to be a good match for this person such that it will lead to a good review? And that's one of the signals that we use, so we obviously use lots and lots of signals for something like search ranking, and the objective of course is ... and one of the things that makes these challenges so fascinating at Airbnb is every single guest who travels on Airbnb is unique and distinct and has a different thing that they want to get out of travel, and every single host offers a unique experience so the technology of matching people, the right guest to the right host or the right experience, is really what it comes down to.
Reese: Well, I just have one more quick thing about metrics, which is, are there behaviors that you can't qualitatively say this was a better experience than this? Like there's no type? And if that is the case, how does machine learning play a role in that? I mean, you have to have a success metric for it to train. Is that correct?
Curtis: Well, yeah, I mean, in any machine learned model or any AI system, you need to have some kind of objective function that it's trying to optimize for. We look at things that we measure our models against. But if you can't tell or you don't have a good measure of the quality of experience, there may be a fallback on some other measures, like were they able to be more successful completing a booking, right? Were they able to get through it in a shorter amount of time? Those could be examples of things we would look at and could train the models on.
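[Editor's note: The idea of falling back to another measure when no quality signal exists can be sketched as a training-label function. This is a hypothetical illustration only. The `Session` fields, the `training_label` name, and the four-star threshold are assumptions, not details from the interview.]

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    """Hypothetical record of one guest's search-to-stay journey."""
    booked: bool
    star_rating: Optional[int] = None  # 1-5, or None if no review was left


def training_label(session: Session, good_rating: int = 4) -> float:
    """Pick the target a model is trained to predict.

    Prefer the quality signal (a good review) when one exists; otherwise
    fall back to whether the guest managed to complete a booking.
    """
    if session.star_rating is not None:
        return 1.0 if session.star_rating >= good_rating else 0.0
    return 1.0 if session.booked else 0.0
```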
Leibert: Mike, so machine learning is super-challenging in itself with having to have sufficient storage available, doing data processing at scale, and also the choice of the ecosystem of tools that you can use in order to solve machine learning problems. In your opinion, what were the biggest technical challenges for Airbnb in machine learning, but also more broadly, since you've taken over engineering?
Curtis: Sure, yeah. You know for reference, I joined Airbnb about five years ago now, a little less than five years ago. And I think that the technical challenge at that time was keeping up with the explosive growth. The company was taking off, the amount of traffic and just the number of connections, how much we had to handle on our databases, etc., was really a big scaling challenge when I look back all those years ago.
And the good news is, you know, I'm able to have that reflection back in hindsight because we got ahead of a lot of the scale issues that we had, which is very exciting because now we can be looking much more towards the future of where technology can take us as opposed to just keeping up. When I think about the machine learning aspect, what we went through and the technical challenge we went through, it is one that I think any company is going to go through that wants to be able to apply machine learning and data essentially to build better products, and that is the early investment in collection and management of large amounts of data.
One thing is certain: You can't do machine learning without data. So you should start with the data that you're going to need even before getting going with the actual ML applications. Our challenges over the years were really about making sure that we had a consistent way to instrument everything in the product, making sure that we actually have everything instrumented, such that we're collecting the signal that we need, having a consistent and unified source of truth for all the data that we collect and process at the company, and then building a whole bunch of great tools to be able to leverage that data for insight as well as to be able to leverage it to build data products, things like our pricing prediction system and search rank.
And one of the things that we've tried to do is, when we come up with something that we feel is industry leading in terms of a tool or a process that we use for managing large amounts of data, which we've gotten pretty good at over the years, we've been eager to push it to open source and really make it part of the community. So, you've seen tools like Superset and Airflow come out, and they've started getting pretty broad industry adoption. So, it's something that we're pretty proud of and I think is representative of one of the biggest technical challenges we've faced over the years.
Leibert: Yeah, just to double-click on something here, you mentioned data a number of times and data collection being challenging.
With 4.4 billion users of the mobile Internet and annual data growth of 40 percent, are you afraid that your current tools might not hold up? Or is this something that you think about?
Curtis: Well, of course we think about it. I think when you're at the stage that we're at and are anticipating as much growth as we are, and in terms of usage and data volumes, we always have to be thinking at least a couple of years in advance to what do we expect to come out of left field the quickest in that period of time and making sure that we're getting in front of it and building or integrating with whatever the next generation of our data platforms are going to be.
I think that there's some tremendous technology that has been made available on open source and even entire companies that have been started to help people manage the vast amount of data that's out there, so I think there will be a solution. But certainly any company that's dealing in this space and feels that things like machine learning … and AI technologies in general are central to what they do needs to be thinking multiple years in advance, to make sure that they're going to be able to keep up with the data and compute volume that's needed to keep pace with the data growth.
Reese: So, I heard a lot of advice that you just offered: invest early in data acquisition, build tool sets, think years in advance, and I love the phrase "a consistent source of truth." But that's kind of the Airbnb formula. If you were to advise another enterprise that maybe isn't quite as far along yet, what would be kind of some of the lessons you've learned along the way that you would pass on, or is that really it? Invest in data, build your tools, figure out what truth looks like, use open source, and all of that?
Curtis: A lot of those pieces of advice I think do ring true for just about any company or advice that I would offer for any company that's getting into this space. If I had to pick one that I think is critically important it is making sure that you have a consistent and well-reasoned-through representation of your metrics and understanding of your metrics. So, when I talk about a single source of truth, we have all of our data at Airbnb housed in a unified warehouse with a consistent set of metric definitions that the entire company uses.
So, no matter what department or function or whatever you're from, if you're querying our data warehouse and you're asking for something like average star rating on trips in a given country, or number of bookings or any number—you can imagine the thousands of things that we can query on—there's one definition of that metric, and that is documented.
We've spent a lot of time getting to that point, and I think fragmentation in how metrics are represented in a data warehouse is something that companies often ... you see a lot of fragmentation in that area, which can lead to a huge amount of confusion and lost efficiency. So, that's an area where I would really advise people early on to think about metrics definition and creating that source of truth.
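[Editor's note: One way to picture "one definition of that metric, and that is documented" is a shared registry that refuses duplicate definitions. The sketch below is an assumption-laden illustration, not a description of Airbnb's warehouse. `Metric`, `MetricRegistry`, and `average_star_rating` are hypothetical names.]

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Mapping

Row = Mapping[str, object]


@dataclass(frozen=True)
class Metric:
    """A named, documented metric with exactly one way to compute it."""
    name: str
    description: str
    compute: Callable[[Iterable[Row]], float]


class MetricRegistry:
    """Single place where every team looks up metric definitions."""

    def __init__(self) -> None:
        self._metrics: Dict[str, Metric] = {}

    def register(self, metric: Metric) -> None:
        # Refuse a second definition so teams cannot silently diverge.
        if metric.name in self._metrics:
            raise ValueError(f"metric {metric.name!r} is already defined")
        self._metrics[metric.name] = metric

    def compute(self, name: str, rows: Iterable[Row]) -> float:
        return self._metrics[name].compute(rows)


def average_star_rating(rows: Iterable[Row]) -> float:
    """Mean rating over reviewed trips; trips without a review are skipped."""
    ratings = [r["star_rating"] for r in rows if r.get("star_rating") is not None]
    return sum(ratings) / len(ratings) if ratings else float("nan")
```

[Every department would then call `registry.compute("average_star_rating", rows)` rather than writing its own query, so the number means the same thing everywhere.]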
Leibert: Mike, what's the next challenge that you plan to use ML for?
Curtis: Yeah, I'm so excited for the future here. I think we've got some really exciting stuff that we're thinking about and working on, and maybe the way I'll frame it is, think about how difficult it is today to plan a trip somewhere. You have to visit all these different apps, you have to visit all these different sites, you have to talk to people, you have to do all these things to try to figure out where am I going to go, how am I going to get there, what am I going to do?
And we think that it should be so much easier to plan and create a trip that's going to be perfect for you. So what that means is that Airbnb has to sort of go from offering a few services of the trip to being able to offer the end-to-end trip. And if we're able to offer the end-to-end trip, everything from the logistics of how you'll get there to where you're going to stay to the types of experiences that you're going to have, where are you going to eat, all that stuff, we have to start thinking in terms of end-to-end trip itineraries.
Now I talked about that matching challenge before, between a guest and a host, matching the right guest to the right host. Now imagine the explosion of possibilities that could go into creating a perfect end-to-end trip itinerary for you and having that be personalized just right for you. We think that that's going to be a huge area that we're exploring in machine learning, AI, everything else, to create that ideal personalization experience. I could go into a lot more reasons as to why that's complicated, like availability and supply and making times line up, and everything else. There's a lot there, but that's, I think, another frontier that we're going to be exploring with what we're doing with AI, and I'm incredibly excited about.
If you'll let me, I'll offer one other variant that I think is cool. As you think about how we engage with mobile apps and websites or whatever right now, and then the significant advances that there have been in things like conversational AI and being able to engage with something by talking to it, you think about Alexa and Siri and other similar technologies; that space, I think, is advancing really fast as well. And considering we go from an online experience into an offline real-world experience, I think us building our capacity in things like conversational AI is another area that I'm very excited to tackle.
Reese: Alright, before we continue, I want to take a moment and do a shout-out to Hewlett Packard Enterprise, who are the people who bring you STACK That. HPE is the leading provider of the next-generation services and solutions that help enterprises and small businesses navigate a rapidly changing technology landscape like the one we're discussing today.
With the industry's most comprehensive portfolio, spanning the cloud to the data center to the Intelligent Edge, HPE helps customers around the world make their operations more efficient, more productive, and more secure. Stay up to date with the latest in hybrid IT, Intelligent Edge, memory-driven computing, and more by visiting HPE.com.
I'm curious, Mike, did your time at Airbnb—you said you'd been there five years—did that overlap with when [Florian] was there?
Curtis: Flo and I, I think we overlapped for about three months, right at the beginning of my tenure. I joined, I guess, about four years and nine months ago in February, so Flo, I think you were at Airbnb for about three months in the beginning of my stay. Is that right?
Leibert: Yeah, I joined in December 2011. Yeah, you're right, Mike. I left about three months after you started, and I think that was also just around the time when we introduced some of the microservices and more modern search infrastructure, so it was a really exciting time.
Curtis: That's right.
Leibert: Yeah, actually, Mike, I have another question for you. Talking about open source, Airflow is a really exciting open source project that has a tremendous amount of traction out there. A lot of companies are using it. On GitHub, it's one of the more prolific projects.
How important is open source for Airbnb?
Curtis: Yeah, it's something that we consider extremely important to what we do. ... My general philosophy is I want to create things and spend our engineering cycles on things that I think are unique to Airbnb or things that are solutions to problems that we need to create because there aren't necessarily ones out there that we can use ourselves, and Airflow and Superset were two good examples where we built homegrown tools for Airbnb where we felt like they became best in class for what was out there.
And at that moment we ask ourselves a couple of questions. Number one: Do we have something important that we can contribute to the community here that may be better than what else is available? And number two: Are we committed to really maintaining this and keeping it up to date in the open source world?
And the answer to both of those questions for Airflow and Superset was yes, so we decided to open source the project. We have continued to maintain them in open source, and internally, we use the open source versions as well, which is another great forcing function to make sure that they're maintained at the highest level, even when they're externally available to open source.
And part of the reason that we have this philosophy is because we rely so heavily on open source technology internally as well. If there's a great proven open source technology out there, we would much rather use that and bring it in to use in our internal stack than reinvent the wheel. And so, as big consumers of open source, we also want to be big contributors.
Leibert: Great. And could you give us just a little bit of background of what Airflow and Superset do?
Curtis: Yeah. Airflow is essentially a tool for managing data pipelines. It does that in a very intuitive way and is able to scale to very large data volumes and a very large number of pipelines, which I think is why it's been very well received by the community. Superset is essentially a tool for deriving insight from data. It allows you to easily and intuitively query very large datasets, build things like dashboards, build recurring and ad hoc queries, and all the kind of stuff you'd expect from a great insight analysis tool.
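[Editor's note: The core idea behind managing a data pipeline, running tasks in an order that respects their dependencies, can be sketched with the Python standard library. This does not use or reproduce Airflow's API. `run_pipeline` and the task names are hypothetical, and a real scheduler adds retries, scheduling, and monitoring on top.]

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_pipeline(tasks: Dict[str, Callable[[], None]],
                 dependencies: Dict[str, Set[str]]) -> List[str]:
    """Run each task after all of the tasks it depends on.

    `dependencies` maps a task name to the set of tasks that must finish
    first. Returns the order in which the tasks were executed.
    """
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        tasks[name]()
    return order
```

[For a simple extract, transform, load chain, `dependencies` would be `{"extract": set(), "transform": {"extract"}, "load": {"transform"}}`.]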
Leibert: So, Mike, you're at the forefront of technology. You are at one of the world's leading companies when it comes to technology.
What are you excited about when it comes to technology?
Curtis: Yeah, so many things. I talked a little bit about being excited about being able to apply AI in a lot more ways to our business, talked a little bit about things like being able to personalize perfect trip itineraries, also being able to think more about conversational AI and the way people engage with our digital product, both when they are planning a trip and also when they're out there in the world. That stuff I think is super-cool, and I'm just really excited to keep exploring it.
I think one of the themes, if I zoom out a little bit and look at the technology industry sort of more broadly, one of the things that I'm excited about with Airbnb is the way that we're applying AI technology to empower people. And if you look at sort of the world at large right now, there's this big, sort of fear of automation of work, and it's a well-placed fear because automation technology is improving so dramatically right now that many jobs are going to be displaced.
Now we know, sort of, from the history of the automation field that jobs being replaced does not necessarily mean that people lose economic opportunity. In fact, in most cases, it has meant more economic opportunity as a whole. But it does mean that the nature of work changes for people.
And so one of the things that makes me excited about the work that we're doing at Airbnb is that we're using AI technology to actually create economic opportunity for people. People are going to be looking for new sources of income, new ways to make ends meet. Imagine hosting on Airbnb either a home or an experience or something else on Airbnb as a new form of economic opportunity for people that can help them make ends meet. And so I get really excited about not using AI to replace people but instead using AI as a mechanism to empower people.
So, if I zoom way out, that's some of the stuff I'm most excited about, what's happening in technology, and I'm hoping that more and more technology companies will adopt that mindset and will be thinking about how can we use this AI technology that we're building for the betterment of people and not just for business profits or business growth.
Reese: Alright, well, that is a great place to leave it, and we're out of time. So, Mike, I want to say thank you for taking the time, for finding time for us to talk about this fascinating topic.
Curtis: Well, thanks very much to you both. Thanks so much, Byron, for hosting, and Flo, it's great to catch up again after all the great things that you've done.
Leibert: Thank you so much, Mike. Likewise, great catching up!