How to get started with DataOps
Data operations are a messy business. Data quality is inhibited by siloed, complex data pipelines. A lack of collaboration across data functions stymies coordination and efficiency. Manual, ungoverned processes for data delivery across the supply chain can compromise analytics. The analytics cycle time takes a hit, and product delivery suffers.
You’re used to hearing how DevOps can improve the relationship between operations and software development. The next transformative practice, DataOps, aims to build aerodynamic data pipelines. The DataOps methodology provides a pathway to make data operations work much more efficiently and effectively, yielding all-around confidence in data analytics.
Here’s a concise way to define DataOps, courtesy of Matt Aslett, research vice president at 451 Research: The DataOps methodology brings some of the DevOps agile development and deployment practices to data processing and integration pipelines.
“While application development and release cycles have accelerated, the same cannot be said of data integration and processing pipelines or database release cycles, leading to the database and data provisioning potentially becoming a brake on application development and business agility,” Aslett says. “DataOps is, therefore, about implementing an iterative lifecycle for the provisioning of data, databases, and data integration pipelines, in order to support self-service agile analytics, among other things.”
The Wild West of data development
Good stuff. But how can you as an IT or data leader help your organization do all that? Early research suggests that enterprises need help pulling together a DataOps program. Preliminary results from a survey being conducted by Eckerson Group show that 44 percent of 150 respondents currently have no DataOps work going on within their organization, says Wayne Eckerson, president of the firm. The final report will be published at the end of May.
That adoption rate has to change. “Compared to software engineering, the world of data development is still the Wild West,” says Eckerson, “And it’s only exacerbated by the whole self-service movement,” where analysts can wind up creating conflicting and error-filled datasets. More people are touching more data and doing so more often, but, he says, “the level of data consistency and the level of data consumption that delivers business value is going down in most organizations.”
Wyatt Earp helped bring law and order to the American frontier in the late 1800s. Now it’s your turn to tell a new tale of conquering the Wild West—the one about how you helped bring law and order to the data analytics lifecycle.
Get down the basics
First, get out of your head any notion that DataOps is just DevOps applied to data.
However, the two do connect, so it’s important to understand the baseline concepts.
If your company has dipped its toe into the waters of both agile and DevOps, you have a head start. Those experiences can give everyone a better perspective about bringing the development and deployment practices to data processing and integration pipelines.
Start by reviewing (or re-reviewing) "The Manifesto for Agile Software Development," which debuted in 2001, and "The DataOps Manifesto," published a couple of years ago. The latter builds on some aspects of the former. Both share principles of satisfying customers through continuous delivery of software; promoting daily communications among business people, analytics teams, and operations; and relying on self-organizing teams to achieve the best analytics insights, algorithms, and architectures.
The principles of the agile manifesto can help you adapt to DataOps. For example, work on the goal of improving cycle times in the context of data management. Minimize the time and effort it takes to turn a customer idea into an analytics process, create it in development, release it as a repeatable production process, and refactor and reuse that product.
If you’re looking for a broadly recognized DevOps manifesto to which the entire industry adheres, you won’t find one—at least not yet. But agile, the DevOps methodology, and lean manufacturing all apply to data analytics. “Agile delivers analytics much more quickly and robustly than waterfall,” says Chris Bergh, CEO of DataKitchen, the DataOps consultancy and platform provider where "The DataOps Manifesto" was born. “It’s nearly impossible to transition fully to agile or effectively manage operations without continuous deployment—that is, DevOps.”
Data analytics, however, differs from software development in that it bears the responsibility for data operations. It’s less software engineering and more like manufacturing, Bergh says.
In data analytics, data moves through a series of steps. Each step takes input from another and creates output for the next one, to ultimately exit as a report, model, or visualization. “Lean manufacturing streamlines the data pipeline and ensures quality by testing data as it flows through the pipeline,” Bergh says. Testing inputs, outputs, and business logic at each stage of the data analytics pipeline introduces a systematic approach to mitigating risk, so poor-quality data should never reach critical analytics processes.
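The step-by-step pipeline Bergh describes can be sketched in a few lines of Python. The stage names, checks, and sample data below are hypothetical illustrations, not part of any specific DataOps toolchain; the point is that each stage tests its inputs and outputs so bad data fails fast instead of flowing downstream.

```python
# A minimal sketch of a tested data pipeline: each stage validates its
# input and output so poor-quality data fails fast instead of reaching
# analytics. Stage names, checks, and data are hypothetical examples.

def check(condition, message):
    """Fail the pipeline run immediately when a data test fails."""
    if not condition:
        raise ValueError(f"Data test failed: {message}")

def ingest(raw_rows):
    # Input test: the feed must deliver at least one row.
    check(len(raw_rows) > 0, "ingest received no rows")
    # Filter out rows with missing amounts rather than passing them on.
    return [r for r in raw_rows if r.get("amount") is not None]

def transform(rows):
    # Business-logic test on input: amounts must be numeric.
    check(all(isinstance(r["amount"], (int, float)) for r in rows),
          "non-numeric amount in transform input")
    totals = {}
    for r in rows:
        totals[r["customer"]] = totals.get(r["customer"], 0) + r["amount"]
    # Output test: the aggregation must preserve the grand total.
    check(abs(sum(totals.values()) - sum(r["amount"] for r in rows)) < 1e-9,
          "totals do not reconcile with inputs")
    return totals

def report(totals):
    # Final test before the result exits as a report.
    check(len(totals) > 0, "report received no customers")
    return sorted(totals.items(), key=lambda kv: -kv[1])

raw = [{"customer": "a", "amount": 120.0},
       {"customer": "b", "amount": 80.0},
       {"customer": "a", "amount": None}]   # bad row, filtered at ingest
result = report(transform(ingest(raw)))
print(result)  # [('a', 120.0), ('b', 80.0)]
```

If any check fails, the run stops at that stage with a clear message, which is the manufacturing-style quality gate the lean approach calls for.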
There’s no “Kumbaya” moment in DataOps. But you do need to cultivate bonds and coordinate activities among parties that play direct roles in the DataOps mix.
DataOps focuses on establishing close collaboration among separate teams. The DataOps approach implies that data analysts, data engineers, and data scientists put their efforts into the development process simultaneously, with each team member’s role aligned to carry out their part of the work, says Alex Bekker, head of the data analytics department at ScienceSoft, an IT consultancy and software development company. As an example, building data pipelines is the job of data engineers, not data scientists.
Regardless of individual function, a DataOps team needs to keep in mind that its product is a dependency for another team. “The day-to-day existence of a data engineer working on a master data management platform is quite different than that of a data analyst working in Tableau,” Bergh says. Distinct teams tend to view the world through the lens of the specific tools they use, but they need to think outside that box, such as considering how another team might reuse the data, artifacts, or code they produce. Instead of allowing tools and technical integrations to create organizational silos, he says, DataOps uses automated orchestration, testing, and reporting as communication vehicles among data engineers, scientists, analysts, and users.
That doesn’t mean you have to alert everyone who can benefit from more agile approaches to data management that they’re all part of something called DataOps, Aslett says. (You don’t want to hear the groans, “What? I’ve got to be part of yet another data initiative?”) But clearly, data operators and senior IT decision-makers must fully understand what you are trying to bring into the organization and why.
Manage the processes, take on the tools
To motivate the business to embrace the DataOps task coordination and communications framework, you need to provide guidance. “DataOps tries to apply some standard processes for creating new applications and changing existing ones,” says Eckerson. “What are the right processes? And once we create and change them, how do we automate what we built so that it keeps working and doesn’t break?”
Adopting DataOps requires that people understand how to implement, automate, and monitor well-defined processes. These workflows encompass building, changing, testing, deploying, running, and tracking new and modified functionality for data pipelines, according to Eckerson.
You need to monitor all the points of a data pipeline, from data ingestion to engineering to analytics. You also need to build tests during the development process and pair them up with monitoring to make sure content is delivered properly. “That way we can identify if something has gone awry before it gets to the user community, and that way we can keep these pipelines flowing instead of breaking,” he says.
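One simple way to pair development-time tests with production monitoring, as Eckerson describes, is to record a metric at each point of the pipeline and flag large deviations from an expected baseline. The stage names, baselines, and tolerance below are hypothetical values chosen for illustration.

```python
# Sketch: lightweight monitoring hooks at each point of a data pipeline.
# Row counts per stage are compared against an expected baseline so a
# broken feed is flagged before results reach the user community.
# Stage names, baselines, and the tolerance are hypothetical.

baselines = {"ingestion": 1000, "engineering": 950, "analytics": 10}
TOLERANCE = 0.5  # alert if a stage produces < 50% of its baseline

alerts = []

def monitor(stage, row_count):
    """Record a stage's output size and raise an alert on a large drop."""
    if row_count < baselines[stage] * TOLERANCE:
        alerts.append(f"{stage}: {row_count} rows, expected ~{baselines[stage]}")

# Simulated run: the engineering stage suddenly drops most of its rows.
monitor("ingestion", 980)
monitor("engineering", 120)   # well below baseline -> alert
monitor("analytics", 9)

print(alerts)  # ['engineering: 120 rows, expected ~950']
```

In practice these hooks would feed an alerting system rather than a list, but the principle is the same: catch the break before the business sees it.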
DevOps has already produced guidelines that help DataOps. Software engineering practices, such as a repository for check-in/check-out, version control, and continuous integration and continuous incremental deployment, allow data developers to work in parallel more efficiently. “All those things have been borrowed from DevOps to accelerate the process of pushing new things out to the business,” Eckerson says.
You can use any number of technologies to facilitate DataOps adoption. But, Aslett says, whatever you use shouldn’t require wholesale changes to the underlying data platforms or user-facing analytics tools. Aslett points to a growing market of vendors that can help fulfill your requirements.
DataOps has a host of specialist platforms. Among them are tools for automatically orchestrating pipelines for data operations and deploying new analytics, automating and monitoring data processing pipelines, and integrating data across corporate boundaries. Other players provide data unification environments and continuous data integration environments.
Also relevant to the space, Aslett says, are tools for database release automation and those specifically focused on automated provisioning of the data.
Build the case for DataOps
DataOps is a methodology, a practice, and a discipline, and as such, you should evangelize its principles. As with any other corporate IT endeavor, you need buy-in from participants.
“What’s in it for me?” You’ve probably heard that question from your counterparts before, and you should be prepared to answer it now.
Sure, you can talk generally about how adopting DataOps improves efficiency, but is that inspiring to people? Not really, says Bergh. “Focus on projects which improve the top line and grow the company,” he says. “These projects energize the company and gain a high level of visibility for the analytics team.”
Shining a light on machine learning is a good idea. It’s hot for so many applications and has such possibilities, including better online content curation and discovery and more personalized interactions with customers. So perhaps you can start with the proposition that DataOps contributes to successful machine learning outcomes by beating down data silos and securing and providing access to high-quality, readily available training datasets. Vendors are already onto this opportunity; a growing segment of them are focused on machine learning development and deployment use cases, as are more established data science and machine learning model-operationalization providers, Aslett notes.
There are more good use cases for your evangelism. “The positive impact of the DataOps approach is remarkable in fraud detection, especially in banking,” says Bekker. Tools related to DataOps allow banks to replace the traditional rule-based approach to fraud detection for clients’ credit cards and significantly speed up data analysis. “Instead of using the predefined set of rules, the DataOps approach implies gathering real-time data—for example, customers’ transaction details or purchasing habits—and generating insights about clients’ behavior.” Because of these advanced insights, banks can avoid having to call or text their clients to verify transactions.
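The contrast Bekker draws can be shown with a toy example. This is a deliberately simplified sketch, not a real fraud model: the transaction data, the rule threshold, and the three-standard-deviation cutoff are all hypothetical, and production systems use far richer features and models.

```python
# Toy contrast between a fixed rule and a behavior-based check, in the
# spirit of the fraud-detection example above. All data and thresholds
# are hypothetical; real systems use far richer features and models.
from statistics import mean, stdev

history = [42.0, 55.0, 48.0, 60.0, 51.0]  # a customer's usual purchases
new_txn = 300.0                           # an unusual new transaction

# Rule-based: one global threshold applied to every customer.
rule_flag = new_txn > 500.0               # misses this transaction

# Behavior-based: compare against this customer's own purchasing habits.
mu, sigma = mean(history), stdev(history)
z = (new_txn - mu) / sigma
behavior_flag = z > 3.0                   # roughly 3 standard deviations out

print(rule_flag, behavior_flag)
```

The fixed rule never fires because the amount is under its global threshold, while the per-customer statistic flags the transaction immediately; that is the shift from predefined rules to behavioral insight that DataOps-style pipelines make practical at scale.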
If the DataOps picture is starting to come into focus for you, there’s no time like the present to start the journey to domesticate your company’s data world—much as lawman Earp headed west to help tame what was then a violent cowboy culture. Getting your organization to subscribe to an agile and iterative approach is a big step toward becoming a truly data-driven enterprise.
Adopting DataOps: Lessons for leaders
- Apply lessons from agile, DevOps, and lean manufacturing principles to DataOps.
- Spearhead a campaign to align data analysts, engineers, and scientists to conduct the development process in parallel, according to the functions to which they are best suited.
- Standardize the processes involved in building data pipelines, and potentially use technology solutions to ensure they stay on track.
This article/content was written by the individual writer identified and does not necessarily reflect the view of Hewlett Packard Enterprise Company.