Breaking Through the Noise
A Guide to Understanding DataOps Solutions
DataOps is the hot topic on every data professional’s lips these days, and we expect to hear much more about DataOps in 2020. This is not surprising given that DataOps holds true potential for enabling enterprise data teams to generate significant business value from their data. Companies that implement DataOps find that they are able to reduce cycle times from weeks (or months) to one day, virtually eliminate data errors, and dramatically improve the productivity of data engineers and analysts.
As a result, the number of vendors marketing DataOps capabilities has grown in pace with the popularity of the practice. To date, we count over 70 companies in the DataOps ecosystem. However, the rush to rebrand existing products as DataOps has created some marketplace confusion. Because it is such a new category, both overly narrow and overly broad definitions of DataOps abound. It is therefore easy to get overwhelmed when trying to evaluate different solutions and determine whether they will help you achieve your DataOps goals.
So, What is DataOps Anyway?
In short, DataOps is a set of technical practices, cultural norms, and architectures that enable:
Rapid experimentation and innovation for the fastest delivery of new insights to customers;
Low error rates;
Collaboration across complex sets of people, technology, and environments;
Clear measurement and monitoring of results.
Similarly, Gartner defines DataOps as, “a collaborative data management practice focused on improving the communication, integration, and automation of data flows between data managers and data consumers across an organization.” Like its DevOps cousin, key elements of DataOps include increased deployment frequency, automated testing and monitoring, version control, and collaboration.
This sounds great, and you may be ready to get started. The next big question is how your organization can best achieve this transformation. How can you sift through all the marketing speak and find the solutions that will truly help you?
Understanding DataOps Solutions
DataOps addresses a broad set of workflow processes, including analytics creation and your end-to-end data operations pipeline. In general, it’s not a single tool you can purchase and forget. Fundamentally, any DataOps solution should improve your ability to orchestrate data pipelines, automate testing and monitoring, and speed new feature deployment – while continuing to choose the right tool for the right part of the job.
To be certain, many companies that are marketing their products as DataOps solutions play a critical role in the ecosystem. However, it is important to understand exactly what role they play. If you purchase a fancy new ETL tool, will you suddenly realize all the benefits of DataOps? Probably not.
When evaluating DataOps solutions, consider the following ways that companies are marketing their capabilities.
The Data Toolchain – Many tools being marketed today as DataOps solutions are simply independent components of the data toolchain that collect, store, transform, visualize, and govern the data running through the pipeline. Although all of these technologies play an important role in the value pipeline, they do not ensure that each step in the data pipeline is executed and coordinated as a single, integrated, and accurate process, nor do they help people and teams collaborate better. Remember that a DataOps process automates the orchestration and testing of these tools across the pipeline. In fact, in a true DataOps environment, it does not matter which data tools you use. Your team can continue to use the ETL or analytics tools they like best or add new tools at any time. Typically, components of the toolchain are marketed as DataOps solutions in two different ways:
DataOps Rebranding – One of the reasons that the concept of DataOps has become so muddied is that some companies are redefining DataOps to fit what their product does. For example, DataOps has been rebranded as ETL (e.g., Hitachi Vantara, Attunity), streaming ETL (e.g., StreamSets, Lenses.io), or data virtualization (e.g., Delphix).
The Halo Effect – Because DataOps is a hot marketing term, it is not surprising that many data companies use it in their marketing to generate interest. The companies doing “halo effect” marketing use the correct definition of DataOps. However, if you read closely, the message is generally, “DataOps is great, but use our tool first.” Some examples of this type of marketing are IBM’s marketing of its Cloud Pak for Data, Trifacta for end-user data prep, and Qlik for data analytics.
Data Process Tools – Data process and automation tools are being correctly marketed as important components of a DataOps solution. You’ll need some combination of these tools if you decide to implement DataOps yourself. Many popular DevOps tools can also be used.
Orchestration of end-to-end multi-tool, multi-environment pipelines can be facilitated by tools like Apache Airflow or Saagie.
Automated Testing and Monitoring at every step in production and development pipelines is important to catch and address errors before they reach the business user. iCEDQ is a leading testing and monitoring platform.
Environment and Deployment technologies allow teams to spin up self-service work environments and innovate without breaking production. New features can be deployed with the push of a button. There are a host of tools built for this purpose, including well-known open-source tools such as Git (version control), Docker (containerization), and Jenkins (CI/CD).
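To make the orchestration and testing ideas above more concrete, here is a minimal sketch in plain Python. It is not taken from any of the products named in this guide. The step names, sample rows, and check functions are all hypothetical, and a real implementation would hand these jobs to an orchestrator and a testing tool. The point it illustrates is simply that every step runs in a defined order and its output is tested before the next step is allowed to start.

```python
from typing import Callable, Dict, List, Tuple

Rows = List[Dict[str, float]]
Step = Callable[[Rows], Rows]
Check = Callable[[Rows], bool]


def extract(_: Rows) -> Rows:
    # Hypothetical source data; a real step would call an ingestion tool.
    return [{"order_id": 1, "amount": 20.0}, {"order_id": 2, "amount": 35.5}]


def transform(rows: Rows) -> Rows:
    # Adds a derived column; stands in for an ETL or SQL transformation.
    return [{**r, "amount_with_tax": round(r["amount"] * 1.1, 2)} for r in rows]


def not_empty(rows: Rows) -> bool:
    return len(rows) > 0


def amounts_positive(rows: Rows) -> bool:
    return all(r["amount"] > 0 for r in rows)


def has_tax_column(rows: Rows) -> bool:
    return all("amount_with_tax" in r for r in rows)


# Each entry: (step name, step function, data tests to run on its output).
PIPELINE: List[Tuple[str, Step, List[Check]]] = [
    ("extract", extract, [not_empty, amounts_positive]),
    ("transform", transform, [not_empty, has_tax_column]),
]


def run_pipeline(
    pipeline: List[Tuple[str, Step, List[Check]]]
) -> Tuple[Rows, List[str]]:
    """Run steps in order; stop at the first failed data test."""
    rows: Rows = []
    log: List[str] = []
    for name, step, checks in pipeline:
        rows = step(rows)
        for check in checks:
            if not check(rows):
                # Fail fast so bad data never reaches the next step.
                raise RuntimeError(f"{name}: {check.__name__} failed")
            log.append(f"{name}: {check.__name__} passed")
    return rows, log


if __name__ == "__main__":
    _, messages = run_pipeline(PIPELINE)
    for message in messages:
        print(message)
```

Because the tests travel with the pipeline definition, the same checks can run in both development and production, which is the property the tools in this category aim to provide at scale.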
All-in-One DataOps Platforms – Building a DataOps environment is challenging and requires a true organizational transformation and commitment of time and resources. Even the best-equipped organizations can encounter obstacles trying to bring it all together. DataKitchen offers the first end-to-end platform that can serve as a foundation for your DataOps initiative. It seamlessly automates and manages workflows related to both data operations and new analytics development, using the tools you already have. In fact, the DataKitchen platform can interoperate with any of the data toolchain and process tools mentioned above. The platform fosters collaboration by providing a single view of the entire pipeline. Version control and environment management enable work to move seamlessly from person to person or team to team. The platform also provides useful metrics that show whether your DataOps initiative is adding value.
DataOps, when implemented correctly, holds exciting promise for data teams to reclaim control of their data pipelines and deliver value quickly and without errors. It is easy to get confused by all the marketing noise, but remember that DataOps, at its core, is a collaborative process that orchestrates data pipelines, automates testing and monitoring, and speeds new feature deployment. Whether you use an all-in-one tool like DataKitchen or build it yourself, the right combination of tools, processes, and people is critical to making DataOps a success.
To learn more about how a DataOps Platform can help your data organization develop analytics at lightning speed and eliminate errors, contact us at datakitchen.io.