Have you ever failed at work? Most people have done something cringeworthy at some point. I have plenty of funny stories to tell. There’s the time I set off the fire alarm on my first day of a new management job. Perhaps even more notable are the epic fails. I once worked at a company where, it seemed, I would find myself on the phone with an angry customer on a near-daily basis. There had been another data error. If it wasn’t fixed immediately, the customer would find a different vendor. My reputation and career were constantly on the line. Not fun. At first, I worried that the problem was me. I had always thought of myself as a dependable person who gets the job done. I wasn’t used to coming up short week after week.
Over the decades, I have come to understand that good people underperform when operating within a bad system. If the data team can’t deliver quickly enough or its work suffers from abominable quality, the culprit is likely the methodologies that the team is using. Some companies become overly dependent on a star employee whom they drive to work long hours. When people work in teams, task coordination and the elimination of process inefficiencies are far more consistent and impactful than individual ability or heroism.
In the industrial design domain, attitudes towards failure have matured. The adage “fail faster,” attributed to David Kelley of IDEO, expresses how failure fuels innovation. The phrase suggests that it is all about speed, but it is more generally a call to minimize the cost and consequences of failure. Failing fast frees designers to take big risks, leading to the creativity and innovation that power progress and growth.
Operations managers widely apply the axiom “fail fast” to lean manufacturing. As a product progresses through a manufacturing process, its cost-of-goods-sold increases. Every factory manager knows it is much less costly to screen out a faulty component prior to assembly than to invest in mid- or post-production rework.
Failing Faster in Data Analytics
When data professionals apply these same principles to data analytics, they can reap the same rewards of lower costs and higher creativity that we see in industrial design and manufacturing. In the data industry, the application of these methodologies across the data lifecycle is called DataOps. In short, DataOps reduces the cost and consequences of data errors and data-analytics bugs. You could say that “failing faster” is the common theme that unifies all aspects of DataOps.
Data Pipeline Errors
In data operations, data from a set of sources progresses through a series of processing steps, for example, integration, cleaning, processing, transformation and publication (as charts, graphics and reports). Thirty percent of respondents to our recent DataOps survey reported more than 11 errors per month. In a significant number of enterprises, data errors regularly flow into user analytics with potentially catastrophic results.
DataOps places tests at each stage of the data-operations pipeline. It checks and monitors data at its source before it enters the pipeline. Does it conform to business logic? Does it fall within statistical norms? DataOps also tests inputs and outputs at each stage of transformation. If a fork or join fails, DataOps will catch it before it corrupts analytics.
DataOps implements automated statistical and process controls on data operations, much like a manufacturing plant. If data flowing through a multi-stage data-analytics pipeline violates business logic or statistical norms, then tests alert the data team or, in an extreme case, stop the flow of data.
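To make the idea of stage-level tests concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than the API of any DataOps product: the `Check` class, the `PipelineHalted` exception, the `within_history` helper, the three-sigma threshold and the sample numbers are all hypothetical. It shows one business-logic test that stops the flow of data and one statistical test that only raises a warning for the data team.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Callable, List, Sequence


class PipelineHalted(Exception):
    """Raised when a critical test fails and the flow of data must stop."""


@dataclass
class Check:
    name: str
    test: Callable[[Sequence[float]], bool]  # returns True when the batch passes
    critical: bool = False                   # a critical failure halts the pipeline


def within_history(history: Sequence[float], sigmas: float = 3.0):
    """Statistical norm: batch mean must lie within `sigmas` std devs of the historical mean."""
    mu, sd = mean(history), stdev(history)
    return lambda batch: abs(mean(batch) - mu) <= sigmas * sd


def run_checks(batch: Sequence[float], checks: List[Check]) -> List[str]:
    """Run every check on a batch; return warning names, or raise on a critical failure."""
    warnings = []
    for check in checks:
        if check.test(batch):
            continue
        if check.critical:
            raise PipelineHalted(check.name)
        warnings.append(check.name)  # non-critical: alert the data team, keep flowing
    return warnings


# Hypothetical daily order totals seen in previous runs.
history = [100.0, 102.0, 98.0, 101.0, 99.0]

checks = [
    # Business logic: an order total can never be negative.
    Check("no negative order totals", lambda b: all(x >= 0 for x in b), critical=True),
    # Statistical norm: today's mean should resemble the historical mean.
    Check("mean within 3 sigma of history", within_history(history)),
]

warnings = run_checks([100.0, 250.0, 101.0], checks)
```

In this sketch the same `run_checks` call could sit at the source and after each transformation step, so a bad batch is caught at the stage where it first appears rather than in a published report.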
In many organizations, new analytics are developed directly on operational systems. DataOps instead allows data analysts to create development sandboxes. With virtualization technology, sandboxes closely match the target production environment, minimizing unexpected regressions. Sandboxes inherit analytics components and automated orchestrations along with associated tests so data scientists can better leverage each other’s work. If a development risk doesn’t pay off, the data scientist can abandon the sandbox and revert to the baseline analytics code and configuration.
DataOps employs continuous integration methods like those that enable leading software organizations to deploy millions of code releases per year. Work in a development sandbox stays isolated from production until it passes through an automated release workflow that includes integration, functional, unit and other tests. Tests catch issues before analytics migrate into data operations.
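The release workflow can be pictured as a gate that promotes new analytics only when every test suite passes. The sketch below is a simplified illustration under stated assumptions, not a real continuous integration tool: `release_gate`, the suite names, the two test functions and the `total_revenue` analytic are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Analytic = Callable[[List[dict]], float]


def total_revenue(rows: List[dict]) -> float:
    """Hypothetical analytic developed in a sandbox."""
    return sum(r["price"] * r["qty"] for r in rows)


def unit_handles_empty_input(fn: Analytic) -> bool:
    # Unit test: no rows should yield zero revenue.
    return fn([]) == 0


def functional_matches_known_total(fn: Analytic) -> bool:
    # Functional test: a small sample with a hand-computed answer.
    sample = [{"price": 2.0, "qty": 3}, {"price": 1.5, "qty": 2}]
    return fn(sample) == 9.0


def release_gate(
    candidate: Analytic,
    suites: Dict[str, List[Callable[[Analytic], bool]]],
) -> Tuple[bool, List[str]]:
    """Run every suite against the candidate; approve only if nothing fails."""
    failures = []
    for suite_name, tests in suites.items():
        for test in tests:
            if not test(candidate):
                failures.append(f"{suite_name}:{test.__name__}")
    return (not failures, failures)


suites = {
    "unit": [unit_handles_empty_input],
    "functional": [functional_matches_known_total],
}

approved, failures = release_gate(total_revenue, suites)
```

A candidate that forgets to multiply by quantity would pass the unit test but fail the functional one, so the gate would hold it in the sandbox and report which test blocked the release.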
Product Management Errors
Developing analytics that no one wants or needs is a costly product management failure. When DataOps teams implement Agile Development, they create short-term value by iterating rapidly. It’s much easier to be correct about what feature you need this week than about what you will need 24 months from now. Also, Agile teams receive immediate feedback on what they have produced so they can course-correct. Often, users don’t know what they want until they see it. Teams can be much more innovative when they create a rough, approximate solution and keep iterating on it.
Failing Your Way to Success
DataOps focuses on the identification and elimination of data pipeline, development, integration and product management errors. When DataOps minimizes the cost and consequences of errors, data analysts are free to work more closely with business users. Together they can play with ideas, try new things, and act on hunches. When this process plays out, it unlocks tremendous creativity. With failures minimized and identified early, DataOps enables enterprises to deliver on the promise of leveraging data for competitive advantage. We have seen many companies use DataOps methods to vault forward and take a leadership position in their market. Fail faster using DataOps.