For several years now, the elephant in the room has been that data and analytics projects are failing. Gartner estimated that 85% of big data projects fail. Data from NewVantage Partners showed that the share of data-driven organizations has actually declined to 24% from 37% several years ago and that only 29% of organizations are achieving transformational outcomes from their data.
In addition, only one-third of companies have an established CDO role, and the average tenure of the CDO is only 2.5 years. Add all these facts together, and it paints a picture that something is amiss in the data world.
Yet, among all this, one area that hasn’t been studied is the data engineering role. We thought it would be interesting to look at how data engineers are doing under these circumstances. Are they thriving or feeling the impact of failed projects? We surveyed 600 data engineers, including 100 managers, to understand how they are faring and feeling about the work that they are doing. The top-line result was that 97% of data engineers are feeling burnout.
If you have worked in the big data industry, the experiences of the survey participants will likely resonate with you. Data engineering resembles software engineering in certain respects, but data engineers have not adopted the best practices that software engineering has been perfecting for decades. When there is a problem in data pipelines, data engineers are expected to fix it using ad hoc processes that simply will not scale.
Data engineering burnout stems from never succeeding in getting systems under control. There’s an unending wave of problems: customer requests, broken systems, and errors. One problem is that data engineers are seen as a cost minimization role instead of a generator of value. People think of them as firefighters, and when there is an outage, they should be working day and night to address it. Data engineers end up fixing the same problem over and over.
Enterprises must empower data engineers to fix processes instead of just bugs. Imagine a data pipeline error or data problem that impacts critical analytics. Most organizations find out about these errors from their customers, such as a VP of Sales who notices that the bookings report is millions of dollars off. This oversight triggers an all-hands-on-deck emergency response. If data pipelines span teams, then there is an unpleasant (and often political) discovery phase where people may point fingers at each other. Under tremendous pressure and scrutiny, the data team works the weekend to rush a fix into the production pipeline. When the problem is addressed, everyone heaves a sigh of relief. The heroes who fixed the problem are sometimes praised, or perhaps they are scorned for letting the problem occur in the first place.
One widespread problem is a reliance on the culture of heroism. People jump whenever there is a problem, but heroism is not a strategy. It doesn’t scale. It leads to burnout and turnover. When people leave the company, they take tribal knowledge with them, so burnout has a tangible business impact. In our survey, data engineers cited the following as causes of burnout:
- The relentless flow of errors
- Manual processes that crowd out innovation
- A steady stream of half-baked requests
- Blaming and finger-pointing
- Restrictive data governance policies
To see the full results of the data engineering survey, please visit “2021 Data Engineering Survey: Burned-Out Data Engineers are Calling for DataOps.”
Methods to Avoid Burnout
The data engineers in our survey have many challenges. They wake up the day after an analytics deployment and worry about the little things they didn’t check. They get yelled at for missing milestones. Users roll their eyes when the analytics team can’t turn on a dime. Everyone in the industry faces these problems. The important thing to realize is that these problems are not the fault of the people working in the data organization. They are process problems. The data analytics lifecycle is a factory, and like other factories, it can be optimized with techniques borrowed from methods like lean manufacturing.
Instead of directing negativity at the enterprise or others or oneself, data engineers can use workplace challenges as fuel to transform suffering into system building. The problems of burnout and lack of data project success are not an indication of personal failure or that the field is broken. They point to work that needs to be done to build systems around automation and data toolchains. To this end, we offer ten tips to avoid data engineering burnout.
1. Don’t suffer; build a system
Many data engineers view their job as servicing a queue of tasks. The faster they can take tasks off the queue and check them off, the more they feel they have accomplished. When faced with a task, many data engineers try to address it, thinking only about the task as defined. If the problem is a bug, you fix the bug and move on.
What if that same problem recurs over and over? Take a broad, holistic view of the situation. Don’t just fix the bug. Improve the system so that the problem never recurs. Instead of just answering an analytics question, think about how to answer the next ten similar questions. Instead of just solving a problem manually, build a robot that will solve the problem for you every time it occurs in the future.
Write tests that catch data errors. Automate manual processes. Implement DataOps methods. Build observability and transparency into your end-to-end data pipelines. Automate governance with governance-as-code. Instead of fixing bugs and creating analytics, design the system that enables rapid design of robust, observable and governed analytics. Build a system that eliminates the causes of data engineering suffering.
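As a minimal sketch of what "tests that catch data errors" could look like, the Python below runs a small set of checks against a batch of records before it moves downstream. The table, the column names (`order_id`, `amount`) and the helper names (`DataTest`, `run_tests`) are assumptions made for illustration, not part of any particular DataOps tool.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Row = Dict[str, float]  # one record flowing through the pipeline (hypothetical shape)


@dataclass
class DataTest:
    """A named check that returns True when a batch of rows looks healthy."""
    name: str
    check: Callable[[List[Row]], bool]


def run_tests(rows: List[Row], tests: Iterable[DataTest]) -> List[str]:
    """Return the names of failed tests; an empty list means the batch passes."""
    return [t.name for t in tests if not t.check(rows)]


# Hypothetical checks for a bookings table.
BOOKINGS_TESTS = [
    DataTest("batch is not empty", lambda rows: len(rows) > 0),
    DataTest("amount is never negative",
             lambda rows: all(r["amount"] >= 0 for r in rows)),
    DataTest("order_id is unique",
             lambda rows: len({r["order_id"] for r in rows}) == len(rows)),
]

good = [{"order_id": 1, "amount": 120.0}, {"order_id": 2, "amount": 75.5}]
bad = [{"order_id": 1, "amount": 120.0}, {"order_id": 1, "amount": -5.0}]

print(run_tests(good, BOOKINGS_TESTS))  # no failures
print(run_tests(bad, BOOKINGS_TESTS))   # lists the failed checks by name
```

Each time a new kind of error reaches a customer, a check like these can be added to the list, so the same error is caught by the pipeline the next time rather than by the VP of Sales.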
The critical question that data engineers need to ask is whether their customers (users, colleagues, consumers) are getting value out of data. Forget how hard you work. Consider how to shrink the task queue through automation and expend the least possible manual effort to transform data to value for customers.
2. Don’t agonize – automate that sh**
Leaders of data teams need to make time for the team to automate tasks. Celebrate automation as an activity that generates tremendous value. This transformation has already taken place in software engineering.
Twenty years ago, a software dev team could have one release engineer for a 35-person dev team. In those days, the release engineer was seen as a lesser role and compensated less than the developers. Today the role of release engineer is called DevOps engineer, and most organizations devote 25-28% of their staff to automating software development processes. Why? Because automation improves velocity and generates value. Data teams need to follow the lead of software engineering teams.
When we say automation, we don’t mean finding ways for the pair of hands on the keyboard to generate SQL faster. The world is full of languages and tools, and each has its strengths and weaknesses. Automation is the system around the tools. The system creates on-demand development environments, performs automated impact reviews, tests/validates new analytics, deploys with a click, automates orchestrations, and monitors data pipelines 24×7 for errors and drift. These forms of automation will improve analytics productivity and quality by 10x.
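One piece of that system, monitoring a pipeline for drift, can be sketched in a few lines of Python. The example assumes the pipeline records a daily row count and flags a load whose count is far from the recent average. The function name `is_drifting`, the three-sigma threshold and the sample numbers are illustrative assumptions, not a recommended standard.

```python
from statistics import mean, stdev
from typing import Sequence


def is_drifting(history: Sequence[float], latest: float, max_sigma: float = 3.0) -> bool:
    """Flag `latest` if it sits more than `max_sigma` standard deviations
    away from the mean of `history` (for example, recent daily row counts)."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu  # history is constant, so any change is drift
    return abs(latest - mu) > max_sigma * sigma


# Hypothetical daily row counts for one table.
recent_counts = [1000, 1020, 990, 1010, 1005]

print(is_drifting(recent_counts, 1015))  # a typical day
print(is_drifting(recent_counts, 400))   # a suspiciously small load
```

A scheduler could run a check like this after every load and raise an alert, which is the kind of work that otherwise falls to a person staring at a dashboard.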
We talk about systemic change, and it certainly helps to have the support of management, but data engineers should not underestimate the power of the keyboard. The person sitting in front of the keyboard with the technical skills to create data integrations, transformations and analytics can make change one test and one automated orchestration at a time. It is just as heroic to develop a robust automated pipeline as it is to work the weekend fixing a broken piece of analytics. Data engineers should feel empowered to reimagine the environment in which they work with robust systems and designs that scale.
3. Run toward errors
Errors are embarrassing. They make us look bad. A person who is blamed for errors might lose their position of respect or authority in an organization. People instinctively turn away from mistakes.
Data has errors, and it will always have errors. Don’t hide them. Don’t forget about them. Errors are opportunities to automate. Errors are opportunities for improvement. Run towards errors. Embrace your errors and use them to build a system that prevents the same errors from ever recurring.
4. Don’t be afraid to make a change
One of the cardinal sins of data engineering is overcaution. In an environment with a continuous flow of errors, where shame and blame are the norms, data engineers can be scared to deploy code or architecture changes. One of the common responses to fear is to proceed more slowly and carefully. To avoid errors and outages, fearful engineers give each analytics project a more extended development and test schedule. Effectively, this is a decision to deliver higher quality but fewer features to users. Overcaution is a double-edged sword. The slow and methodical approach often makes users unhappy because analytics are delivered more slowly than their stated delivery requirements. As requests pile up, the data analytics team risks being viewed as bureaucratic and unresponsive.
Data engineers do not have to choose between agility and quality. An automated system can test and deploy new analytics in a staging environment aligned with production. If data pipelines have thousands of tests verifying quality, the data team can be sure that pipelines are operating correctly and that data and analytics are sound. Automation and observability enable you to proceed with confidence. With the proper methods and systems in place, you don’t have to fear change.
5. ‘Done’ means live in your customer’s hands
When is a project ready to be pushed to production? When can you declare it done?
When schedules are tight, it’s tempting to address a task as narrowly as possible, throw it over the wall and declare victory. If there are side effects, it is someone else’s problem.
Data engineers better serve their mission when they define “done” in terms of creating value for customers. Focusing on data as a whole product, instead of narrowly defining projects, helps to tear down the walls between groups within the enterprise. Viewing data as a product puts the customer’s success at the top of the priority list.
In concrete terms, a project is done when a system has been built around it. You can orchestrate tools, team environments and processes in one single pipeline. Code is version controlled and parameterized for reuse across different environments. Data pipelines have enough automated tests to catch errors, and error events are tied to end-to-end observability frameworks. Data engineers would do well to take a holistic view of projects and think about how changes fit into the overall system.
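To illustrate what "parameterized for reuse across different environments" might mean in practice, here is a minimal Python sketch in which the same query-building code runs against development and production by swapping a configuration object. The schema names, the `orders` table and the `Environment` class are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Environment:
    """Settings that differ between environments (all values are hypothetical)."""
    name: str
    schema: str
    row_limit: Optional[int]  # cap rows in dev to keep runs fast; None means no cap


ENVIRONMENTS = {
    "dev": Environment(name="dev", schema="analytics_dev", row_limit=1000),
    "prod": Environment(name="prod", schema="analytics", row_limit=None),
}


def build_query(env: Environment) -> str:
    """Build the same bookings query for any environment."""
    query = (
        "SELECT region, SUM(amount) AS bookings "
        f"FROM {env.schema}.orders GROUP BY region"
    )
    if env.row_limit is not None:
        query += f" LIMIT {env.row_limit}"
    return query


print(build_query(ENVIRONMENTS["dev"]))
print(build_query(ENVIRONMENTS["prod"]))
```

Because the logic lives in one version-controlled function, a change tested in the dev environment is the same code that later runs in production.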
6. Don’t be a hero; make heroism a rare event
Sometimes heroism saves the day. Sometimes circumstances force you to put in that long night, do the thing that doesn’t scale, and get the job done for the customer. The trick is to circle back the next day and work on the robust fix that prevents that problem from recurring. With discipline, a team can slowly wean off its dependency on heroism. Heroism doesn’t ever go to zero, but if it happens all the time, the team will be exhausted and burned out. Exhausted engineers tend to change jobs, leaving the rest of the data team holding the bag.
7. Seek out opportunities for reuse and sharing
The work that engineers perform in data and analytics is code. A sure way to create more work is to copy someone else’s code, tweak it a little bit and create a variant that has to be supported independently of the original code. Organizations that operate in this way are soon buried under the crush of technical debt.
Data engineers are in the “complexity” business, so anything that reduces complexity is a “win.” If data teams can build reusable components, they gain scale by sharing those components across numerous pipelines. Data teams have large distributed systems, numerous tools and many data sources. Abstracting and grouping related features into functional units, making them reusable and giving them APIs (application programming interfaces), helps create leverage that scales while minimizing technical debt. Scalable design and reuse are essential features that data engineers shouldn’t dismiss in pursuit of delivering data value for their customers.
8. Practice DataGovOps
Data governance is a necessity, not a luxury. Data governance needs to control critical data, but governance needs to serve the higher mission of helping create value from data. DataGovOps applies DataOps principles to governance in the same way that Agile and DevOps apply to product development.
Organizations need to be agile in defining who accesses data, what kind of access is appropriate and how to utilize data to maximize its value. At the same time, data governance can’t compromise on its mission to prevent data from being used in ways that are unethical, contrary to regulations or strategically misaligned.
DataGovOps seeks to strike the right balance between centralized control and decentralized freedom. When governance is enforced through manual processes, policies and enforcement interfere with freedom and creativity. With DataOps automation, control and creativity can coexist. DataGovOps uniquely addresses the DataOps needs of data governance teams who strive to implement robust governance without creating innovation-killing bureaucracy. If you are a governance professional, DataGovOps will not put you out of a job. Instead, you’ll focus on managing change in governance policies and implementing the automated systems that enforce, measure, and report governance. In other words, governance-as-code.
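A small Python sketch can make governance-as-code more concrete. Here the access policy is plain data that can be version controlled and reviewed, and an automated check reports any request that violates it. The roles, column names and function names are invented for this example.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical policy: which roles may read which columns.
POLICY: Dict[str, Set[str]] = {
    "bookings.amount": {"finance", "sales_ops"},
    "customers.email": {"support"},
}


def can_read(role: str, column: str, policy: Dict[str, Set[str]] = POLICY) -> bool:
    """Deny by default: a column missing from the policy is readable by no one."""
    return role in policy.get(column, set())


def audit(requests: Iterable[Tuple[str, str]],
          policy: Dict[str, Set[str]] = POLICY) -> List[Tuple[str, str]]:
    """Return the (role, column) requests that violate the policy."""
    return [(role, col) for role, col in requests if not can_read(role, col, policy)]


requests = [
    ("finance", "bookings.amount"),
    ("marketing", "customers.email"),
    ("support", "customers.ssn"),
]

print(audit(requests))  # the requests that should be blocked and reported
```

Changing a policy then becomes a reviewed code change, and enforcement and reporting run automatically rather than through manual approvals.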
9. Measure your processes (and improve them)
Data engineers can apply their numerical and analytical skills to measure the organizations in which they work. Process metrics for data engineering and data science teams help an organization understand itself and how it can improve. If you start to measure activities, you may be surprised to discover the amount of time people spend in meetings, how many errors reach production, and how long it takes to make changes to analytics. These critical measurements of error rates, cycle times and productivity shed light on how people work as a team and how the data team creates value for customers.
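As a rough sketch of how such process metrics could be computed, the Python below derives a median cycle time and a production error rate from a log of analytics deployments. The record fields and the sample dates are made up for illustration; a real team would pull them from its ticketing and deployment tools.

```python
from datetime import datetime
from statistics import median
from typing import Dict, List

# Hypothetical deployment log: when a change was requested, when it went live,
# and whether it caused an error in production.
deployments: List[Dict] = [
    {"requested": datetime(2021, 1, 4), "deployed": datetime(2021, 1, 8), "caused_error": False},
    {"requested": datetime(2021, 1, 5), "deployed": datetime(2021, 1, 15), "caused_error": True},
    {"requested": datetime(2021, 1, 11), "deployed": datetime(2021, 1, 13), "caused_error": False},
]


def cycle_time_days(deployment: Dict) -> float:
    """Days between a request and its deployment to production."""
    delta = deployment["deployed"] - deployment["requested"]
    return delta.total_seconds() / 86400


def median_cycle_time(log: List[Dict]) -> float:
    """Median request-to-production time across all deployments, in days."""
    return median(cycle_time_days(d) for d in log)


def error_rate(log: List[Dict]) -> float:
    """Fraction of deployments that caused a production error."""
    return sum(1 for d in log if d["caused_error"]) / len(log)


print(median_cycle_time(deployments))
print(error_rate(deployments))
```

Tracking numbers like these over time shows whether changes such as added tests or automated deployments are actually shortening cycle times and reducing errors.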
10. Don’t put your head in the sand – focus on value delivery
DevOps has become so important in the software engineering space, yet some people still don’t understand the need for DataOps in the data space. There’s a mistaken notion out there that adding more robust automated processes will slow down data teams.
One way to think about DataOps is like the brakes on a car. Why does a car have brakes? Is it to go slowly? No, brakes help you go fast safely. The engineering processes and automation that we have described are intended to help data engineers go faster. They take a little more time to set up initially, but they save huge amounts of time downstream.
Forging a Path to Agile and DataOps
The ideas discussed above stem from methods developed decades ago in industrial factories where teams worked together on complex projects. These ideas evolved into the Toyota Production System and Lean Manufacturing.
People encountered similar problems in how software is built, and the DevOps and Agile methodologies were born. Now we are conducting the same discussion in data and analytics. Data organizations have large teams that must work together amid an enormous amount of complexity.
We know from the auto industry that cars that once drove only 50,000 miles can now last 150,000 miles. Perhaps electric vehicles will be able to go 500,000 miles. These improvements are the result of quality methods that focus on cycle time and eliminating waste. The same thing can happen in the big data industry. We can go faster and farther using fewer resources. DataOps methods can help data organizations follow the path of continuous improvement forged by other industries and prevent data team burnout in the process.
For more information, DataKitchen has published the “DataOps Cookbook,” which helps guide you through everything you need to know about DataOps. For the survey results, see 2021 Data Engineering Survey Results.