Remote working has revealed the inconsistency and fragility of workflow processes in many data organizations. The data teams share a common objective: to create analytics for the (internal or external) customer. Execution of this mission requires the contribution of several groups: data center/IT, data engineering, data science, data visualization, and data governance.
Each of the roles mentioned above views the world through a preferred set of tools:
- Data Center/IT – Servers, storage, software
- Data Science Workflow – Kubeflow, Python, R
- Data Engineering Workflow – Airflow, ETL
- Data Visualization/Preparation – Self-service tools such as Tableau, Alteryx
- Data Governance/Catalog (Metadata management) Workflow – Alation, Collibra, Wikis
The day-to-day existence of a data engineer working on a master data management (MDM) platform is quite different from that of a data analyst working in Tableau. Tools set each group's natural iteration cycle time, whether months, weeks, or days. Tools determine their approach to solving problems. Tools affect their risk tolerance. In short, they view the world through the lens of the tools that they use. The disparate toolchains illustrate how each group resides in its own silo, with little visibility into what the other groups are doing.
Even in normal times, it's tough for these different teams to communicate with each other. Face-to-face meetings help somewhat, but in this latest era of remote work they are harder to arrange. Chance encounters by the water cooler are non-existent. Processes and workflows that depend on individuals with tribal knowledge huddling to solve problems are nearly impossible to execute over video conferences.
Enterprises need to examine their end-to-end data operations and analytics-creation workflow. Does it build up or tear down the communication and relationships that are critical to the mission? Instead of allowing technology to be a barrier to teamwork, leading data organizations rely explicitly on workflow automation to improve and facilitate communication and coordination between the groups. In other words, they restructure data analytics pipelines as services (or microservices) that create a robust, transparent, efficient, repeatable analytics process unifying all workflows.
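To make that concrete, here is a minimal sketch in Python of what "pipeline as a service" can look like: every stage exposes the same small contract, and an orchestrator performs the handoffs explicitly instead of relying on tribal knowledge. The stage names, payload shape, and hard-coded data are illustrative assumptions, not a prescribed API.

```python
from typing import Any, Protocol

class PipelineService(Protocol):
    """Uniform contract every stage exposes; handoffs between teams
    happen through this interface rather than ad hoc communication."""
    name: str
    def run(self, payload: dict[str, Any]) -> dict[str, Any]: ...

class IngestService:
    """Data engineering's stage: lands raw rows (hard-coded here for brevity)."""
    name = "ingest"
    def run(self, payload: dict[str, Any]) -> dict[str, Any]:
        payload["rows"] = [{"amount": 12}, {"amount": 7}]
        return payload

class ModelService:
    """Data science's stage: derives a metric from the ingested rows."""
    name = "model"
    def run(self, payload: dict[str, Any]) -> dict[str, Any]:
        payload["total"] = sum(r["amount"] for r in payload["rows"])
        return payload

def orchestrate(stages: list[PipelineService], payload: dict[str, Any]) -> dict[str, Any]:
    """Run the pipeline end to end; every handoff is explicit and repeatable."""
    for stage in stages:
        payload = stage.run(payload)
        print(f"{stage.name}: complete")
    return payload

print(orchestrate([IngestService(), ModelService()], {}))
```

Because each group's work sits behind the same interface, a remote teammate can pick up the pipeline at any stage without a hallway conversation about how the previous stage works.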
In the data analytics market, this endeavor is called DataOps. DataOps automates the workflow processes related to the creation, deployment, production, monitoring, and governance of data analytics. Automation coordinates tasks, eliminating reliance on tribal knowledge and ad hoc communication between members of the data organization. DataOps spans the end-to-end data lifecycle, including:
- Continuous deployment – automated QA and deployment of new analytics (see the first sketch after this list)
- Self-service sandboxes – an on-demand, self-service sandbox is an environment that includes everything a data analyst or data scientist needs in order to create analytics (see the second sketch after this list). For example:
  - Complete toolchain
  - Standardized, reusable analytics components
  - Security vault providing access to tools
  - Prepackaged datasets – clean, accurate, privacy- and security-aware
  - Role-based access control for a project team
  - Integration with workflow management
  - Orchestrated path to production – continuous deployment
- Kitchen – a workspace that integrates tools, services, and workflows
- Governance – tracking user activity against policies
- Observability – testing inputs, outputs, and business logic at each stage of the data analytics pipeline (see the third sketch after this list). Tests catch potential errors and warnings before they are released, so quality remains high. Test alerts immediately inform team members of errors, and dashboards show the status of tests across the data pipeline.
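The three sketches below illustrate items from this list in Python, under stated assumptions rather than any particular product's API. First, continuous deployment: the core idea is simply that promotion to production is gated on an automated QA run. The test command assumes a pytest suite in `tests/`, and the promotion step is a hypothetical placeholder for the team's own release tooling.

```python
import subprocess
import sys

def run_qa_suite() -> bool:
    """Run automated tests for the new analytics (assumes a pytest
    suite lives in tests/; substitute your own QA command)."""
    result = subprocess.run([sys.executable, "-m", "pytest", "tests/"])
    return result.returncode == 0

def promote_to_production(artifact: str) -> None:
    """Hypothetical promotion step -- in practice this would hand the
    artifact to the team's deployment or orchestration tooling."""
    print(f"deploying {artifact} to production")

if __name__ == "__main__":
    if run_qa_suite():
        promote_to_production("analytics-release")
    else:
        sys.exit("QA failed: deployment blocked")
```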
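Second, a self-service sandbox can be thought of as a declarative bundle of the items in the sub-list above: toolchain, reusable components, prepackaged datasets, access control, and a vault reference. The schema and example values here are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Sandbox:
    """Hypothetical description of an on-demand analyst workspace."""
    owner: str
    toolchain: list            # complete toolchain (e.g., Python, dbt, Tableau)
    components: list           # standardized, reusable analytics components
    datasets: list             # prepackaged, privacy- and security-aware data
    roles: dict = field(default_factory=dict)   # role-based access control
    vault_path: str = "secret/analytics-tools"  # where tool credentials live

def provision(owner: str) -> Sandbox:
    """Assemble a sandbox from governed, reusable building blocks."""
    return Sandbox(
        owner=owner,
        toolchain=["python3.11", "dbt", "tableau"],
        components=["ingest", "clean", "model"],
        datasets=["customers_masked", "orders_sample"],
        roles={owner: "editor"},
    )

print(provision("analyst@example.com"))
```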
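Third, observability: wrap each pipeline stage with input and output tests so that errors raise alerts before bad data propagates downstream. The alert hook here is a stand-in; a real deployment might post to chat, email, or a status dashboard.

```python
from typing import Any, Callable

def alert(stage: str, message: str) -> None:
    """Stand-in alert hook; replace with chat/email/dashboard notification."""
    print(f"[ALERT] {stage}: {message}")

def checked_stage(name: str,
                  fn: Callable[[Any], Any],
                  input_test: Callable[[Any], bool],
                  output_test: Callable[[Any], bool]) -> Callable[[Any], Any]:
    """Wrap a pipeline stage so its inputs and outputs are tested on every run."""
    def run(data: Any) -> Any:
        if not input_test(data):
            alert(name, "input check failed")
            raise ValueError(f"{name}: bad input")
        result = fn(data)
        if not output_test(result):
            alert(name, "output check failed")
            raise ValueError(f"{name}: bad output")
        return result
    return run

# Illustrative stage: cleaning must receive rows and keep at least one.
clean = checked_stage(
    "clean",
    fn=lambda rows: [r for r in rows if r.get("amount", 0) > 0],
    input_test=lambda rows: len(rows) > 0,
    output_test=lambda rows: len(rows) >= 1,
)

print(clean([{"amount": 10}, {"amount": -3}]))
```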
DataOps puts the entire data organization in a virtual space with structured workflows that enable analytics and data to be handed off seamlessly from team to team. DataOps automation makes it much easier for remote teams to coordinate tasks because the end-to-end data lifecycle is encapsulated in robust, repeatable processes that unify the entire data organization. With DataOps, it doesn't matter where you physically reside, because workflow orchestration integrates your work with that of your team members. DataOps provides the structure and support that enable data teams to work remotely and together, producing analytic insights that shed light on the enterprise's most difficult challenges.