DataOps Reports that Keep Your Finger on the Pulse

Itโ€™s never good when your boss calls at 5 pm on a Friday. โ€œThe weekly analytics didnโ€™t build correctly. What happened? Call me every hour with updates until you figure it out!โ€

For many data professionals, this situation is all too familiar. Analytics, in the modern enterprise, span toolchains, teams, and data centers. Large enterprises ingest data from dozens or hundreds of internal and external data sources. An error could occur in any of the millions of steps in the data pipeline.

DataOps uses automation to create unprecedented visibility into data operations. Every operation, every step, and every transaction in the data pipeline is tested and verified. Warnings and failures appear in logs and reports that can help the data team pinpoint problems with laser-like accuracy. Below weโ€™ll show an actual report used by a DataOps enterprise. It enables all of the stakeholders to view the real-time status of analytics build operations. It helps people keep their โ€œfinger on the pulseโ€ of what is happening, so stakeholders started calling it the โ€œPulse Report.โ€

The Pulse Report

The Pulse Report is a set of workflows and reports that provide detailed, up-to-the-minute build status. Imagine that data are collected from data sources over SFTP, email, and S3-to-S3 transfer and deposited in an S3 repository. Any one of these sources could have a problem. A report called โ€œUpcoming Data Sourcesโ€ shows the arrival status of each data source.

If critical data is present, the automated orchestration executes the builds. A report titled โ€œDaily Build Summaryโ€ presents the status of analytics builds that instantiate data and analytics for users, such as, for example, data mart schemas employed by users for self-service analytics. The โ€œCurrent Data Sourcesโ€ report enables users to see the timestamp (or other recency information) of data sources to understand analytics better. The reports mentioned are further explained below.ย 

Upcoming Data Sources Report

The Upcoming Data Sources Report tracks all data sources received and collected for inclusion in future builds. Data files arrive on their own schedule โ€“ some hourly, some weekly, or perhaps in the middle of the night. The Upcoming Data Sources Report monitors each data source with respect to its schedule and reports its status. This report gives the data team an easy way to check whether a build is on track or at risk.

The report shows the status of each data source:

  • OK – received on time
  • Stale – new file expected but not yet late โ€“ the data team pays attention to these to avoid missing builds
  • Late – new file not present after a specific target delivery time
  • Received Late – file received after scheduled build start-time

The proactive data team doesnโ€™t wait for a data source to be late before taking action. They monitor the data sources and follow up on those that are โ€œstale.โ€ With large numbers of data sources, automation can monitor status and promptly deliver stale or late alerts to the data team.

Upcoming Data Sources

The Upcoming Data Sources Report provides the data team with status of incoming files required for pending builds.

Daily Build Summary Report

The Daily Build Summary shows the status of each daily build. Builds or โ€œOrder Runsโ€ are automated data-flows that run as scheduled, triggered, or manual processes, called โ€œRecipe Variationsโ€ on the DataKitchen Platform. The Daily Build Summary includes the current status of builds run in the production system; specifically, production Kitchens.ย 

When the system wants to start a build, the key question is whether it has all the component files ready to go. If a file is missing, the build orchestration script updates the Pulse Report and, if so configured, alerts the data team.

Build Types are:

  • Analytic – analytic products like star schemas and data marts for end-users
  • Process – smaller processes geared to a specific task or deliverable
  • Source – ingestion of source data used by other builds
  • Third-Party – ingestion of third-party data sources involving little additional manipulation
  • Utility – processes used for data management and oversight

Each build listed shows a build status as well as the results of the prior build:

  • Completed – Order Run completed with a note of โ€œlate startโ€ if applicable
  • Upcoming – Scheduled build time approaching
  • Prior Build Error – Order Run failed to complete

For scheduled and triggered Order Runs, the report shows the schedule or condition which activates the build. It also shows the Orderโ€™s planned and actual start and end times, determining whether an Order started or completed late. Automated processes update the Daily Build Summary at frequent intervals throughout the day.

Some data sources are critical and must stop a build. For example, an updated production status could be in that category for certain applications. Other data sources are less important โ€“ a build can proceed without them. For example, the status of reverse logistics might not be critical in some instances. Use cases determine the criticality of each data source.

Daily Build Summary

The Daily Build Summary shows the status of each daily build.

When a build is complete, a table is added to each schema indicating the refresh date for each source file. These timestamps are available to users via the โ€œCurrent Data Sourcesโ€ report to understand the recency or โ€œfreshnessโ€ of the analytics reports.

If a file is missing, a build did not run, or a file is late, the system can notify the data team as appropriate. For example, the automated orchestration responsible for the build can report errors by creating a JIRA ticket or sending an email or slack message.

Overview of the Pulse Report and associated data.

Proactive Problem Solving with the Pulse Report

With DataOps and the Pulse Report, the data team can easily see which builds failed and which file arrivals affected the build. When a build does not run, the Pulse Report indicates whether the data set components were in place to support the build. If not, the data team would have known immediately, instead of finding out from users that something was wrong. As data set status flipped to stale, the data team might have been able to resolve an issue and keep the build on track proactively.

 

About the Author

Chip Bloche
Chip Bloche is a Data Engineering Director at DataKitchen. Chip joined DataKitchen as a DataOps chef in 2018 leading a team of DataOps Engineers. He has more than 30 years of experience in designing OLTP Database Applications, Systems Integration and Data Warehouse Design, and Data Engineering for Business Intelligence and Machine Learning applications.

Sign-Up for our Newsletter

Get the latest straight into your inbox

Open Source Data Observability Software

DataOps Observability: Monitor every Data Journey in an enterprise, from source to customer value, and find errors fast! [Open Source, Enterprise]

DataOps Data Quality TestGen: Simple, Fast Data Quality Test Generation and Execution. Trust, but verify your data! [Open Source, Enterprise]

DataOps Software

DataOps Automation: Orchestrate and automate your data toolchain to deliver insight with few errors and a high rate of change. [Enterprise]

recipes for dataops success

DataKitchen Consulting Services


Assessments

Identify obstacles to remove and opportunities to grow

DataOps Consulting, Coaching, and Transformation

Deliver faster and eliminate errors

DataOps Training

Educate, align, and mobilize

Commercial Data & Analytics Platform for Pharma

Get trusted data and fast changes to create a single source of truth

 

dataops-cookbook-download

DataOps Learning and Background Resources


DataOps Journey FAQ
DataOps Observability basics
Data Journey Manifesto
Why it matters!
DataOps FAQ
All the basics of DataOps
DataOps 101 Training
Get certified in DataOps
Maturity Model Assessment
Assess your DataOps Readiness
DataOps Manifesto
Thirty thousand signatures can't be wrong!

 

DataKitchen Basics


About DataKitchen

All the basics on DataKitchen

DataKitchen Team

Who we are; Why we are the DataOps experts

Careers

Come join us!

Contact

How to connect with DataKitchen

 

DataKitchen News


Newsroom

Hear the latest from DataKitchen

Events

See DataKitchen live!

Partners

See how partners are using our Products

 

Monitor every Data Journey in an enterprise, from source to customer value, in development and production.

Simple, Fast Data Quality Test Generation and Execution. Your Data Journey starts with verifying that you can trust your data.

Orchestrate and automate your data toolchain to deliver insight with few errors and a high rate of change.

Data Leaders Brief
Testing Data Analytics Data Architecture Business Analytics Customer Analytics Moreย >>