Plumbing Wisdom for Data Pipelines

While admiring the latest cloud tech, don't forget that humans have been debugging pipelines since the Romans built the aqueducts. Any good plumber can give you some hard-won tips on managing data pipelines effectively, insights that might save your career from going down the drain.

 

Your overalls are cleaner when you work on new construction.

Green fields smell sweet, and a nice long design phase can give you a rosy view of your project.  The real world is a bit more pungent.  That’s why a key goal of DataOps is to expose your best-laid plans to real-life urgencies faster and initiate that crucial feedback loop sooner.  With DataOps, you define a framework and a set of best practices to deal with real-world changes and crises.  It’s about tapping into that feedback loop earlier to keep your process reliable — and relevant.

It’s not done until you’ve run the water through it.

Good testing, like exercise and veganism, is the subject of fervent talk and half-hearted action.  There are lots of reasons good people test inadequately.  Deadlines are high-profile, and details go unnoticed.  The stakes may be high, but the reckoning seems distant. It’s possible, and often irresistible, to move on to the next emergency before the last ingenious solution has been proven out.  But water is inexorable, and plumbers quickly face the consequences of their mistakes.  Testing is intrinsic to the job.  

So you’ve checked that it works — are you done?  Of course not.  Plumbers don’t just check the water level; they may install a low-water cutoff switch or a relief valve so the system responds to changing inputs.  They might install a leak detector to monitor for problems automatically, and send alerts, while the house is unattended.  By automating your tests, then running them with each refresh, you build in safety valves for your data pipeline. 
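As a rough sketch of what a safety valve can look like in code, the example below runs a small set of checks against every refreshed batch and raises an alert for each one that fails. The check names, the thresholds, and the order_id and amount columns are all made up for illustration; a real pipeline would substitute its own rules and send alerts somewhere more useful than print.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Check:
    name: str
    passed: Callable[[list], bool]  # takes the batch, returns True if healthy


def is_non_negative_number(value):
    """True for ints and floats that are >= 0 (booleans excluded)."""
    return (
        isinstance(value, (int, float))
        and not isinstance(value, bool)
        and value >= 0
    )


# Hypothetical checks: the threshold and column names are assumptions.
CHECKS = [
    Check("low-water cutoff: at least 3 rows",
          lambda rows: len(rows) >= 3),
    Check("no missing order_id",
          lambda rows: all(r.get("order_id") is not None for r in rows)),
    Check("amount is a non-negative number",
          lambda rows: all(is_non_negative_number(r.get("amount")) for r in rows)),
]


def run_checks(rows, checks=CHECKS):
    """Return the names of the checks that fail for this batch."""
    return [c.name for c in checks if not c.passed(rows)]


def refresh(rows, alert=print):
    """Run every check on a refreshed batch; alert on each failure.

    Returns True when the batch is safe to publish downstream.
    """
    failures = run_checks(rows)
    for name in failures:
        alert(f"ALERT: check failed: {name}")
    return not failures
```

Calling refresh(batch) at the end of every load means the checks run whether or not anyone is watching, which is the point of the leak detector.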

 

Don’t over-tighten your fittings.

Some lessons you learn the hard way.  Materials expand over time, and today’s perfect fit can become tomorrow’s stuck coupling or busted pipe.  In data engineering, too, systems fail when tolerances are too tight.  Your schedule between the end of one process and the kick-off of another needs to account for delays that inevitably crop up.  Your data types must strike a balance between tight checks and resilient operations.  Every process has dimensions of variability you can’t control.  Expect perfection, and you’ll guarantee it won’t happen.
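Here is one way to leave some slack in the fittings, offered as a sketch rather than a prescription. The first function waits for an upstream job with a deadline instead of assuming it finishes at a fixed time; the second accepts a few common spellings of a number and returns None for the rest instead of crashing the load. The function names, the default timeout, and the polling interval are illustrative assumptions.

```python
import time


def wait_for_upstream(is_ready, timeout_s=1800.0, poll_s=30.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll is_ready() until it returns True or timeout_s has elapsed.

    Returns True if the upstream finished in time, False otherwise.
    clock and sleep are injectable so the function can be tested
    without actually waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


def parse_amount(raw):
    """Accept values like '1,234.50', ' 42 ', 42, or 42.0.

    Returns a float, or None when the value cannot be parsed, so the
    caller can count and report bad values instead of failing outright.
    """
    if isinstance(raw, bool):
        return None
    if isinstance(raw, (int, float)):
        return float(raw)
    if isinstance(raw, str):
        try:
            return float(raw.replace(",", "").strip())
        except ValueError:
            return None
    return None
```

The tolerance is deliberate on both sides: the wait still gives up eventually, and the parser still rejects garbage. Loose enough to survive a slow night, tight enough to notice a real break.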

 

Don’t hang junk off your pipes.

Every plumber has a story about the folks who found it convenient to hang their wardrobe from the water pipes, until they flooded the basement. Data engineers take similar risks when they over-complicate their code or tack extraneous processes onto the wrong points of a critical-path build. Organize your data flows by function and criticality. You want a simple structure so people can immediately jump in and orient themselves. Your goal isn’t to intimidate people or squeeze the most complexity into the smallest space. Even if it looks simple now, it’s going to get more complicated over time, so don’t create an unwieldy mess.

 

Read the plans, but believe the piping.

When maintaining a home, blueprints and schematics can give you lots of information, some of which is even accurate. Plumbing “as-built” can vary greatly from “as-designed,” and the only reliable way to know what you’re working with is to go to the pipes. As data engineers, we strive to divide our grumbling equally between the awful documentation we must decipher and the hassle of the documentation we have to produce. It is even easier to skimp on documentation than on testing. The gap between specs and reality starts to grow from the day the specs are defined.

There are two solutions to this.  One is to set clear rules that documentation must be thorough and up-to-date. These rules will be carefully filed away with your documentation. The alternative is to acknowledge that the primary source of your docs will always be your data structures and code.  The strategy: lower the friction to document your team’s work. If you encourage your team to comment consistently, you can rely on tools to automate daily refreshes of your knowledge base for everybody else. Details will be more up-to-date.  Gaps will become readily apparent. Your documentation efforts, like your business, will be data-driven.
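To make the low-friction approach concrete, here is a minimal sketch that builds a reference page straight from the comments in the code, using Python's standard inspect module. The pipeline steps load_orders and dedupe are hypothetical stand-ins; the idea is only that the docs are regenerated from the source, and that anything nobody commented shows up as a visible gap.

```python
import inspect


def describe(funcs):
    """Build one reference line per function from its signature and docstring."""
    lines = []
    for f in funcs:
        doc = inspect.getdoc(f) or "UNDOCUMENTED"
        summary = doc.splitlines()[0]  # first docstring line only
        lines.append(f"{f.__name__}{inspect.signature(f)}: {summary}")
    return "\n".join(lines)


# Hypothetical pipeline steps, for illustration only.
def load_orders(path):
    """Read raw orders from a CSV file."""


def dedupe(rows):
    pass  # no docstring, so the generated page flags it


if __name__ == "__main__":
    print(describe([load_orders, dedupe]))
    # load_orders(path): Read raw orders from a CSV file.
    # dedupe(rows): UNDOCUMENTED
```

Run on a schedule, a script like this keeps the knowledge base as current as the code, and the UNDOCUMENTED lines tell you where to grumble next.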

 

Tell the residents before you shut off the water on Saturday night. 

Water and mediocrity flow downhill. Good plumbers realize how much people depend on their work. You don’t have to tell a plumber that sh*t happens. But transparency and good communication will get you out of many jams. People appreciate knowing what’s happening and how it might impact their work. Once you realize that part of your work is managing downstream expectations, you’ll get more of the credit you deserve.

 

Don’t bite your nails on the job.

Data engineers are human, and human frailties are part of the systems you build. Any pipeline that expects people to get it right every time is not just a source of stress – it’s structurally deficient. If you build out systemic solutions to human error, you will have better pipelines and a more engaged, effective team.

There’s a different issue here for plumbers. I won’t get into it. Either way, if you follow the rules above, you will be flushed with success!
