CI/CD and Data Test Automation for Apache Airflow

Solve these issues with Airflow

As a Data Engineer, it is your job to help your business partners drive growth, and analytics plays a key role. Do you find yourself thwarted by one or more of these analytics challenges?

Deployment Issues

You need a way to continuously deploy Airflow DAGs into production.

Data Errors

Errors slip through to your customers and make you look bad.

Error-prone process

There are too many steps that have to be done in the right order, and it is hard to keep them straight.

Too much work

You cannot keep up with the requests from your business partners and they are getting frustrated.

Lack of test automation

You need to automate your data testing in production.
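As a rough illustration of what an automated data test can look like, here is a minimal sketch in plain Python. The function names (`check_batch`, `assert_batch_ok`), the column names, and the thresholds are all hypothetical and not taken from Airflow or DataKitchen; the sketch assumes a batch arrives as a list of dictionaries.

```python
from typing import Any, Dict, List


def check_batch(rows: List[Dict[str, Any]], required: List[str], min_rows: int) -> List[str]:
    """Return human-readable failures; an empty list means the batch passed.

    Hypothetical helper for illustration only.
    """
    failures: List[str] = []
    # Row-count test: catches truncated or empty loads.
    if len(rows) < min_rows:
        failures.append(f"expected at least {min_rows} rows, got {len(rows)}")
    # Null test: each required column must be present and non-null in every row.
    for column in required:
        missing = sum(1 for row in rows if row.get(column) is None)
        if missing:
            failures.append(f"column '{column}' is null or missing in {missing} row(s)")
    return failures


def assert_batch_ok(rows: List[Dict[str, Any]], required: List[str], min_rows: int) -> None:
    """Raise ValueError listing every failed check, so a calling step can stop the pipeline."""
    failures = check_batch(rows, required, min_rows)
    if failures:
        raise ValueError("data test failed: " + "; ".join(failures))


if __name__ == "__main__":
    batch = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": None}]
    # Report the failures for this batch (one null in 'amount').
    print(check_batch(batch, required=["id", "amount"], min_rows=2))
```

One way to use a check like this is to call `assert_batch_ok` from a task's Python callable: an uncaught exception fails the task, so bad data stops the run before it reaches your customers.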

You have looked at or tried standard ETL tools, but you found that their metadata-driven approach has its limits and that their output cannot be branched and merged. You have also explored open source tools, but you see that they are not a complete solution.

How to succeed

DataKitchen provides the solution you have been looking for. The following chart shows how DataKitchen compares to Airflow:

Feature Open source - Airflow DataKitchen
Orchestration of steps x (or Airflow)
UI & Command Line x x
Produces mergeable code x x
Version Control x
Tests for data quality x
Tests for code quality x x
Environments to work in x
Full support for CI/CD x

To be more agile, you need to move features from a dev environment to production with high velocity and safety. You need to respond quickly to requests for more data integrations or more business logic. You need to know when data goes bad so you can take corrective action, rather than have your customers call you and lose trust in you. You need your processes automated and quality built in, without being locked into a single tool-centric approach. You need to be able to iterate quickly through the cycle shown below.

Learn More and Hear The Data Engineering Podcast with DataKitchen
