What is Airflow™? — Airflow Documentation

- Airflow is an open-source platform for developing, scheduling, and monitoring batch-oriented workflows.
- Airflow’s extensible Python framework enables you to build workflows connecting with virtually any technology.
- A web interface helps manage the state of your workflows.
- Airflow is deployable in many ways, ranging from a single process on your laptop to a distributed setup that supports even the biggest workflows.
Workflows as code
- The main characteristic of Airflow workflows is that all workflows are defined in Python code.
- “Workflows as code” serves several purposes:
  - Dynamic: Airflow pipelines are configured as Python code, allowing for dynamic pipeline generation.
  - Extensible: The Airflow™ framework contains operators to connect with numerous technologies. All Airflow components are extensible to adjust to your environment easily.
  - Flexible: Workflow parameterization is built-in, leveraging the Jinja templating engine.
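A minimal sketch of what "workflows as code" looks like in practice (assuming Apache Airflow 2.x is installed; the DAG id, table names, and schedule are illustrative, not prescribed): an ordinary Python loop generates tasks dynamically, and Jinja templates such as `{{ ds }}` are rendered at runtime.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(ds):
    # `ds` arrives as the rendered logical date, e.g. "2024-01-01"
    print(f"Extracting data for {ds}")


with DAG(
    dag_id="example_etl",            # illustrative name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",               # Airflow 2.4+ spelling; older versions use schedule_interval
    catchup=False,
) as dag:
    # Dynamic: one task per table, generated with plain Python
    for table in ["users", "orders", "payments"]:
        PythonOperator(
            task_id=f"extract_{table}",
            python_callable=extract,
            op_kwargs={"ds": "{{ ds }}"},  # Flexible: Jinja template, rendered per run
        )
```

Because the pipeline is ordinary Python, changing the list of tables changes the shape of the DAG without touching any scheduler configuration.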
Why Airflow?
- Airflow is a batch workflow orchestration platform.
- The Airflow framework contains operators to connect with many technologies and is easily extensible to connect with a new technology.
- If your workflows have a clear start and end and run at regular intervals, they can be programmed as an Airflow DAG.
- If you prefer coding over clicking, Airflow is the tool for you.
- Workflows are defined as Python code, which means:
  - Workflows can be stored in version control so that you can roll back to previous versions
  - Workflows can be developed by multiple people simultaneously
  - Tests can be written to validate functionality
  - Components are extensible, and you can build on many existing components.
- Rich scheduling and execution semantics enable you to easily define complex pipelines running at regular intervals.
- Backfilling allows you to (re-)run pipelines on historical data after changing your logic.
- The ability to rerun partial pipelines after resolving an error helps maximize efficiency.
- From the interface, you can inspect logs and manage tasks, for example, retrying a task in case of failure.
- The open-source nature of Airflow ensures you work on components developed, tested, and used by many other companies.
- You can find many helpful resources in the active community, like blog posts, articles, conferences, books, etc.
- You can connect with other peers via several channels, such as Slack and mailing lists.
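Several of the points above (regular scheduling, retries on failure, backfilling) can be sketched in one small DAG. This is a hedged example, not a canonical recipe: the DAG id, task names, and commands are illustrative, and Apache Airflow 2.x is assumed.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_report",                    # illustrative name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                        # run once per day
    catchup=False,
    default_args={
        "retries": 2,                         # retry a failed task automatically
        "retry_delay": timedelta(minutes=5),
    },
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extract {{ ds }}")
    transform = BashOperator(task_id="transform", bash_command="echo transform")
    load = BashOperator(task_id="load", bash_command="echo load")

    extract >> transform >> load              # declare the execution order

# After changing the logic, historical dates can be re-run from the CLI:
#   airflow dags backfill --start-date 2024-01-01 --end-date 2024-01-31 daily_report
```

If `transform` fails, only `transform` and `load` need to be rerun after the fix; the successful `extract` run is kept, which is what makes partial reruns efficient.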
Why not Airflow?
- Airflow was built for finite batch workflows.
- While the CLI and REST API allow triggering workflows, Airflow was not built for infinitely running event-based workflows.
- Airflow is not a streaming solution.
- However, a streaming system like Apache Kafka often works with Apache Airflow.
- Kafka can be used for real-time ingestion and processing; event data is written to a storage location, and Airflow periodically starts a workflow that processes a batch of that data.
- If you prefer clicking over coding, there are better solutions than Airflow.
- The web interface aims to make managing workflows as easy as possible, and the Airflow framework is continuously improved to make the developer experience as smooth as possible.
- However, the philosophy of Airflow is to define workflows as code, so coding will always be required.
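The Kafka-plus-Airflow pattern described above can be sketched as a scheduled DAG that periodically processes whatever event files a streaming consumer has already landed in storage. The directory layout, file naming, and DAG id here are hypothetical assumptions for illustration; Apache Airflow 2.x is assumed.

```python
from datetime import datetime
from pathlib import Path

from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical location where a Kafka consumer continuously writes event files.
EVENT_DIR = Path("/data/events")


def process_batch(ds):
    """Process all event files for the given logical date as one finite batch."""
    batch = sorted(EVENT_DIR.glob(f"{ds}-*.json"))  # assumed naming convention
    print(f"Processing {len(batch)} event files for {ds}")


with DAG(
    dag_id="process_event_batch",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",              # Airflow owns the periodic, finite batch step
    catchup=False,
) as dag:
    PythonOperator(
        task_id="process_batch",
        python_callable=process_batch,
        op_kwargs={"ds": "{{ ds }}"},
    )
```

The division of labor is the point: Kafka handles the unbounded stream, while each Airflow run has a clear start and end over a bounded slice of data.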