Apache Airflow 2.0 with Docker on Windows 10 | WSL2
This is my Apache Airflow local development setup on Windows 10 (WSL2) using docker-compose. It also includes some sample DAGs and workflows.
Recent Updates:
- Updated image to Airflow 2.1.1
- Leveraging _PIP_ADDITIONAL_REQUIREMENTS to install additional dependencies
- Developing and testing operators for Treasure Data
- Read more at Treasure Data
📝 Table of Contents
- About
- Data Engineering Projects
- Data Visualization
- Getting Started
- Usage
- Running the tests
- Github Workflow
- Built Using
- Authors
- Acknowledgments
- Cleanup
🧐 About
Set up Apache Airflow 2.0 locally on Windows 10 (WSL2) via Docker Compose. The original docker-compose.yaml file was taken from the official GitHub repo.
It contains service definitions for:
- airflow-scheduler
- airflow-webserver
- airflow-worker
- airflow-init - To initialize db and create user
- flower
- redis
- postgres - This is the backend for Airflow. I am also creating an additional database, userdata, as a backend for my data flow. This is not recommended. It's ideal to have separate databases for Airflow and your data.
I have added an additional command to the docker-compose file that creates an Airflow database connection.
Directories I am mounting:
- ./dags
- ./logs
- ./plugins
- ./sql - For SQL files. We can leverage Jinja templating in our queries. Refer to the sample DAG.
- ./test - Has unit tests for Airflow DAGs.
- ./pg-init-scripts - This has scripts to create additional database in postgres.
Data Engineering Projects
Here you will find some personal projects that I have worked on. These projects will throw light on some of the airflow features I have used and learnings related to other technologies.
- Project 1 -> Get Covid testing data
Data Visualization
To experiment with Apache Superset. Read more here
🏁 Getting Started
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
Clone this repo to your machine
docker-compose -f docker-compose.yaml up airflow-init
docker-compose -f docker-compose.yaml up
Prerequisites
What things you need to install the software and how to install them.
You should have Docker and Docker Compose v1.27.0 or later installed on your machine.
- Install and configure WSL2
- I also had to reset my Ubuntu installation, and that's when it asked me to create a user.
Installing
A step by step series of examples that tell you how to get a development env running.
Clone the Repo
git clone
Start docker build
docker-compose -f docker-compose.yaml up airflow-init
docker-compose -f docker-compose.yaml up
Keep checking the Docker processes to make sure all containers are healthy
docker ps
Once all containers are healthy, access the Airflow UI
http://localhost:8080
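Instead of re-running docker ps by hand, the wait can be scripted. The following is a minimal sketch, assuming the webserver is published on localhost:8080 as above and that it exposes Airflow's /health endpoint, which reports the status of the metadata database and the scheduler. The function names, timeout and polling interval are illustrative choices, not part of this repo.

```python
import json
import time
import urllib.request

# Assumption: the webserver is reachable on localhost:8080, as in this README.
HEALTH_URL = "http://localhost:8080/health"


def is_healthy(payload: dict) -> bool:
    """Return True when both the metadata DB and the scheduler report healthy."""
    return all(
        payload.get(component, {}).get("status") == "healthy"
        for component in ("metadatabase", "scheduler")
    )


def wait_for_airflow(url: str = HEALTH_URL, timeout_s: int = 300, interval_s: int = 5) -> bool:
    """Poll the health endpoint until it reports healthy or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if is_healthy(json.load(response)):
                    return True
        except (OSError, ValueError):
            # Webserver not accepting connections yet, or the body is not JSON yet.
            pass
        time.sleep(interval_s)
    return False
```

Calling wait_for_airflow() from a Python shell after docker-compose up returns True once both components report healthy, or False if the timeout expires first.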
🔧 Running the tests
Unit tests for the Airflow DAGs are defined in the test folder. This folder is also mapped into the Docker containers in the docker-compose.yaml file.
Follow the steps below to execute the unit tests once the Docker containers are running:
./airflow.sh bash
python -m unittest discover -v
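For readers who have not written DAG tests before, here is a hedged sketch of the kind of test that unittest discover would pick up. It is not the test shipped in this repo. The file name test/test_dag_integrity.py and the class name are hypothetical, and the guarded import only exists so the file can be opened outside the Airflow containers.

```python
import unittest

try:
    from airflow.models import DagBag
except ImportError:  # allows importing this file where Airflow is not installed
    DagBag = None


@unittest.skipIf(DagBag is None, "Apache Airflow is not installed")
class TestDagIntegrity(unittest.TestCase):
    """Basic sanity checks that every DAG file parses and defines tasks."""

    @classmethod
    def setUpClass(cls):
        # Loads DAGs from the configured dags folder, skipping Airflow's bundled examples.
        cls.dagbag = DagBag(include_examples=False)

    def test_no_import_errors(self):
        # import_errors maps a DAG file path to the error raised while parsing it.
        self.assertEqual(self.dagbag.import_errors, {})

    def test_every_dag_has_tasks(self):
        for dag_id, dag in self.dagbag.dags.items():
            with self.subTest(dag_id=dag_id):
                self.assertGreater(len(dag.tasks), 0)
```

A test like this catches syntax errors and broken imports in DAG files before the scheduler does, which is usually the cheapest check to run on every push.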
Github Workflow for running tests
I had to create another docker-compose to be able to execute unit tests whenever I push code to master. Please refer
Break down into end to end tests
Another #TODO
🎈 Usage
You can now create new DAGs, place them in the local dags folder, and see them appear in the web UI. Refer to the sample DAG in the repo.
Important :
Edit the postgres_default connection from the UI or through the command line if you want to persist data in Postgres as part of the DAGs you create. Even better, you can always add a new connection.
Update: This is now taken care of in the updated Docker Compose file, which creates both the connection and the new database.
./airflow.sh bash
airflow connections add 'postgres_new' --conn-uri 'postgres://airflow:airflow@postgres:5432/airflow'
Connect to Postgres and create a new database named 'userdata'
docker exec -it airflowdocker_postgres_1 /bin/bash
psql -U airflow
create database userdata;
Turn on Dag: PostgreOperatorTest_Dag
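The --conn-uri value used above packs the user, password, host, port and database into one string. Note that the URI in that command targets the airflow database. The sketch below shows how such a URI could be assembled for the userdata database instead, which is an assumption about the intended target rather than something this repo states. The helper name build_conn_uri is hypothetical.

```python
from urllib.parse import quote, urlsplit


def build_conn_uri(user, password, host, port, database, scheme="postgres"):
    """Assemble a connection URI, percent-encoding credentials and database name."""
    return (
        f"{scheme}://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


# Credentials, host and port are the ones used in this README;
# pointing at 'userdata' is an assumption made for illustration.
uri = build_conn_uri("airflow", "airflow", "postgres", 5432, "userdata")
parts = urlsplit(uri)
print(uri)
print(parts.hostname, parts.port, parts.path.lstrip("/"))
```

Percent-encoding matters once a password contains characters such as @ or /, which would otherwise be read as URI delimiters.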
⛏️ Built Using
- Postgres - Database
- Redis
- Apache Airflow
- Docker - build Tool
- Apache Superset - For Data visualization
✍️ Authors
- The Airflow community
- @anilkulkarni87
🎉 Acknowledgements
- Apache Airflow
- Inspiration is the Airflow Community
Cleanup
docker-compose down --volumes --rmi all
