EuroPython 2017

Feeding data to AWS Redshift with Airflow

Speaker(s): Federico Marani
Sub Community: PyData

Airflow is a powerful system for scheduling workflows and defining them as collections of interdependent scripts. It is a natural companion for extract/transform/load (ETL) pipelines that feed data warehouses such as Redshift.

This talk will introduce some of the basics of Airflow and some concepts that are specific to data pipelines, such as backfills and retries. It will then walk through examples of how to integrate Airflow, along with some lessons learned in the process.
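To make the two pipeline concepts concrete, here is a minimal sketch in plain Python. It is not Airflow's API and is not taken from the talk; the function names (`backfill_dates`, `run_with_retries`, `run_backfill`) are hypothetical. It assumes a backfill means running a task once for every schedule date in a past range, and a retry means re-running a failed task a limited number of times before giving up.

```python
from datetime import date, timedelta


def backfill_dates(start, end, step=timedelta(days=1)):
    """Return every schedule date from start to end, inclusive."""
    dates = []
    current = start
    while current <= end:
        dates.append(current)
        current += step
    return dates


def run_with_retries(task, run_date, retries=2):
    """Call task(run_date); on failure, retry up to `retries` more times."""
    attempt = 0
    while True:
        try:
            return task(run_date)
        except Exception:
            attempt += 1
            if attempt > retries:
                raise  # retries exhausted: surface the last error


def run_backfill(task, start, end, retries=2):
    """Run the task once per schedule date in the range, oldest first."""
    return {d: run_with_retries(task, d, retries) for d in backfill_dates(start, end)}
```

A scheduler such as Airflow handles both concerns for you; the sketch only shows why a task should take its run date as an argument, so that the same code can be replayed for past dates.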

The final part of the talk is dedicated to Redshift: how to structure data there, how to do some basic transformations before loading, and how to manage the schema using SQLAlchemy and Alembic.
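As an illustration of a basic pre-load transformation, the sketch below cleans raw records and serialises them as CSV, a format that Redshift's COPY command can load. It is an assumption about what such a step might look like, not the speaker's code: the column names, the input date format, and the helpers `clean_row` and `to_csv` are all hypothetical.

```python
import csv
import io
from datetime import datetime

# Hypothetical target table columns, in load order.
COLUMNS = ["user_id", "event", "created_at"]


def clean_row(raw):
    """Normalise one raw record so it matches the hypothetical target columns."""
    parsed = datetime.strptime(raw["created_at"], "%d/%m/%Y %H:%M")
    return {
        "user_id": int(raw["user_id"]),
        "event": raw["event"].strip().lower(),
        "created_at": parsed.strftime("%Y-%m-%d %H:%M:%S"),
    }


def to_csv(rows):
    """Serialise cleaned rows as header-less CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    for raw in rows:
        writer.writerow(clean_row(raw))
    return buffer.getvalue()
```

Doing type coercion and timestamp normalisation before the load keeps malformed values out of the warehouse, where a failing row would otherwise abort the bulk load.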

On Thursday 13 July at 10:30
