


Apache Airflow is used for the scheduling and orchestration of data pipelines or workflows. Orchestration of data pipelines refers to the sequencing, coordination, scheduling, and managing of complex data pipelines from diverse sources. These pipelines deliver data sets that are ready for consumption either by business intelligence applications or by data science and machine learning models that support big data applications.

In Airflow, these workflows are represented as Directed Acyclic Graphs (DAGs). Let's use a pizza-making analogy to understand what a workflow/DAG is. Workflows usually have an end goal, such as creating visualizations of the previous day's sales numbers. A DAG shows how each step depends on other steps that must be performed first. For example, to knead the dough you need flour, oil, yeast, and water; likewise, to prepare the pizza sauce you need its ingredients. In the same way, to create your visualization of the previous day's sales, you first need to move your data from relational databases into a data warehouse. The analogy also shows that steps with no mutual dependency, such as kneading the dough and preparing the sauce, can be performed in parallel. Similarly, to create your visualizations you may need to load data from multiple sources, and those loads can run in parallel as well.
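To make the analogy concrete, here is a minimal sketch of the pizza workflow expressed as an Airflow DAG. It assumes Airflow 2.4+ (for the `schedule` argument and `EmptyOperator`), and the task names are illustrative placeholders rather than a real pipeline:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator

# A minimal sketch of the pizza analogy as an Airflow DAG.
# Task names here are placeholders for real work (e.g. extract/load tasks).
with DAG(
    dag_id="pizza_making",
    start_date=datetime(2024, 1, 1),
    schedule=None,   # trigger manually; no recurring schedule
    catchup=False,
) as dag:
    knead_dough = EmptyOperator(task_id="knead_dough")
    prepare_sauce = EmptyOperator(task_id="prepare_sauce")
    assemble_pizza = EmptyOperator(task_id="assemble_pizza")
    bake_pizza = EmptyOperator(task_id="bake_pizza")

    # knead_dough and prepare_sauce have no mutual dependency,
    # so Airflow can run them in parallel; assembly waits for both.
    [knead_dough, prepare_sauce] >> assemble_pizza >> bake_pizza
```

The `>>` operator declares the dependencies: placing the two independent tasks in a list lets the scheduler run them concurrently, and `assemble_pizza` only starts once both upstream tasks succeed, exactly as the analogy describes.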
