Big Data
Cloud Composer
Managed Apache Airflow — workflow orchestration
AWS equivalent
MWAA (Managed Workflows for Apache Airflow)
AWS → GCP: Key Differences
- ▸
Both are managed Apache Airflow services: MWAA on AWS, Cloud Composer on GCP. Within Composer, version 2 autoscales its workers and is generally more cost-efficient than Composer 1.
- ▸
Composer runs on GKE under the hood; Composer 2 autoscales Airflow workers based on task load.
Key Concepts to Know
- 1
Apache Airflow: define workflows as Directed Acyclic Graphs (DAGs) in Python.
- 2
Schedule and orchestrate BigQuery queries, Dataflow jobs, Dataproc clusters, and other GCP operations.
- 3
Pre-built operators for GCP services: BigQueryInsertJobOperator, DataflowTemplatedJobStartOperator, etc.
- 4
Composer 2: runs on GKE Autopilot, autoscales workers, and is typically cheaper than Composer 1.
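As a rough illustration of the concepts above, a DAG file that uses one of the pre-built operators might look like the sketch below. The DAG id, task id, SQL text, and the helper names `bigquery_job_config` and `build_dag` are hypothetical, and the `schedule` argument assumes Airflow 2.4 or later. The Airflow imports are kept inside `build_dag` only so that the configuration helper can be checked without Airflow installed.

```python
from datetime import datetime


def bigquery_job_config(sql: str) -> dict:
    """Build the `configuration` dict passed to BigQueryInsertJobOperator."""
    return {"query": {"query": sql, "useLegacySql": False}}


def build_dag():
    """Define a one-task daily DAG (requires apache-airflow-providers-google)."""
    from airflow import DAG
    from airflow.providers.google.cloud.operators.bigquery import (
        BigQueryInsertJobOperator,
    )

    with DAG(
        dag_id="daily_sales_rollup",       # hypothetical name
        start_date=datetime(2024, 1, 1),
        schedule="@daily",                 # assumes Airflow 2.4+
        catchup=False,                     # do not backfill past runs
    ) as dag:
        BigQueryInsertJobOperator(
            task_id="rollup_sales",        # hypothetical name
            configuration=bigquery_job_config(
                "SELECT CURRENT_DATE() AS run_date"  # placeholder query
            ),
        )
    return dag
```

In a real DAG file you would assign the result to a module-level name (for example `dag = build_dag()`) so the Airflow scheduler can discover it.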
DCE Interview Tips
- ★
Use Composer when a pipeline has multiple dependent steps (run Dataflow → wait → run BigQuery query → trigger report).
- ★
'Cloud Composer orchestrates your data pipeline steps. It's like a conductor coordinating Pub/Sub, Dataflow, BigQuery, and Looker in sequence.'
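The "dependent steps" idea can be sketched in plain Python with the standard library's `graphlib`. This is not Airflow's scheduler, only an illustration of how a DAG fixes the execution order; the step names are hypothetical and mirror the example pipeline above.

```python
from graphlib import TopologicalSorter

# Each key is a step; its value is the set of steps that must finish first.
PIPELINE = {
    "run_dataflow": set(),
    "wait_for_dataflow": {"run_dataflow"},
    "run_bigquery_query": {"wait_for_dataflow"},
    "trigger_report": {"run_bigquery_query"},
}


def execution_order(pipeline: dict[str, set[str]]) -> list[str]:
    """Return one valid order in which the steps can run."""
    return list(TopologicalSorter(pipeline).static_order())


order = execution_order(PIPELINE)
print(" -> ".join(order))
```

In an Airflow DAG the same chain is usually written with the `>>` operator between tasks, for example `run_dataflow >> wait_for_dataflow >> run_bigquery_query >> trigger_report`.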
Common Gotchas
- !
Composer is expensive for simple use cases because an environment incurs cost continuously, even when no DAGs are running. Cloud Scheduler + Cloud Run or Cloud Functions is cheaper for simple scheduled jobs.
- !
Composer 1 is being deprecated, so recommend Composer 2 for all new deployments.
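For the simple scheduled-job case mentioned above, the Cloud Run side can be as small as the sketch below. It assumes a Cloud Scheduler job configured to send an HTTP POST to the service URL; `run_job`, `JobHandler`, and `main` are hypothetical names, and the only platform convention relied on is that Cloud Run passes the listening port in the `PORT` environment variable.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_job() -> str:
    """Hypothetical placeholder for the scheduled work."""
    return "job finished"


class JobHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Run the job once per incoming request and report the result.
        body = run_job().encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def main() -> None:
    # Cloud Run supplies the port to listen on via the PORT variable.
    port = int(os.environ.get("PORT", "8080"))
    HTTPServer(("", port), JobHandler).serve_forever()
```

Calling `main()` as the container entrypoint starts the server. A single endpoint like this has no dependency handling or retries between steps, which is the point at which Composer starts to earn its cost.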