GCP Study Hub

Big Data

Cloud Composer

Managed Apache Airflow — workflow orchestration

AWS equivalent

MWAA (Managed Workflows for Apache Airflow)

Orchestration · Apache Airflow · ETL

AWS → GCP: Key Differences

  • Both are managed Apache Airflow services. Cloud Composer 2 is generally faster and more cost-efficient than Composer 1, mainly because it autoscales instead of running a fixed-size cluster.

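  • Composer runs on Google Kubernetes Engine (GKE) under the hood. Composer 2 auto-scales Airflow workers with task load, between the minimum and maximum worker counts you configure. Composer 1 does not auto-scale workers.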
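The worker autoscaling idea can be pictured with a simplified model. This is an assumption for illustration, not Composer's actual autoscaler. The model supposes the worker count tracks queued plus running tasks divided by how many tasks one worker runs at once (`worker_concurrency`, a real Airflow Celery setting). The result is then clamped to the environment's configured minimum and maximum worker counts. The function name `desired_workers` is hypothetical.

```python
import math


def desired_workers(queued: int, running: int, worker_concurrency: int,
                    min_workers: int, max_workers: int) -> int:
    """Hypothetical model: enough workers to hold every queued and running task,
    clamped to the environment's min/max worker settings."""
    if worker_concurrency <= 0:
        raise ValueError("worker_concurrency must be positive")
    needed = math.ceil((queued + running) / worker_concurrency)
    return max(min_workers, min(max_workers, needed))


# Quiet period: 2 tasks fit on one worker, so the environment stays at its floor.
print(desired_workers(queued=0, running=2, worker_concurrency=12, min_workers=1, max_workers=6))    # → 1
# Burst of 50 tasks: ceil(50 / 12) = 5 workers.
print(desired_workers(queued=40, running=10, worker_concurrency=12, min_workers=1, max_workers=6))  # → 5
# Larger burst: 18 workers would be needed, but the count is capped at max_workers.
print(desired_workers(queued=200, running=12, worker_concurrency=12, min_workers=1, max_workers=6)) # → 6
```

The clamp is why the minimum worker count matters for cost. It sets the floor the environment pays for even when idle.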


Key Concepts to Know

  1. Apache Airflow: define workflows as Directed Acyclic Graphs (DAGs) in Python.

  2. Schedule and orchestrate BigQuery queries, Dataflow jobs, Dataproc clusters, and other GCP services (anything with an Airflow operator or a callable API).

  3. Pre-built operators for GCP services come from the Google provider package (`apache-airflow-providers-google`), e.g. BigQueryInsertJobOperator and DataflowTemplatedJobStartOperator.

  4. Composer 2 runs on GKE Autopilot and auto-scales, so it is usually cheaper than Composer 1 for workloads with variable load.
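Running Airflow itself needs a full environment. This sketch instead uses Python's standard-library `graphlib` to show what "directed acyclic graph" means for scheduling. Each task lists its upstream dependencies, tasks run in an order that respects those dependencies, and a cycle is rejected. The task names and the `run_order` helper are hypothetical. In a real Composer DAG, each task would be an operator such as `BigQueryInsertJobOperator`.

```python
from graphlib import CycleError, TopologicalSorter


def run_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order: every task comes after all of its upstream tasks.

    Raises graphlib.CycleError if the graph contains a cycle, i.e. it is not a DAG.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical pipeline: two extract tasks fan in to a BigQuery transform, then a report.
# Each key maps a task to the set of tasks that must finish before it starts.
pipeline = {
    "extract_orders": set(),
    "extract_customers": set(),
    "bq_transform": {"extract_orders", "extract_customers"},
    "publish_report": {"bq_transform"},
}
order = run_order(pipeline)
print(order)  # both extracts first (in either order), then bq_transform, then publish_report

# Making extract_orders depend on publish_report closes a loop, so no valid order exists.
cyclic = dict(pipeline, extract_orders={"publish_report"})
try:
    run_order(cyclic)
except CycleError as err:
    print("rejected:", err.args[0])
```

In Airflow, the same graph would be declared with the bitshift syntax, e.g. `[extract_orders, extract_customers] >> bq_transform >> publish_report`. The scheduler, not your code, then decides when each task is ready to run. Airflow likewise refuses to load a DAG that contains a cycle.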


DCE Interview Tips

  • Use Composer when a pipeline has multiple dependent steps, e.g. run a Dataflow job → wait for it to finish → run a BigQuery query → trigger a report.

  • Sample one-liner: "Cloud Composer orchestrates your data pipeline steps. It's like a conductor coordinating Pub/Sub, Dataflow, BigQuery, and Looker in sequence."
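The dependent-steps case in the first tip is a sequence with a wait in the middle. The stdlib-only sketch below shows the control flow that Composer takes off your hands. Each step starts only after the previous one succeeds, and the "wait" is a polling loop analogous to an Airflow sensor. The function names are hypothetical and this is not Airflow's API. Only the parameter names `poke_interval` and `timeout` are borrowed from Airflow's sensor operators. In Composer, you would instead chain operators with `>>` and get scheduling, retries, and monitoring on top.

```python
import time
from typing import Callable


def wait_until(is_done: Callable[[], bool], poke_interval: float, timeout: float) -> None:
    """Poll is_done() until it returns True, the way an Airflow sensor 'pokes'.

    Raises TimeoutError if the condition is still False after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while not is_done():
        if time.monotonic() >= deadline:
            raise TimeoutError("upstream step did not finish in time")
        time.sleep(poke_interval)


def run_pipeline(start_dataflow: Callable[[], None],
                 dataflow_done: Callable[[], bool],
                 run_bq_query: Callable[[], None],
                 trigger_report: Callable[[], None]) -> list[str]:
    """Run the steps strictly in order; each starts only after the previous one succeeds."""
    log = []
    start_dataflow()
    log.append("dataflow started")
    wait_until(dataflow_done, poke_interval=0.01, timeout=1.0)  # the "wait" step
    log.append("dataflow finished")
    run_bq_query()
    log.append("bigquery query done")
    trigger_report()
    log.append("report triggered")
    return log


# Hypothetical stand-in for a Dataflow job-status check: not done, not done, then done.
status_checks = iter([False, False, True])
steps = run_pipeline(
    start_dataflow=lambda: None,
    dataflow_done=lambda: next(status_checks),
    run_bq_query=lambda: None,
    trigger_report=lambda: None,
)
print(steps)
# → ['dataflow started', 'dataflow finished', 'bigquery query done', 'report triggered']
```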


Common Gotchas

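  • Composer is expensive for simple use cases. An environment runs continuously, so it has a baseline cost even when no DAGs are running. For a single scheduled job, Cloud Scheduler + Cloud Run or Cloud Functions is cheaper.

  • Composer 1 is being retired. Recommend Composer 2 (or later) for all new deployments.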