Cloud Composer

Cloud Composer is a fully managed workflow orchestration service for authoring, scheduling, and monitoring pipelines that span clouds and on-premises data centers. It is built on the open source Apache Airflow project, and workflows are written in Python, which keeps Cloud Composer free from lock-in and easy to use.

Managed Workflow Automation

Within the Cloud Composer environment set up by Stacktics, a library of connectors and multiple graphical representations are available through one-click deployment. Cloud Composer pipelines are defined in Python files as directed acyclic graphs (DAGs), which lowers the barrier to authoring and scheduling workflows. An Airflow DAG consists of a DAG definition, operators, and operator relationships; together these let you develop, manage, and orchestrate workflows.
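
The three parts of a DAG can be illustrated with a small standard-library sketch. This is a toy model, not Airflow's API: the `Dag` and `Task` classes and the task names are hypothetical and only mirror the shape of a DAG definition, its operators, and the `>>` syntax that Airflow uses to declare relationships.

```python
from graphlib import TopologicalSorter


class Dag:
    """Toy stand-in for a DAG definition: an id plus task dependencies."""

    def __init__(self, dag_id):
        self.dag_id = dag_id
        self.upstream = {}  # task_id -> set of upstream task_ids

    def run_order(self):
        # Raises graphlib.CycleError if the relationships form a cycle.
        return list(TopologicalSorter(self.upstream).static_order())


class Task:
    """Toy stand-in for an operator: a named unit of work in a DAG."""

    def __init__(self, task_id, dag):
        self.task_id = task_id
        self.dag = dag
        dag.upstream.setdefault(task_id, set())

    def __rshift__(self, other):
        # `a >> b` records that b runs after a, then returns b for chaining.
        self.dag.upstream[other.task_id].add(self.task_id)
        return other


dag = Dag("example_pipeline")       # 1. DAG definition
extract = Task("extract", dag)      # 2. operators
transform = Task("transform", dag)
load = Task("load", dag)
extract >> transform >> load        # 3. operator relationships

print(dag.run_order())  # ['extract', 'transform', 'load']
```

Because the graph must be acyclic, adding a relationship such as `load >> extract` would make `run_order()` raise `graphlib.CycleError` in this sketch.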

Overview of Cloud Composer Architecture by Stacktics

Cloud Composer distributes the environment’s resources between a Google-managed tenant project and a customer project.

A customer project is the Google Cloud project where the Cloud Composer environment is created.

A tenant project is a project that Google manages on behalf of the environment. It provides unified access control and an additional layer of data security for the environment.

Environment Components

Environment components run either in the tenant or in the customer project of the environment.

  1. Environment’s cluster
    1. Environment’s cluster is a Google Kubernetes Engine cluster.
    2. Environment nodes are VMs in the environment’s cluster.
    3. Pods run on environment nodes. Pods run containers with other environment components, such as Airflow workers and schedulers.
  2. Airflow schedulers
    1. Airflow schedulers control the scheduling of DAG runs and individual tasks from DAGs. Schedulers distribute tasks to Airflow workers by using a Redis queue, which runs as an application in the environment’s cluster. Airflow schedulers run as Deployments in your environment’s cluster.
  3. Airflow workers
    1. Airflow workers execute individual tasks from DAGs by taking them from the Redis queue. Airflow workers run as Deployments in the environment’s cluster.
  4. Redis queue
    1. Redis queue holds a queue of individual tasks from the DAGs. Airflow schedulers fill the queue; Airflow workers take their tasks from it. Redis queue runs as a StatefulSet application in the environment’s cluster, so that messages persist across container restarts.
  5. Airflow web server
    1. Airflow web server runs the Airflow UI of the environment. Airflow web server runs as a Deployment in the environment’s cluster.
  6. Airflow database
    1. Airflow database is a Cloud SQL instance that runs in the tenant project of the environment. It hosts the Airflow metadata database.
  7. Environment’s bucket
    1. Environment’s bucket is a Cloud Storage bucket that stores DAGs, plugins, data dependencies, and Airflow logs. Environment’s bucket resides in the customer project.
  8. Cloud SQL Storage
    1. Cloud SQL Storage stores the Airflow database backups. Cloud Composer backs up the Airflow metadata daily to minimize potential data loss. Cloud SQL Storage runs in the tenant project of the environment.
  9. Cloud SQL Proxy
    1. Cloud SQL Proxy connects other components of the environment to the Airflow database. A Cloud Composer environment can have one or more Cloud SQL Proxy instances that play different roles.
    2. Cloud SQL Proxy authorizes access to the Cloud SQL instance from an application, client, or other Google Cloud service. A Cloud SQL Proxy runs as a Deployment in the environment’s cluster, or in an App Engine Flex instance in the tenant project.
  10. Airflow monitoring
    1. Airflow monitoring reports environment metrics to Cloud Monitoring and triggers the airflow_monitoring DAG. The airflow_monitoring DAG reports the environment health data on the monitoring dashboard of the environment. Airflow monitoring runs as a Deployment in the environment’s cluster.
  11. Customer Metrics Stackdriver Adapter
    1. Customer Metrics Stackdriver Adapter reports metrics of the environment for autoscaling. This component runs as a Deployment in the environment’s cluster.
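
The hand-off between schedulers, the Redis queue, and workers described in the list above can be sketched with the Python standard library. This is an illustration only: an in-process `queue.Queue` stands in for Redis, threads stand in for worker pods, and the `scheduler` and `worker` functions and task names are hypothetical. It also ignores task dependencies, which a real scheduler resolves before queuing a task.

```python
import queue
import threading

task_queue = queue.Queue()       # stands in for the Redis queue
results = []                     # (worker name, task id) pairs
results_lock = threading.Lock()


def scheduler(task_ids):
    """Fill the queue with tasks that are ready to run."""
    for task_id in task_ids:
        task_queue.put(task_id)


def worker(name):
    """Take tasks from the queue until none are left."""
    while True:
        try:
            task_id = task_queue.get_nowait()
        except queue.Empty:
            return
        with results_lock:
            results.append((name, task_id))
        task_queue.task_done()


scheduler(["extract", "transform", "load"])

workers = [
    threading.Thread(target=worker, args=(f"worker-{i}",)) for i in range(2)
]
for thread in workers:
    thread.start()
for thread in workers:
    thread.join()

# Which worker ran which task varies between runs; the set of tasks does not.
print(sorted(task_id for _, task_id in results))
```

The queue decouples the two sides: the scheduler does not need to know how many workers exist, and workers can be added or removed without changing how tasks are queued.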

As part of Google Cloud Platform (GCP), Cloud Composer integrates with tools such as BigQuery, Dataflow, Dataproc, Datastore, Cloud Storage, Pub/Sub and Cloud ML Engine. Connect with Stacktics to incorporate this powerfully integrated system into your organization.

Have a question? Get an answer. We would be happy to chat.