The complete Building Batch Data Pipelines on GCP course is currently offered by Google Cloud through the Coursera platform.

About this Course:
Data pipelines typically fall under one of the Extract-Load (EL), Extract-Load-Transform (ELT) or Extract-Transform-Load (ETL) paradigms. This course describes which paradigm to use, and when, for batch data. It also covers several Google Cloud technologies for data transformation, including BigQuery, executing Spark on Dataproc, pipeline graphs in Cloud Data Fusion and serverless data processing with Dataflow. Learners get hands-on experience building data pipeline components on Google Cloud using Qwiklabs.
EL, ELT, ETL Quiz 1 Answers
Q1. Which of the following is the ideal use case for Extract and Load (EL)?
- Ans: Scheduled periodic loads of log files (e.g. once a day)
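The difference between the three paradigms can be sketched with a small, self-contained example. This is only an illustration under stated assumptions: Python's built-in `sqlite3` stands in for a warehouse such as BigQuery, and the log rows, table names and function names are all hypothetical rather than taken from the course.

```python
import sqlite3

# Hypothetical raw log rows: (day, HTTP status stored as text).
RAW_ROWS = [("2024-01-01", "200"), ("2024-01-01", "500"), ("2024-01-02", "200")]


def extract_load(conn, rows):
    """EL: copy rows into the warehouse as-is, with no transformation."""
    conn.execute("CREATE TABLE raw_logs (day TEXT, status TEXT)")
    conn.executemany("INSERT INTO raw_logs VALUES (?, ?)", rows)


def elt_transform(conn):
    """ELT: transform after loading, using SQL inside the warehouse."""
    conn.execute(
        "CREATE TABLE errors_per_day AS "
        "SELECT day, COUNT(*) AS errors FROM raw_logs "
        "WHERE CAST(status AS INTEGER) >= 500 GROUP BY day"
    )
    return conn.execute("SELECT day, errors FROM errors_per_day").fetchall()


def etl(conn, rows):
    """ETL: transform in the pipeline first, then load only the result."""
    cleaned = [(day, int(status)) for day, status in rows if int(status) >= 500]
    conn.execute("CREATE TABLE error_logs (day TEXT, status INTEGER)")
    conn.executemany("INSERT INTO error_logs VALUES (?, ?)", cleaned)
    return conn.execute("SELECT day, status FROM error_logs").fetchall()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the warehouse
extract_load(conn, RAW_ROWS)        # EL step
elt_result = elt_transform(conn)    # EL followed by T in the warehouse = ELT
etl_result = etl(conn, RAW_ROWS)    # T before L = ETL
```

In this sketch, EL alone is enough when the data is already clean and in the right shape, which matches the scheduled log-file load in the answer above; ELT and ETL differ only in whether the transformation runs inside the warehouse or before the load.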
Executing Spark on Cloud Dataproc Quiz 2 Answers
Q1. Which of the following statements are true about Cloud Dataproc?
- Lets you run Spark and Hadoop clusters with minimal administration
- Helps you create job-specific clusters without HDFS
Q2. Match each of the terms with what they do when setting up clusters in Cloud Dataproc:
Term Definition
__ 1. Zone – A. Costs less but may not always be available
__ 2. Standard Cluster mode – B. Determines the Google data center where compute nodes will be
__ 3. Preemptible – C. Provides 1 master and N workers
- 1. Zone: B
- 2. Standard Cluster mode: C
- 3. Preemptible: A
Q3. Cloud Dataproc provides the ability for Spark programs to separate compute & storage by:
- Reading and writing data directly from/to Cloud Storage
Cloud Data Fusion and Cloud Composer Quiz 3 Answers
Q1. Cloud Data Fusion is the ideal solution when you need
- to build visual pipelines
Data Processing with Cloud Dataflow Quiz 4 Answers
Q1. Which of the following statements are true?
- Dataflow executes Apache Beam pipelines
- Dataflow transforms support both batch and streaming pipelines
Q2. Match each of the Dataflow terms with what they do in the life of a Dataflow job:
Term Definition
__ 1. Transform – A. Output endpoint for your pipeline
__ 2. PCollection – B. A data processing operation or step in your pipeline
__ 3. Sink – C. A set of data in your pipeline
- 1. Transform: B
- 2. PCollection: C
- 3. Sink: A
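The three terms matched above can be illustrated with a toy pipeline in plain Python. This is a conceptual sketch only: the `PCollection` alias, `ListSink` class and `run_pipeline` function below are hypothetical stand-ins written for this example, not the Apache Beam or Dataflow API.

```python
from typing import Callable, Iterable, List

# Toy stand-ins only; these are NOT the Apache Beam classes.
PCollection = List[str]  # a set of data flowing through the pipeline
Transform = Callable[[PCollection], PCollection]


def drop_empty(lines: PCollection) -> PCollection:
    """Transform: a processing step that filters out empty records."""
    return [line for line in lines if line]


def to_upper(lines: PCollection) -> PCollection:
    """Transform: a processing step that upper-cases every record."""
    return [line.upper() for line in lines]


class ListSink:
    """Sink: the output endpoint; here it just collects rows in memory."""

    def __init__(self) -> None:
        self.rows: List[str] = []

    def write(self, lines: PCollection) -> None:
        self.rows.extend(lines)


def run_pipeline(source: Iterable[str], transforms: List[Transform], sink: ListSink) -> None:
    """Read the source, apply each transform in order, write to the sink."""
    data: PCollection = list(source)
    for transform in transforms:
        data = transform(data)  # each step yields a new collection
    sink.write(data)


sink = ListSink()
run_pipeline(["a", "", "b"], [drop_empty, to_upper], sink)
```

Each transform takes one collection and returns another, and the sink is where the final collection leaves the pipeline, which mirrors the definitions in the quiz. A real Dataflow job would express the same shape with Apache Beam's own pipeline objects.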