Q: 11
You have several Spark jobs that run on a Cloud Dataproc cluster on a schedule. Some of the jobs run in sequence, and some of the jobs run concurrently. You need to automate this process. What should you do?
Options
Q: 12

Government regulations in your industry mandate that you have to maintain an auditable record of access to certain types of data. Assuming that all expiring logs will be archived correctly, where should you store data that is subject to that mandate?

Options
Q: 13
Your team is building a data lake platform on Google Cloud. As part of the data foundation design, you are planning to store all the raw data in Cloud Storage. You expect to ingest approximately 25 GB of data a day, and your billing department is worried about the increasing cost of storing old data. The current business requirements are:
  • The old data can be deleted anytime
  • You plan to use the visualization layer for current and historical reporting
  • The old data should be available instantly when accessed
  • There should not be any charges for data retrieval
What should you do to optimize for cost?
Options
Q: 14
Which of the following IAM roles does your Compute Engine service account require to be able to run pipeline jobs?
Options
Q: 15
Cloud Dataproc is a managed Apache Hadoop and Apache _____ service.
Options
Q: 16
An external customer provides you with a daily dump of data from their database. The data flows into Google Cloud Storage (GCS) as comma-separated values (CSV) files. You want to analyze this data in Google BigQuery, but the data could have rows that are formatted incorrectly or corrupted. How should you build this pipeline?
Options
Q: 17
You have an upstream process that writes data to Cloud Storage. This data is then read by an Apache Spark job that runs on Dataproc. These jobs are run in the us-central1 region, but the data could be stored anywhere in the United States. You need to have a recovery process in place in case of a catastrophic single region failure. You need an approach with a maximum of 15 minutes of data loss (RPO = 15 minutes). You want to ensure that there is minimal latency when reading the data. What should you do?
Options
Q: 18
Which of the following is not true about Dataflow pipelines?
Options
Q: 19
You used Cloud Dataprep to create a recipe on a sample of data in a BigQuery table. You want to reuse this recipe on a daily upload of data with the same schema, after the load job with variable execution time completes. What should you do?
Options
Q: 20
You are planning to use Cloud Storage as part of your data lake solution. The Cloud Storage bucket will contain objects ingested from external systems. Each object will be ingested once, and the access patterns of individual objects will be random. You want to minimize the cost of storing and retrieving these objects. You want to ensure that any cost optimization efforts are transparent to the users and applications. What should you do?
Options