Q: 1
The CUSTOM tier for Cloud Machine Learning Engine allows you to specify the number of which
types of cluster nodes?
Options
Discussion
Option C. You get to set both the worker count and the parameter server count with CUSTOM; that's not true for the other tiers. Pretty sure that's what the docs say too.
Why do Google questions always feel like a riddle just to test tiers? C, workers and parameter servers.
C imo. Pretty clear question, nice to see the options laid out like that.
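For anyone who wants to see what CUSTOM actually unlocks, here's a rough sketch of a training job submission (project, bucket, and job names are all made up, not from the question):

```python
# Hedged sketch: submitting an ML Engine training job with scaleTier CUSTOM.
# With CUSTOM you pick the machine types AND the node counts yourself.
from googleapiclient import discovery

training_inputs = {
    "scaleTier": "CUSTOM",
    "masterType": "n1-standard-4",
    "workerType": "n1-standard-4",
    "parameterServerType": "n1-standard-4",
    "workerCount": 4,            # number of worker nodes
    "parameterServerCount": 2,   # number of parameter server nodes
    "packageUris": ["gs://my-bucket/trainer-0.1.tar.gz"],
    "pythonModule": "trainer.task",
    "region": "us-central1",
}

ml = discovery.build("ml", "v1")
ml.projects().jobs().create(
    parent="projects/my-project",
    body={"jobId": "custom_tier_job_001", "trainingInput": training_inputs},
).execute()
```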
Q: 2
You are designing the architecture to process your data from Cloud Storage to BigQuery by using
Dataflow. The network team provided you with the Shared VPC network and subnetwork to be used
by your pipelines. You need to enable the deployment of the pipeline on the Shared VPC network.
What should you do?
Options
Discussion
A is wrong; it's B. The pipeline needs compute.networkUser on the service account that runs it, not on the service agent.
D imo. Assigning the dataflow.admin role to the service account seems like it would give enough permissions for the pipeline to run on the Shared VPC, including managing jobs and maybe network access too. Not 100% sure that's all that's needed, but dataflow.admin feels pretty broad for Dataflow tasks. Correct me if I'm missing something.
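To make the Shared VPC part concrete, here's a rough Beam sketch (all project/bucket/subnet names are placeholders). Note the subnetwork URL points at the host project, and this only works once the worker service account holds compute.networkUser on that subnetwork:

```python
# Rough sketch of launching the GCS -> BigQuery pipeline onto a Shared VPC
# subnetwork. All names are placeholders.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="service-project-id",    # the service project the job runs in
    region="europe-west1",
    temp_location="gs://my-bucket/temp",
    service_account_email=(
        "pipeline-runner@service-project-id.iam.gserviceaccount.com"
    ),
    # Shared VPC subnetwork URL lives in the HOST project, not the service project
    subnetwork=(
        "https://www.googleapis.com/compute/v1/projects/host-project-id"
        "/regions/europe-west1/subnetworks/shared-subnet"
    ),
)

with beam.Pipeline(options=options) as p:
    (p
     | "ReadCSV" >> beam.io.ReadFromText("gs://my-bucket/input/*.csv",
                                         skip_header_lines=1)
     | "ToRow" >> beam.Map(lambda line: dict(zip(["id", "value"],
                                                 line.split(","))))
     | "WriteBQ" >> beam.io.WriteToBigQuery(
           "service-project-id:my_dataset.my_table",
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
           create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER))
```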
Q: 3
You operate a logistics company, and you want to improve event delivery reliability for vehicle-based
sensors. You operate small data centers around the world to capture these events, but leased lines
that provide connectivity from your event collection infrastructure to your event processing
infrastructure are unreliable, with unpredictable latency. You want to address this issue in the most
cost-effective way. What should you do?
Options
Discussion
Option C
Q: 4
Which of the following is not true about Dataflow pipelines?
Options
Discussion
D, not much doubt. The official guide explains this; worth reviewing the Dataflow pipeline concepts again if unsure.
A is fine, but D isn't. Pipelines can't share data between instances.
D imo
Q: 5
Your company is migrating their 30-node Apache Hadoop cluster to the cloud. They want to re-use
Hadoop jobs they have already created and minimize the management of the cluster as much as
possible. They also want to be able to persist data beyond the life of the cluster. What should you do?
Options
Discussion
B tbh
D imo
B, not D
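Whichever letter it maps to in your dump, the intent of the question is Dataproc plus gs:// storage: existing Hadoop jars run unchanged, and the data outlives the cluster. Hedged sketch, all names and paths are placeholders:

```python
# Sketch: reusing an existing Hadoop jar on Dataproc while keeping the data
# on Cloud Storage so it persists beyond the cluster's life.
from google.cloud import dataproc_v1

region = "us-central1"
client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

job = {
    "placement": {"cluster_name": "migrated-hadoop-cluster"},
    "hadoop_job": {
        "main_jar_file_uri": "gs://my-bucket/jars/wordcount.jar",
        # The Cloud Storage connector lets existing jobs read/write gs://
        # paths just like hdfs:// paths, no code changes needed.
        "args": ["gs://my-bucket/input/", "gs://my-bucket/output/"],
    },
}

operation = client.submit_job_as_operation(
    request={"project_id": "my-project", "region": region, "job": job}
)
print(operation.result().reference.job_id)
```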
Q: 6
You are running a pipeline in Cloud Dataflow that receives messages from a Cloud Pub/Sub topic and
writes the results to a BigQuery dataset in the EU. Currently, your pipeline is located in europe-west4
and has a maximum of 3 workers, instance type n1-standard-1. You notice that during peak periods,
your pipeline is struggling to process records in a timely fashion, when all 3 workers are at maximum
CPU utilization. Which two actions can you take to increase performance of your pipeline? (Choose
two.)
Options
Discussion
Options A and B. More workers or bigger machines will give better throughput for Dataflow jobs; pretty sure those are the intended performance fixes here.
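Concretely, the two knobs those answers point at look like this in Beam pipeline options (values are illustrative; in practice you'd raise one or both):

```python
# The two tuning knobs: give autoscaling more headroom (max_num_workers)
# or use beefier workers (machine_type).
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",
    region="europe-west4",
    temp_location="gs://my-bucket/temp",
    max_num_workers=10,              # raise the cap from 3
    machine_type="n1-standard-4",    # bump from n1-standard-1
)
```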
Q: 7
Your company is implementing a data warehouse using BigQuery, and you have been tasked with
designing the data model. You move your on-premises sales data warehouse with a star schema
to BigQuery but notice performance issues when querying the data of the past 30 days. Based on
Google's recommended practices, what should you do to speed up the query without increasing
storage costs?
Options
Discussion
Ugh, these GCP questions love to trip me up. Probably D because partitioning by transaction date should make recent queries run way faster, especially when filtering on the past 30 days. I think that's standard for BigQuery performance tweaks, but maybe I'm missing something?
B tbh. Sharding by customer ID could split the data and maybe help with performance if queries are always by customer, but I don't remember Google recommending this for recent time-based filtering. I'm pretty sure splitting that way lets BigQuery scan less if customers are evenly distributed. Could be wrong, open to corrections.
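To make the partitioning suggestion concrete, here's a rough sketch of a date-partitioned table plus a 30-day query that prunes everything older (dataset/table/column names are made up):

```python
# Sketch: a table partitioned on the transaction date, so a 30-day filter
# only scans ~30 daily partitions instead of the whole table.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

table = bigquery.Table(
    "my-project.sales_dw.transactions",
    schema=[
        bigquery.SchemaField("transaction_id", "STRING"),
        bigquery.SchemaField("customer_id", "STRING"),
        bigquery.SchemaField("amount", "NUMERIC"),
        bigquery.SchemaField("transaction_date", "DATE"),
    ],
)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="transaction_date",  # partition on the date column, not ingestion time
)
client.create_table(table)

# A filter on the partition column prunes all older partitions:
query = """
    SELECT customer_id, SUM(amount) AS total
    FROM `my-project.sales_dw.transactions`
    WHERE transaction_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
    GROUP BY customer_id
"""
rows = client.query(query).result()
```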
Q: 8
You are designing a data processing pipeline. The pipeline must be able to scale automatically as load
increases. Messages must be processed at least once, and must be ordered within windows of 1
hour. How should you design the solution?
Options
Discussion
C/D? Practice exams hit on Dataflow windowing a lot. Check the official guide and do hands-on labs to be sure.
B tbh
It's D. Pub/Sub for ingestion plus Dataflow for windowed ordering is the combo that auto-scales. Kafka is a common distractor here.
Is "must be ordered within windows" strictly referring to event time or processing time? If it's event time windows, then D fits, but if strict end-to-end ordering is needed across all messages, other options might be better.
Q: 9
You are designing a fault-tolerant architecture to store data in a regional BigQuery dataset. You need
to ensure that your application is able to recover from a corruption event in your tables that occurred
within the past seven days. You want to adopt managed services with the lowest RPO and most cost-
effective solution. What should you do?
Options
Discussion
Option C here, as time travel is built-in for seven days and does not add extra costs. Clean and straightforward question.
C tbh, seen similar advice in official docs and exam practice tests.
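For reference, the time-travel recovery looks roughly like this (table names are placeholders). Any standard dataset supports FOR SYSTEM_TIME AS OF up to seven days back, at no extra storage cost:

```python
# Sketch: query the table as it looked before the corruption, then write
# that snapshot back out as a restored table.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

restore_sql = """
    CREATE OR REPLACE TABLE `my-project.analytics.events_restored` AS
    SELECT *
    FROM `my-project.analytics.events`
      FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 2 DAY)
"""
client.query(restore_sql).result()  # blocks until the restore query finishes
```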
Q: 10
You work for a car manufacturer and have set up a data pipeline using Google Cloud Pub/Sub to
capture anomalous sensor events. You are using a push subscription in Cloud Pub/Sub that calls a
custom HTTPS endpoint that you have created to take action on these anomalous events as they
occur. Your custom HTTPS endpoint keeps getting an inordinate amount of duplicate messages. What
is the most likely cause of these duplicate messages?
Options
Discussion
Option B actually makes sense here. If the SSL cert is out of date, Pub/Sub's push will get handshake failures and treat it like a non-ack, which leads to retries, so you get duplicates. I know D's also common, but in this scenario cert problems will cause exactly this issue. Pretty sure B is right; correct me if I'm missing something.
Nah, I don't think it's D here. B is the catch: Pub/Sub push fails when certs are invalid, so messages aren't acknowledged and you get retries (hence duplicates). D is a standard culprit, but this question points at HTTPS/SSL issues.
It's D, not B. Pub/Sub will keep retrying if your endpoint isn't sending an ack fast enough; classic trap here.
Guessing D. Pub/Sub resends unacked messages so if your endpoint isn't acknowledging them in time, you'll get duplicates.
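To illustrate the ack-deadline explanation, here's a toy push endpoint (hypothetical, not from the question): return a 2xx quickly and hand the slow work off elsewhere, otherwise Pub/Sub treats the delivery as a nack and resends:

```python
# Toy Flask push endpoint: ack fast, process later. Slow inline processing is
# exactly what causes the duplicate deliveries described in the question.
import base64
import json
from flask import Flask, request

app = Flask(__name__)

@app.route("/pubsub/push", methods=["POST"])
def pubsub_push():
    envelope = json.loads(request.data)
    payload = base64.b64decode(envelope["message"]["data"])
    enqueue_for_processing(payload)  # hand off; don't block the response
    return ("", 204)  # a quick 2xx within the ack deadline = acknowledged

def enqueue_for_processing(payload: bytes) -> None:
    # Placeholder for a queue/task handler (hypothetical helper).
    pass
```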