Q: 7
To run a TensorFlow training job on your own computer using Cloud Machine Learning Engine, what
would your command start with?
Options
Discussion
C. Had something like this in a mock, and Google's guidance is to materialize dimensions using views when you need joins in a star schema, especially if you want to speed things up without using more storage. Partitioning would help for date filters, but the question asks about storage impact specifically. Might be tricky, but I’d stick with C here. Agree?
C. Materializing dimensional data in views helps BigQuery run those star-schema joins more efficiently, and it doesn't bump up your storage bill. If the pain is with join performance and not just recent date scans, this fits. Anyone else think that's right?
Option C
C/D? If most queries filter on the last 30 days, then D (partitioning by transaction date) usually gives a noticeable boost, especially in BigQuery table scans. C helps more if joins are the pain point and you want to avoid extra storage. Kinda depends what slows down the query here. I lean C given the storage cost bit, but it feels close.
Ugh, these GCP questions love to trip me up. Probably D because partitioning by transaction date should make recent queries run way faster, especially when filtering on the past 30 days. I think that's standard for BigQuery performance tweaks, but maybe I'm missing something?
B tbh. Sharding by customer ID could split the data and maybe help with performance if queries are always by customer, but I don't remember Google recommending this for recent time-based filtering. I'm pretty sure splitting that way lets BigQuery scan less if customers are evenly distributed. Could be wrong, open to corrections.
D. Partitioning by transaction date feels more natural here, since that's usually how you target recent-data performance in BigQuery.
D imo. Partitioning by transaction date should make queries for recent data a lot faster in BigQuery, since it reduces scanned data without more storage cost. Maybe I'm missing something but that's how I'd approach it.
C is better here. Partitioning by date (D) looks useful, but it could increase costs if queries don't filter on the partition column.
Probably C, since materializing dimension data with views can speed up joins without more storage cost.
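For anyone weighing D: here is a toy Python sketch of why a date partition cuts the data scanned for a "last 30 days" filter. It is not BigQuery code and not how BigQuery works internally. The table, the `txn_date` and `amount` fields, and the fixed `today` value are all made up for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical toy table: one transaction per day for the past 365 days.
today = date(2024, 1, 1)  # assumed fixed date so the example is repeatable
rows = [{"txn_date": today - timedelta(days=i), "amount": 10} for i in range(365)]
cutoff = today - timedelta(days=30)

# Unpartitioned: every row has to be read just to apply the date filter.
scanned_full = len(rows)
total_full = sum(r["amount"] for r in rows if r["txn_date"] > cutoff)

# Partitioned by txn_date: rows are grouped by day up front,
# so only the partitions inside the window are read at query time.
partitions = defaultdict(list)
for r in rows:
    partitions[r["txn_date"]].append(r)

recent = [part for day, part in partitions.items() if day > cutoff]
scanned_pruned = sum(len(part) for part in recent)
total_pruned = sum(r["amount"] for part in recent for r in part)

print(scanned_full, scanned_pruned, total_full == total_pruned)
```

Both paths return the same total, but the partitioned one touches far fewer rows. That only pays off when the query actually filters on the partition column, which is the caveat raised a couple of comments up.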