Q: 7
Your company is implementing a data warehouse using BigQuery, and you have been tasked with
designing the data model. You move your on-premises sales data warehouse with a star data schema
to BigQuery but notice performance issues when querying the data of the past 30 days. Based on
Google's recommended practices, what should you do to speed up the query without increasing
storage costs?
Options
Discussion
C. Had something like this in a mock, and Google's guidance is to materialize dimensions using views when you need joins in a star schema, especially if you want to speed things up without using more storage. Partitioning would help for date filters, but the question asks about storage impact specifically. Might be tricky, but I'd stick with C here. Agree?
C. Materializing dimensional data in views helps BigQuery run those star-schema joins more efficiently, and it doesn't bump up your storage bill. If the pain is with join performance and not just recent date scans, this fits. Anyone else think that's right?
Option C
C/D? If most queries filter on the last 30 days, then D (partitioning by transaction date) usually gives a noticeable boost, especially in BigQuery table scans. C helps more if joins are the pain point and you want to avoid extra storage. Kinda depends what slows down the query here. I lean C given the storage cost bit, but it feels close.
Ugh, these GCP questions love to trip me up. Probably D because partitioning by transaction date should make recent queries run way faster, especially when filtering on the past 30 days. I think that's standard for BigQuery performance tweaks, but maybe I'm missing something?
B tbh. Sharding by customer ID could split the data and maybe help with performance if queries are always by customer, but I don't remember Google recommending this for recent time-based filtering. I'm pretty sure splitting that way lets BigQuery scan less if customers are evenly distributed. Could be wrong, open to corrections.
D. Partitioning by transaction date feels more natural here since that's usually how you target recent data performance in BigQuery.
D imo. Partitioning by transaction date should make queries for recent data a lot faster in BigQuery, since it reduces scanned data without more storage cost. Maybe I'm missing something but that's how I'd approach it.
C is better here. Partitioning by date (D) looks useful, but a query that doesn't filter on the partition column could still scan the whole table and cost more.
Probably C, since materializing dimension data with views can speed up joins without more storage cost.
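For anyone weighing the D argument above (date partitioning cuts scanned data without adding storage), here is a minimal sketch of the idea in plain Python. It is a toy model, not BigQuery code: the row layout, the `transaction_date` field name, and the helper functions are all assumptions made up for illustration. It only shows why a date filter over a partitioned layout reads fewer rows while storing exactly the same rows.

```python
from collections import defaultdict
from datetime import date, timedelta


def build_partitions(rows):
    """Group rows by transaction date, mimicking one partition per day."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["transaction_date"]].append(row)
    return partitions


def scan_unpartitioned(rows, start, end):
    """Full scan: every row is read, then filtered by date."""
    scanned = len(rows)
    matched = [r for r in rows if start <= r["transaction_date"] <= end]
    return matched, scanned


def scan_partitioned(partitions, start, end):
    """Pruned scan: only partitions inside the date range are read."""
    matched, scanned = [], 0
    for day, part in partitions.items():
        if start <= day <= end:
            scanned += len(part)
            matched.extend(part)
    return matched, scanned


# Hypothetical data: one sales row per day for a year.
today = date(2024, 6, 30)
rows = [
    {"transaction_date": today - timedelta(days=d), "amount": 100 + d}
    for d in range(365)
]
partitions = build_partitions(rows)

# "Past 30 days" window, inclusive of today.
start, end = today - timedelta(days=29), today

full_hits, full_scanned = scan_unpartitioned(rows, start, end)
pruned_hits, pruned_scanned = scan_partitioned(partitions, start, end)

# Same rows stored either way, so the partitioned layout adds no storage.
stored_rows = sum(len(p) for p in partitions.values())

print(full_scanned, pruned_scanned, stored_rows)
```

Both scans return the same 30 rows, but the pruned scan reads only the partitions in range. As one of the comments notes, this only pays off when the query actually filters on the partitioning column.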