Q: 16
An external customer provides you with a daily dump of data from their database. The data flows
into Google Cloud Storage (GCS) as comma-separated values (CSV) files. You want to analyze this data
in Google BigQuery, but the data could have rows that are formatted incorrectly or corrupted. How
should you build this pipeline?
Options
Discussion
A. Official practice test questions help with these.
B tbh. Cloud Functions plus some shell scripting sounds simpler if ETL is periodic, and you can still automate the process. Maybe not as resource-efficient as D but feels less heavy to set up for small teams. Agree?
D makes sense here, since Apache Beam supports custom connectors and Dataflow lets you process proprietary formats in a streaming pipeline. Converting to Avro is efficient for BigQuery. Saw similar logic pop up in some recent exam reports too, but open to other ideas.
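For anyone weighing the options, the part of the question about incorrectly formatted or corrupted rows usually comes down to routing bad rows aside instead of failing the whole load. Here is a minimal sketch of that idea in plain Python, not tied to any particular option. The three-column schema (id, name, amount), the `split_rows` helper, and the sample data are all assumptions made up for illustration; in a real pipeline the good rows would go to the main BigQuery table and the bad ones to a separate error table or file.

```python
import csv
import io

# Hypothetical schema for illustration only: id (int), name (str), amount (float).
EXPECTED_COLUMNS = 3


def split_rows(csv_text):
    """Separate rows that parse cleanly from rows that do not."""
    good, bad = [], []
    for row in csv.reader(io.StringIO(csv_text)):
        try:
            if len(row) != EXPECTED_COLUMNS:
                raise ValueError("wrong column count")
            good.append(
                {"id": int(row[0]), "name": row[1], "amount": float(row[2])}
            )
        except ValueError as err:
            # Keep the raw row and the reason so it can be inspected later.
            bad.append({"raw": ",".join(row), "error": str(err)})
    return good, bad


# Made-up sample: one valid row, one short row, one row with a non-numeric id.
sample = "1,alice,9.50\n2,bob\nx,carol,3.00\n"
good, bad = split_rows(sample)
print(len(good), "good rows;", len(bad), "rows set aside")
```

The same split can be expressed in whatever tool the chosen option uses; the point is only that malformed rows are captured with their error rather than dropped or allowed to abort the job.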