Option A makes sense if the main constraint is keeping costs flat. Pig runs on top of Hadoop, so you can tune the scripts themselves for responsiveness without provisioning more resources, which is what C would require. Spark (B) performs well, but it often calls for infrastructure changes or beefier nodes, and that usually means extra spend. Hard to say for certain without knowing the exact bottleneck, but given the exam wording 'no increased cost', A fits best in my view. Thoughts?
Had something like this in a mock. B and C are right: policy tags have to be configured with the right access controls, and you don't want the analytics team holding the Fine-Grained Reader role on policy tags that protect sensitive columns, since that role is exactly what lets a principal read the tagged data. Always double-check the Data Catalog permissions too. Pretty sure that's it, but open to other takes if I missed anything.
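For anyone who wants to see what that binding actually looks like, here's a minimal Python sketch granting the Fine-Grained Reader role on a policy tag via the google-cloud-datacatalog client. The project, taxonomy, and group names are placeholders I made up; the point is that this role should only ever go to principals cleared to read the sensitive columns, not to the general analytics group.

```python
# Sketch: attach roles/datacatalog.categoryFineGrainedReader to a policy tag.
# Resource names and the group email below are hypothetical placeholders.
from google.cloud import datacatalog_v1
from google.iam.v1 import iam_policy_pb2

client = datacatalog_v1.PolicyTagManagerClient()

# Full resource name of the policy tag guarding the sensitive columns.
policy_tag = (
    "projects/my-project/locations/us"
    "/taxonomies/1234567890/policyTags/0987654321"
)

# Read the current IAM policy attached to the policy tag.
policy = client.get_iam_policy(
    request=iam_policy_pb2.GetIamPolicyRequest(resource=policy_tag)
)

# Grant fine-grained read access to the cleared group only --
# holders of this role can read data behind the tag.
policy.bindings.add(
    role="roles/datacatalog.categoryFineGrainedReader",
    members=["group:pii-readers@example.com"],
)

# Write the updated policy back to the policy tag.
client.set_iam_policy(
    request=iam_policy_pb2.SetIamPolicyRequest(
        resource=policy_tag, policy=policy
    )
)
```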
Is the question asking specifically about running jobs on Dataflow, or about pipeline jobs in general? If it's Dataflow specifically, A is the best answer; a different service would call for a different role.
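To make the role distinction concrete: if it really is Dataflow, the typical grant is roles/dataflow.developer on the project, which lets someone submit and manage jobs without broader edit rights. A rough sketch using the Resource Manager API, with a made-up project ID and user:

```python
# Sketch: grant roles/dataflow.developer on a project so a user can
# submit and manage Dataflow jobs. Project ID and user are placeholders.
from googleapiclient import discovery

crm = discovery.build("cloudresourcemanager", "v1")
project_id = "my-project"

# Fetch the project's current IAM policy.
policy = crm.projects().getIamPolicy(resource=project_id, body={}).execute()

# Append a binding for the Dataflow Developer role.
policy.setdefault("bindings", []).append(
    {
        "role": "roles/dataflow.developer",
        "members": ["user:pipeline-dev@example.com"],
    }
)

# Write the updated policy back to the project.
crm.projects().setIamPolicy(
    resource=project_id, body={"policy": policy}
).execute()
```

If the question is about another service, swap the role accordingly (for example, a Dataproc-based pipeline would point at a Dataproc role instead).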