Pretty sure it's A here. In production, tracking how many customer inquiries get handled per unit time tells you whether the app is keeping up with demand and whether users are actually being served. B is interesting but is more about operational costs. C and D are focused on model training or benchmarking, not live metrics. If I'm missing something, happy to hear other views.
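For anyone who wants to see what that metric looks like in practice, here's a minimal sketch that computes inquiries handled per minute from completion timestamps. The timestamps and field layout are made up for illustration, not taken from any particular logging setup.

```python
from datetime import datetime

# Assumed data: timestamps of inquiries the app finished handling,
# as they might be pulled from application logs.
completed_at = [
    datetime(2024, 1, 1, 9, 0, 5),
    datetime(2024, 1, 1, 9, 0, 40),
    datetime(2024, 1, 1, 9, 1, 10),
    datetime(2024, 1, 1, 9, 2, 55),
]

# Inquiries handled per minute over the observed window.
window_minutes = (completed_at[-1] - completed_at[0]).total_seconds() / 60
throughput = len(completed_at) / window_minutes if window_minutes > 0 else float("nan")
print(f"Inquiries handled per minute: {throughput:.2f}")
```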
A Generative AI Engineer is building a system that will answer questions on the latest stock news articles. Which of the following will NOT help ensure the outputs are relevant to financial news?
MLflow PyFunc is the go-to method here, so D. Saw this approach recommended in the official Databricks guide. If anyone saw another method used in recent exam practice, let me know.
I don’t think C is the way to go. Preprocessing the prompt up front shapes how the LLM interprets your inputs, which can make a bigger difference than cleaning up the output afterward. Postprocessing helps, but it doesn’t fix issues caused by a poorly structured prompt. Pretty sure D is the more standard choice, but open to other takes.
Yeah, D looks right to me. Using an MLflow PyFunc model gives you a clean way to bundle custom preprocessing steps with LLM calls, which is really handy for production pipelines in Databricks. Directly modifying the LLM architecture (A) is risky and not typical here. Sticking with D, but happy to be challenged if someone found a better approach.
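To make that concrete, here's a minimal sketch of the pattern: a custom mlflow.pyfunc.PythonModel that applies prompt preprocessing before a stubbed LLM call and gets logged with MLflow. The preprocessing rule, the _call_llm stub, and the input column name are placeholders I made up; in a real pipeline you'd swap in your actual cleaning logic and a call to your serving endpoint.

```python
import mlflow
import mlflow.pyfunc
import pandas as pd


class PreprocessedLLM(mlflow.pyfunc.PythonModel):
    """Bundles custom prompt preprocessing with an LLM call in one deployable unit."""

    def _preprocess(self, text: str) -> str:
        # Placeholder preprocessing: trim whitespace and prepend an instruction.
        return "Answer concisely: " + text.strip()

    def _call_llm(self, prompt: str) -> str:
        # Stub for the actual model call (e.g., a request to a serving endpoint).
        return f"[LLM response to: {prompt}]"

    def predict(self, context, model_input: pd.DataFrame):
        # Assumed input schema: a DataFrame with a 'prompt' column.
        return [self._call_llm(self._preprocess(p)) for p in model_input["prompt"]]


# Log the wrapped model so the preprocessing ships with it at inference time.
with mlflow.start_run():
    mlflow.pyfunc.log_model(artifact_path="llm_with_preprocessing",
                            python_model=PreprocessedLLM())
```

The payoff is that loading the logged model and calling predict runs the preprocessing and the LLM call together, instead of stitching them together in application code.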
D is correct for the Databricks workflow. If the question asked for the most secure method instead, would that change the pick?
Really wish Databricks would phrase these less vaguely. C makes sense, since using chat logs to find and summarize similar past answers addresses both speed and personalized responses. Pretty sure that's what they want here, but open to pushback.
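As a rough illustration of the "find similar answers in chat logs" idea, here's a small sketch that uses TF-IDF cosine similarity to pull the closest prior exchange for a new question. The log entries and the similarity method are assumptions for the example; a real system would more likely use an embedding model and a vector index.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up chat-log entries pairing past questions with agent answers.
chat_logs = [
    "How do I reset my password? -> Go to Settings > Security and click Reset.",
    "Where can I see my invoices? -> Open Billing and select Invoice History.",
    "How do I change my email address? -> Update it under Profile > Contact Info.",
]

query = "I forgot my password, how can I reset it?"

# Vectorize past exchanges and the new question together, then rank by similarity.
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(chat_logs + [query])
scores = cosine_similarity(matrix[len(chat_logs)], matrix[: len(chat_logs)]).ravel()
best_match = chat_logs[scores.argmax()]
print("Most similar prior exchange:", best_match)
```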