Q: 11
A Generative AI Engineer at a digital marketing company has just deployed an LLM application that assists with answering customer service inquiries.
Which metric should they monitor for their customer service LLM application in production?
Options
Discussion
Maybe A, but not 100 percent. If the LLM is actually deployed and serving real users, throughput metrics like customer inquiries handled per unit time make sense. But if there are strict SLAs on quality or latency, some orgs might monitor those more closely. It always feels like a trick when other operational or effectiveness metrics aren't among the options. Anyone else see a similar edge case?
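On the latency SLA point: if you do end up watching latency alongside throughput, even something as simple as a p95 check covers it. Rough Python sketch below; the function name and the 2000 ms threshold are made up for illustration, not from any particular monitoring tool.

    def p95_latency_ms(latencies_ms):
        # Nearest-rank 95th percentile of observed response times.
        ordered = sorted(latencies_ms)
        if not ordered:
            return 0.0
        idx = min(len(ordered) - 1, int(0.95 * len(ordered)))
        return ordered[idx]

    # e.g. response times (ms) for the last few requests
    recent = [850, 920, 1100, 2400, 780]
    if p95_latency_ms(recent) > 2000:
        print("p95 latency over SLA, investigate")

Just a sanity check, not a replacement for tracking throughput.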
Not D; that's about model benchmarks, not real-world usage. A makes more sense for monitoring production performance.
A over the rest here, since in production it's all about throughput and making sure the system can process user requests efficiently. Perplexity and leaderboard scores matter more during model evaluation, not once it’s live. I think A is right but open to other ideas if someone sees a catch.
Nah, I don't think it's D. A is what you actually need to track for live ops; D just distracts with benchmark hype.
So why not B if someone cares about sustainability metrics? Isn't throughput (A) always the default for production ops, or are there exceptions with LLMs in customer service?
Pretty sure it's A here. In production, tracking how many customer inquiries get handled per unit time tells you if the app's keeping up and if users are being served. B is interesting but more about operational costs. C and D are focused on model training or benchmarking, not live metrics. If I'm missing something, happy to hear other views.
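To make "inquiries handled per unit time" concrete, here's a minimal sliding-window sketch of what option A's throughput tracking could look like. The class and method names are my own invention, not any vendor's API.

    import time
    from collections import deque

    class ThroughputMonitor:
        """Tracks how many inquiries were handled in a sliding time window."""

        def __init__(self, window_seconds=60):
            self.window_seconds = window_seconds
            self.timestamps = deque()

        def record_inquiry(self):
            # Call this each time the app finishes handling an inquiry.
            self.timestamps.append(time.monotonic())

        def inquiries_per_minute(self):
            # Drop events that fell out of the window, then report the rate.
            cutoff = time.monotonic() - self.window_seconds
            while self.timestamps and self.timestamps[0] < cutoff:
                self.timestamps.popleft()
            return len(self.timestamps) * (60 / self.window_seconds)

You'd call record_inquiry() after each handled request and alert if inquiries_per_minute() drops below your normal baseline.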