Q: 4
Which of the following practices are best suited to optimize the performance of a deployed generative AI
model in IBM watsonx under real-world traffic conditions? (Select two)
Options
Discussion
It's C and E. Quantization (C) cuts down model size, which improves inference speed, and dynamic resource allocation (E) handles changing loads efficiently. The rest either ignore hardware variance or could waste resources. I think this lines up with best practices, but open to other ideas.
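For anyone wondering why quantization (C) shrinks the model, here is a rough sketch of symmetric int8 quantization in plain Python. It is only an illustration: the function names and sample weights are made up, and this is not how watsonx or any particular library does it internally. The idea is that each float32 weight (4 bytes) is stored as one int8 value (1 byte) plus a shared scale, at the cost of a small rounding error.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization (hypothetical helper).

    Returns the integer codes and the scale needed to recover the floats.
    """
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from int8 codes."""
    return [c * scale for c in codes]


# Made-up sample weights, for illustration only.
weights = [0.5, -1.27, 0.0, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Rounding to the nearest code bounds the error by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

The storage drops to roughly a quarter of float32, and less data to move through memory is usually where the inference speedup comes from.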
I've seen similar cases in practice tests and official IBM docs. D and E.
I don't think loading the full model into memory (D) is practical for all situations, so C and E fit better here.
Maybe C and E. Quantization is a great performance trick and dynamic resource allocation just makes sense for production loads. Nice clear options here, not tricky. Pretty sure that's right but open to other views.
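To make the dynamic resource allocation point (E) concrete, here is a minimal sketch of the sizing rule an autoscaler typically applies: work out how many replicas the current traffic needs, then clamp to configured limits. The function name, parameters, and numbers are assumptions for illustration, not a watsonx API.

```python
import math


def target_replicas(requests_per_sec, capacity_per_replica,
                    min_replicas=1, max_replicas=10):
    """Hypothetical sizing rule: replicas needed for the current load,
    clamped to the configured minimum and maximum."""
    needed = math.ceil(requests_per_sec / capacity_per_replica)
    return max(min_replicas, min(max_replicas, needed))


# Assume each replica can serve 20 requests per second (made-up figure).
quiet = target_replicas(0, 20)      # idle traffic keeps the minimum
normal = target_replicas(45, 20)    # 45 / 20 rounds up to 3 replicas
spike = target_replicas(1000, 20)   # a spike is capped at the maximum
```

Scaling down during quiet periods is what avoids the wasted resources that a fixed, peak-sized deployment would incur.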