Q: 16
After completing a prompt-tuning experiment, you notice that the model's accuracy in generating relevant
responses is high, but the fluency and grammatical correctness of the outputs seem to be suboptimal.
What statistical metric would most directly indicate this issue, and what action should you take to
improve the output?
Discussion
It's D, because perplexity reflects how fluent and grammatical the generated text is, not just whether its content is accurate. Pretty sure this matches how typical NLP evals use it.
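To make the comment above concrete, here is a minimal sketch of how perplexity is computed from per-token log-probabilities: PPL = exp(-(1/N) * sum of log p(x_i | x_<i)). Lower perplexity under a reference language model is commonly treated as a rough proxy for fluency. It does not directly check grammar. The function name `perplexity` and the token probabilities below are hypothetical and chosen for illustration. They do not come from any real model or library.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities.

    PPL = exp(-(1/N) * sum(log p(x_i | x_<i)))
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    # Average negative log-likelihood per token.
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


# Hypothetical per-token probabilities (illustrative only, not real model output).
fluent = [math.log(p) for p in (0.5, 0.4, 0.6, 0.5)]
disfluent = [math.log(p) for p in (0.5, 0.05, 0.1, 0.02)]

print(round(perplexity(fluent), 2))
print(round(perplexity(disfluent), 2))
```

The sequence whose tokens the model finds less predictable gets the higher perplexity. That is why a high score is usually read as a sign of less natural output.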