Q: 1
A social media company wants to use a large language model (LLM) for content moderation. The
company wants to evaluate the LLM outputs for bias and potential discrimination against specific
groups or individuals.
Which data source should the company use to evaluate the LLM outputs with the LEAST
administrative effort?
Options
Discussion
D. Benchmark datasets are ready-made for fairness and bias testing, so you don't have to do manual labeling or cleaning. That's way less admin work than sifting through user content or logs. Pretty sure that's what AWS wants here, but open to other thoughts.
Maybe C for this one. Content moderation guidelines could directly show whether the LLM is meeting company bias standards, so it feels like less admin work than gathering and labeling extra data. Not totally sure though since D is strong too, but C fits if we care about internal policy.
Option D. Benchmark datasets are already structured and labeled for bias testing, so you don't need to do extra admin work setting anything up. Way easier than working with raw user content or logs. Pretty sure that's the quickest path if allowed, but open if I'm missing something.
Gotta be D here. Benchmark datasets are already curated for fairness/bias, so there's no need to clean or annotate like you'd have to with logs or user content. That's why this is the lowest admin effort, at least in most scenarios. Pretty sure that's what they're looking for but happy to hear counterpoints.
So tired of these "admin effort" questions, D imo.
D
D for sure, I've seen a similar question in exam dumps. Benchmark datasets are ready to use, so barely any admin work is needed.
C or D? Pretty sure I've seen similar practice questions say moderation guidelines (C) can help evaluate for bias, especially if you're comparing LLM outputs to company standards. Still, official guides and the AWS sample exams lean more towards D for standardized assessment.
Yeah, D makes sense. Benchmark datasets are pre-labeled and structured so you skip all the manual effort with user content or logs. Least admin overhead out of these options, I think. If anyone's seen a good counter for B let me know, but D feels right.
D. Unless there's some internal requirement to only use company data, benchmark datasets are obviously the lowest admin work.
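For anyone wondering what "pre-labeled, so no extra admin work" looks like in practice, here is a rough Python sketch. Everything in it is made up for illustration: `BENCHMARK` is a tiny hypothetical stand-in for a real bias benchmark, and `moderate` is a toy placeholder for the actual LLM call (deliberately biased so the check has something to find). The point is only that with labels already in place, the evaluation reduces to a loop and a comparison across groups.

```python
from collections import defaultdict

# Hypothetical stand-in for a pre-labeled bias benchmark: each record carries
# the text, the group it mentions, and whether it should be flagged.
BENCHMARK = [
    {"text": "Group A people are great neighbors.", "group": "A", "should_flag": False},
    {"text": "Group B people are great neighbors.", "group": "B", "should_flag": False},
    {"text": "I can't stand group A people.", "group": "A", "should_flag": True},
    {"text": "I can't stand group B people.", "group": "B", "should_flag": True},
]


def moderate(text: str) -> bool:
    """Toy placeholder for the LLM moderation call (assumption, not a real API).

    It is intentionally biased: it flags any mention of group B.
    """
    lowered = text.lower()
    return "can't stand" in lowered or "group b" in lowered


def false_positive_rate_by_group(records, classifier):
    """Share of benign records that the classifier wrongly flags, per group."""
    flagged = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        if not record["should_flag"]:
            total[record["group"]] += 1
            if classifier(record["text"]):
                flagged[record["group"]] += 1
    return {group: flagged[group] / total[group] for group in total}


rates = false_positive_rate_by_group(BENCHMARK, moderate)
# A large gap between groups suggests the moderator treats them differently.
gap = max(rates.values()) - min(rates.values())
print(rates, gap)
```

With user content or logs you would have to build those `group` and `should_flag` labels yourself first, which is the admin effort the question is getting at.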