Scenario: A document classification model detects fraud. It performs well on the majority ("legitimate claim") documents but frequently misclassifies the minority ("fraudulent claim") samples. SageMaker Clarify pretraining bias analysis reveals a significant skew in the dataset. Question: What issue is most likely causing the model's poor performance on fraudulent claim detection? Options:
Q: 3
Discussion
Option D. I encountered an almost identical question in my exam. Kendra is built for semantic search and processes unstructured data from S3 directly. The other services don't handle semantic queries natively.
Option B. If the dataset skew were about features rather than the target labels, then C could be a possible trap answer.
My vote is B; class imbalance is the classic issue here since Clarify flagged dataset skew. That would make the model much worse at spotting the minority (fraud) class. Not totally certain whether other factors could play in, but skew points to B.
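To make the "Clarify flagged dataset skew" point concrete: Clarify's pretraining report includes a Class Imbalance (CI) metric of roughly the form (n_a - n_d) / (n_a + n_d), where n_a and n_d are the majority and minority label counts. This is just a rough sketch of that formula in plain Python, with made-up counts, not actual Clarify internals:

```python
# Sketch of the Class Imbalance (CI) idea from Clarify's pretraining report:
# CI = (n_majority - n_minority) / (n_majority + n_minority).
# Values near +1 mean the dataset is heavily skewed toward the majority label.

def class_imbalance(labels):
    n_legit = sum(1 for y in labels if y == "legitimate")
    n_fraud = sum(1 for y in labels if y == "fraudulent")
    return (n_legit - n_fraud) / (n_legit + n_fraud)

# Hypothetical label distribution, purely for illustration: 95% legit, 5% fraud.
labels = ["legitimate"] * 950 + ["fraudulent"] * 50
print(round(class_imbalance(labels), 2))  # 0.9 -> severe imbalance
```

A CI value that high is exactly the kind of skew that makes a naive model default to predicting the majority class.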
B imo
I don’t think it’s A. B fits here since class imbalance makes the model weak at detecting the fraud cases. Clarify flagging label skew is a big clue. A and D are common traps but not what's described, agree?
AWS loves to sneak Kendra into these semantic search questions, D imo.
So with this one, wouldn't Kendra (D) be the best fit? SageMaker or OpenSearch can handle embeddings and search, but they're a lot more manual for RAG semantic queries. Kendra's pretty much designed for semantic/contextual search straight from S3 using its connector. Option C is tempting but Textract/OpenSearch won't do deep semantic out of the box. Am I missing a use case where C would be better?
Not D. I'm thinking C, because Textract plus Redshift lets you analyze the text, and then OpenSearch can handle the queries.
It's B here, since SageMaker Clarify calling out dataset skew usually points at class imbalance. Fraud is classic for this; the minority class trips the model up. Open to being wrong, but that's what matches the scenario.
B is the right pick since SageMaker Clarify flagged dataset skew, which really just means class imbalance. That's classic in fraud scenarios where one label is way rarer than the other, so the model ignores the minority by default. Haven't seen a question like this catch people with C, but lmk if you think otherwise.
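If B is the diagnosis, the usual fix is to reweight or resample so the rare fraud label counts for more during training. Here's a small sketch of how "balanced" class weights are typically computed (w_c = n_samples / (n_classes * n_c), the same scheme scikit-learn's class_weight="balanced" uses); the counts are synthetic, just for illustration:

```python
# Compute "balanced" class weights: w_c = n_samples / (n_classes * count_c).
# The rarer the class, the larger its weight, so misclassifying fraud
# costs the model more during training.
from collections import Counter

def balanced_weights(labels):
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

# Hypothetical 95/5 split, mirroring the skew in the scenario.
labels = ["legit"] * 950 + ["fraud"] * 50
print(balanced_weights(labels))  # fraud weighted ~19x higher than legit
```

Passing weights like these to the training loss (or resampling with SMOTE) is the standard remediation once Clarify's bias report surfaces this kind of skew.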