Q: 8
Scenario: Preparing historical transactional data for SageMaker built-in classification algorithms (like
XGBoost). The data includes client identifier (unique string) and operation status (label/target). The data
must be compatible with SageMaker algorithms and maintain a valid label structure.
Question: Which preprocessing step should the AI developer perform before training the model in
SageMaker AI?
Options:
Discussion
B is off; D is what you want for XGBoost here.
D, imo. You need to drop the client identifier and encode the status label for XGBoost in SageMaker.
I saw similar cases in AWS docs and official prep guides; D is right here.
Keeping client ID in seems ok, so I'd pick B here. Dropping both fields leaves just the rest for training.
D, imo, because keeping client IDs is a classic trap: they don't help the model and can hurt predictions due to their high cardinality. You have to turn operation status into numbers for SageMaker XGBoost to work right. I've seen similar advice in AWS docs, so I'm pretty sure D fits best. Option B drops the label, which would break training.
It's D for sure. You want to drop the client identifier since it adds no useful info and can mess up training, then encode the label as numbers because SageMaker's XGBoost needs numeric targets. You can double-check this in the official prep guide if you want.
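A minimal sketch of the preprocessing the commenters describe: drop the client identifier and encode the operation status as integers. It uses only the Python standard library. The column names (`client_id`, `amount`, `num_items`, `status`) and the sample rows are hypothetical. The sketch also writes the output in the layout SageMaker's built-in XGBoost expects for `text/csv` training input, which puts the target in the first column and has no header row. Labels are mapped to 0..K-1, which is the range XGBoost's `binary:logistic` and `multi:softmax` objectives expect.

```python
import csv
import io

# Hypothetical sample data; column names and values are illustrative assumptions.
RAW_ROWS = [
    {"client_id": "C-001", "amount": "120.5", "num_items": "3", "status": "approved"},
    {"client_id": "C-002", "amount": "87.0", "num_items": "1", "status": "declined"},
    {"client_id": "C-003", "amount": "45.25", "num_items": "2", "status": "approved"},
]

ID_COLUMN = "client_id"   # unique string identifier: dropped, carries no signal
LABEL_COLUMN = "status"   # target: encoded to integers


def build_label_map(rows, label_column):
    """Map each distinct label string to an integer 0..K-1 (sorted for determinism)."""
    labels = sorted({row[label_column] for row in rows})
    return {label: index for index, label in enumerate(labels)}


def preprocess(rows, id_column, label_column):
    """Drop the identifier, encode the label, and place it in the first column."""
    label_map = build_label_map(rows, label_column)
    feature_columns = [c for c in rows[0] if c not in (id_column, label_column)]
    records = []
    for row in rows:
        encoded_label = label_map[row[label_column]]
        features = [float(row[c]) for c in feature_columns]
        records.append([encoded_label] + features)
    return records, label_map, feature_columns


def to_headerless_csv(records):
    """Serialize records as CSV text without a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(records)
    return buffer.getvalue()


records, label_map, feature_columns = preprocess(RAW_ROWS, ID_COLUMN, LABEL_COLUMN)
print(label_map)
print(to_headerless_csv(records))
```

Keep `label_map` around so you can decode predictions back to status strings. In practice pandas (`DataFrame.drop(columns=...)`) is the more common tool for this. The stdlib version just keeps the sketch self-contained.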