Q: 17
[Modeling]
An online reseller has a large, multi-column dataset with one column missing 30% of its data. A
Machine Learning Specialist believes that certain columns in the dataset could be used to reconstruct
the missing data.
Which reconstruction approach should the Specialist use to preserve the integrity of the dataset?
Options
Discussion
I don’t think it’s B. Multiple imputation (C) is more robust here since it uses other columns to estimate missing values, which helps maintain statistical integrity. Last observation carried forward works best for time series but not general datasets like this.
Seen similar on the official practice test, pretty sure it's C.
C or D? Mean substitution is quick and keeps the dataset size, but with 30% missing, results can get skewed.
It's B. Had something like this in a mock and used last observation carried forward for filling missing values since it reuses real data and keeps the dataset size stable. Pretty sure that's the best for integrity. Anyone got a different take?
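To make the comparison in this thread concrete, here is a minimal sketch on made-up data. It is not from the question or any official answer. It assumes a hypothetical column `y` that depends linearly on a column `x`, removes 30% of `y` at random, and then fills the gaps three ways: mean substitution, last observation carried forward (LOCF), and a simplified bootstrap-based multiple imputation that regresses `y` on `x`. All function names are illustrative, and a real project would more likely use a library imputer than this hand-rolled version.

```python
import math
import random
import statistics


def make_toy_data(n=200, seed=0):
    """Hypothetical data: y depends on x; 30% of y is removed at random."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0, 0.5) for x in xs]
    missing = set(rng.sample(range(n), n * 3 // 10))
    observed = [(x, None if i in missing else y)
                for i, (x, y) in enumerate(zip(xs, ys))]
    return xs, ys, observed


def fit_line(pairs):
    """Least-squares fit of y on x; returns slope, intercept, residual sd."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in pairs) / sxx
    intercept = my - slope * mx
    resid_sd = statistics.pstdev([y - (intercept + slope * x) for x, y in pairs])
    return slope, intercept, resid_sd


def mean_substitution(observed):
    """Replace every missing y with the mean of the observed y values."""
    mean_y = statistics.fmean([y for _, y in observed if y is not None])
    return [mean_y if y is None else y for _, y in observed]


def locf(observed):
    """Carry the last observed y forward (first observed value seeds the start)."""
    last = next(y for _, y in observed if y is not None)
    filled = []
    for _, y in observed:
        if y is not None:
            last = y
        filled.append(last)
    return filled


def multiple_imputation(observed, m=5, seed=1):
    """Simplified multiple imputation: m bootstrap refits, each adding residual noise."""
    rng = random.Random(seed)
    complete = [(x, y) for x, y in observed if y is not None]
    datasets = []
    for _ in range(m):
        boot = rng.choices(complete, k=len(complete))  # resample complete rows
        slope, intercept, resid_sd = fit_line(boot)
        datasets.append([
            y if y is not None else intercept + slope * x + rng.gauss(0, resid_sd)
            for x, y in observed
        ])
    return datasets


def correlation(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs)
                           * sum((y - my) ** 2 for y in ys))


xs, ys, observed = make_toy_data()
mean_filled = mean_substitution(observed)
locf_filled = locf(observed)
imputed_sets = multiple_imputation(observed)

true_corr = correlation(xs, ys)
mean_corr = correlation(xs, mean_filled)
locf_corr = correlation(xs, locf_filled)
# Pool the statistic by averaging it over the m imputed datasets.
mi_corr = statistics.fmean([correlation(xs, ds) for ds in imputed_sets])

print(f"true={true_corr:.3f} mean={mean_corr:.3f} "
      f"locf={locf_corr:.3f} multiple={mi_corr:.3f}")
```

The thing to look at is how well each filled column keeps its relationship with `x`. Mean substitution and LOCF ignore the other column entirely, so the correlation should drop, while the regression-based draws should stay close to the original. That matches the argument above that using other columns is what preserves the dataset's statistical integrity. LOCF also only makes sense when row order means something, as in a time series.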