Q: 12
[Modeling]
A finance company needs to forecast the price of a commodity. The company has compiled a dataset
of historical daily prices. A data scientist must train various forecasting models on 80% of the dataset
and must validate the efficacy of those models on the remaining 20% of the dataset.
How should the data scientist split the dataset into a training dataset and a validation dataset to
compare model performance?
Options
Discussion
Had something like this in a mock. It's A for time series, so you don't leak future info into training.
Actually, for time series data you shouldn't use random sampling like D says. A is the way to go.
It's D. Random sampling without replacement is a common way to split data for training and validation, so I think it should work here too. Not 100% sure, but it makes sense for most ML problems.
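For anyone weighing the two answers, here is a rough sketch of the chronological split the first two comments are arguing for: sort by date, train on the earliest 80%, validate on the most recent 20%. The option text isn't shown in the thread, so this assumes A means a split by time order. The function name `chronological_split` and the sample prices are made up for illustration and are not from the question.

```python
from datetime import date, timedelta


def chronological_split(rows, train_fraction=0.8):
    """Split (date, price) rows into (train, validation) by time order, no shuffling."""
    # Sort oldest first so the validation set is strictly the most recent data.
    ordered = sorted(rows, key=lambda row: row[0])
    cutoff = int(len(ordered) * train_fraction)
    return ordered[:cutoff], ordered[cutoff:]


# Hypothetical data: 10 consecutive daily prices.
start = date(2024, 1, 1)
prices = [(start + timedelta(days=i), 100.0 + i) for i in range(10)]

train, validation = chronological_split(prices)

# Every training date precedes every validation date, so no future
# information leaks into training.
print(len(train), len(validation))
print(train[-1][0], validation[0][0])
```

A random split (what D is described as) would mix later days into the training set, so the models would be validated on dates that fall before some of the data they were trained on. That tends to make forecasting results look better than they would be in practice.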