Question 12

Question

[Modeling]
A finance company needs to forecast the price of a commodity. The company has compiled a dataset
of historical daily prices. A data scientist must train various forecasting models on 80% of the dataset
and must validate the efficacy of those models on the remaining 20% of the dataset.
What should the data scientist split the dataset into a training dataset and a validation dataset to
compare model performance?

Accepted Answer

Pick a date so that 80% to the data points precede the date Assign that group of data points as the
training dataset. Assign all the remaining data points to the validation dataset.

Parker · Answer

Had something like this in a mock, it's A for time series so you don't leak future info into training.

Ivy · Answer

Actually, for time series data you shouldn't use random sampling like D says. A is the way to go.

CuriousArchitect7211 · Answer

Its D. Random sampling without replacement is a common way to split data for training and validation, so I think it should work here too. Not 100% but makes sense for most ML problems.

Premium Access Includes

FLASH OFFER

avail 10% DISCOUNT on YOUR PURCHASE