A. Feature selection
Feature selection is the process of selecting the most relevant features (input variables) from the
data. While important, it reduces the number of features per example rather than the number of
examples, so it does not directly address excess data.
B. Data sampling
Data sampling involves selecting a representative subset of the data for training. When there is more
data than needed, sampling can be used to create a manageable dataset that maintains the
statistical properties of the full dataset.
C. Data labeling
Data labeling involves annotating data for supervised learning. It is necessary for training models but
does not address the issue of having excess data.
D. Data augmentation
Data augmentation is used to increase the size of the training dataset by creating modified versions
of existing data. It is useful when there is insufficient data, not when there is excess data.
Therefore, the correct answer is B: data sampling is the activity that directly addresses having
more data than is needed for training.
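
As an illustration of option B, the following is a minimal sketch of stratified sampling, one common way to shrink a dataset while keeping its class proportions. The function name stratified_sample, its parameters, and the cat/dog dataset are hypothetical and chosen only for this example; other sampling schemes (for instance simple random sampling) may be more appropriate depending on the data.

```python
import random
from collections import defaultdict


def stratified_sample(records, label_of, fraction, seed=0):
    """Return a subset that keeps roughly `fraction` of each class."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible

    # Group the records by their class label.
    by_label = defaultdict(list)
    for rec in records:
        by_label[label_of(rec)].append(rec)

    # Draw the same fraction from every class (at least one record each),
    # so the class balance of the full dataset is preserved.
    sample = []
    for group in by_label.values():
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample


# Hypothetical dataset: 900 "cat" examples and 100 "dog" examples.
data = [("cat", i) for i in range(900)] + [("dog", i) for i in range(100)]
subset = stratified_sample(data, label_of=lambda rec: rec[0], fraction=0.1)
```

In this sketch the subset is one tenth the size of the original data, and the 9:1 ratio between the two classes is unchanged, which is what "maintains the statistical properties" means for the label distribution.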