Q: 5
[Data Engineering]
A machine learning specialist is preparing data for training on Amazon SageMaker. The specialist is
using one of the SageMaker built-in algorithms for the training. The dataset is stored in .CSV format
and is transformed into a numpy.array, which appears to be negatively affecting the speed of the
training.
What should the specialist do to optimize the data for training on SageMaker?
Discussion
Option C. I had something like this in a mock exam, and SageMaker built-in algorithms are definitely optimized for the RecordIO protobuf format. Using NumPy arrays can slow down training, since the built-ins expect RecordIO for efficiency. Not 100% sure, but I'm fairly confident C is the way to go here. Agree?
It's C here. The built-in SageMaker algorithms are optimized for RecordIO protobuf, not NumPy arrays or even Parquet, which is mostly about storage efficiency. D feels like a trap: hyperparameter optimization only tunes model parameters, not the raw data format. Pretty sure it's C, but if I missed something, let me know.
I could honestly see a case for B, but C fits best for boosting SageMaker training speed. Not totally sure, though. Did anyone pick B?
D. RecordIO protobuf is the efficient choice here, but D could trip folks up since hyperparameter optimization doesn't touch data format at all. So C for speed.
C
RecordIO protobuf (C) makes sense, since it's the format SageMaker built-ins ingest most efficiently. NumPy arrays aren't optimized for distributed training there. Pretty sure that's what AWS recommends for speed, but I'm open to correction if anyone has seen otherwise in practice.
Probably C here. RecordIO protobuf is what SageMaker wants for built-in algos, so that's gonna speed things up vs. NumPy arrays.
It's C. RecordIO protobuf is specifically used by SageMaker built-in algorithms for better speed and parallelism. DataFrames or batch transform won't improve training performance for built-ins. I'm pretty sure about this, but open to feedback if anyone has heard different.
RecordIO protobuf is the format SageMaker built-ins work best with, so C.
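Several of the answers above say the built-ins "expect RecordIO." Here is a minimal Python sketch of what that framing looks like on disk, using only the standard library. It assumes the MXNet-style RecordIO layout that SageMaker's RecordIO-protobuf files use. Each record consists of a 4-byte little-endian magic number (0xCED7230A), a 4-byte payload length, the payload itself, and zero padding up to a 4-byte boundary. The helper names `write_record`, `read_records`, and `pack_row` are hypothetical. The payload in this sketch is plain packed float32 values, not the protobuf `Record` message a built-in algorithm actually expects. For real training data, you would normally let the SageMaker Python SDK handle the conversion (for example, `sagemaker.amazon.common.write_numpy_to_dense_tensor(buf, array, labels)`), upload the result to S3, and use the content type `application/x-recordio-protobuf`.

```python
import io
import struct
from typing import BinaryIO, Iterator, List

# Magic number at the start of every record. This assumes the MXNet-style
# RecordIO framing used by SageMaker's RecordIO-protobuf files.
KMAGIC = 0xCED7230A


def write_record(f: BinaryIO, payload: bytes) -> None:
    """Hypothetical helper: write magic, payload length, payload, then zero padding."""
    length = len(payload)
    f.write(struct.pack("<I", KMAGIC))
    f.write(struct.pack("<I", length))
    f.write(payload)
    f.write(b"\x00" * ((-length) % 4))  # pad the record to a multiple of 4 bytes


def read_records(f: BinaryIO) -> Iterator[bytes]:
    """Hypothetical helper: yield each record's payload until the stream ends."""
    while True:
        header = f.read(4)
        if len(header) < 4:
            return  # end of stream
        (magic,) = struct.unpack("<I", header)
        if magic != KMAGIC:
            raise ValueError("bad RecordIO magic number")
        (length,) = struct.unpack("<I", f.read(4))
        payload = f.read(length)
        f.read((-length) % 4)  # skip the padding after the payload
        yield payload


def pack_row(row: List[float]) -> bytes:
    """Placeholder payload: raw little-endian float32 values.

    This is NOT the protobuf Record message that SageMaker built-ins read.
    """
    return struct.pack(f"<{len(row)}f", *row)


rows = [[1.0, 2.0, 3.0], [4.5, 5.5, 6.5]]
buf = io.BytesIO()
for row in rows:
    write_record(buf, pack_row(row))

# Read the records back and decode each payload as float32 values.
buf.seek(0)
decoded = [list(struct.unpack(f"<{len(p) // 4}f", p)) for p in read_records(buf)]
print(decoded)
```

Running it prints `[[1.0, 2.0, 3.0], [4.5, 5.5, 6.5]]`. The length header lets a reader stream records one at a time and jump from one record to the next, without having to parse text the way it would with CSV.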