Q: 12
You are working on optimizing a large language model (LLM) using quantization techniques. Your goal is
to reduce memory usage while maintaining as much of the model’s original accuracy as possible. What
is a common challenge faced when applying quantization to LLMs, and how can it be mitigated?
Options
Discussion
Option C makes sense: quantization can hit accuracy hard, especially in layers where precision matters, like embeddings. Quantization-aware training (QAT) mitigates this by simulating the reduced precision during training, so the model learns to compensate for the rounding error. I think that's the best answer here, though someone might argue for B in rare cases. Rough sketch of the idea below.
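To make the QAT point concrete, here's a minimal sketch of fake quantization with a straight-through estimator, assuming PyTorch. The helper names (`fake_quantize`, `QATLinear`) are illustrative, not a library API; real QAT pipelines use calibrated observers and per-channel scales:

```python
import torch
import torch.nn as nn

def fake_quantize(x, num_bits=8):
    # Simulate symmetric per-tensor int quantization in float:
    # quantize, then immediately dequantize.
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    q = (x / scale).round().clamp(-qmax, qmax) * scale
    # Straight-through estimator: forward uses q,
    # backward treats the rounding as identity.
    return x + (q - x).detach()

class QATLinear(nn.Module):
    """Linear layer that trains against quantized weights."""
    def __init__(self, in_features, out_features, num_bits=8):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.num_bits = num_bits

    def forward(self, x):
        w_q = fake_quantize(self.linear.weight, self.num_bits)
        return nn.functional.linear(x, w_q, self.linear.bias)

# Training with the fake-quantized weights lets the model adapt
# to the rounding error before real int8 conversion.
layer = QATLinear(16, 4)
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
x, y = torch.randn(32, 16), torch.randn(32, 4)
loss = nn.functional.mse_loss(layer(x), y)
loss.backward()
opt.step()
```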
B or D also seem reasonable: some layers just don't tolerate quantization well, embeddings especially. Skipping those layers (B) or simply choosing a smaller full-precision model (D) can work in practice; see the sketch below for B. Not 100% sure, anyone disagree?
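Here's what option B could look like as a minimal post-training sketch, again assuming PyTorch. `quantize_weights` and the skip list are hypothetical names for illustration; the point is just that sensitive layer types stay in full precision:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def quantize_weights(model, num_bits=8, skip_types=(nn.Embedding,)):
    # Round-trip (quantize then dequantize) every weight matrix,
    # except in layers listed in skip_types, which keep full precision.
    qmax = 2 ** (num_bits - 1) - 1
    for module in model.modules():
        if isinstance(module, skip_types):
            continue  # leave sensitive layers (e.g. embeddings) alone
        w = getattr(module, "weight", None)
        if isinstance(w, nn.Parameter) and w.dim() >= 2:
            scale = w.abs().max().clamp(min=1e-8) / qmax
            w.copy_((w / scale).round().clamp(-qmax, qmax) * scale)
    return model

model = nn.Sequential(
    nn.Embedding(1000, 64),  # skipped: stays full precision
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)
quantize_weights(model)
```

In a real deployment you'd store the int values plus scales rather than round-tripping in float, but the accuracy trade-off is the same: the skipped layers cost extra memory in exchange for keeping their precision.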