1. Ruder, S. (2017). An Overview of Multi-Task Learning in Deep Neural Networks. arXiv:1706.05098. In Section 2, "How multi-task learning works," the paper states: "MTL improves generalization by leveraging the domain-specific information contained in the training signals of related tasks... By learning tasks in parallel, the model can learn a representation that is shared among them... For this reason, MTL is often considered a form of regularization." This supports the idea that single-task learning lacks this generalization benefit.
2. Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2020). Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research, 21(140), 1-67. Section 3.3 discusses multi-task pre-training, noting that combining diverse tasks helps the model achieve better generalization, a benefit not fully realized when fine-tuning on a single, narrow downstream task.
3. Stanford University. (2023). CS224N: Natural Language Processing with Deep Learning, Winter 2023, Lecture 12: Pretraining and Transfer Learning. In the discussion of fine-tuning, the lecture notes explain that fine-tuning on a small, single-task dataset risks overfitting, whereas multi-task learning helps the model learn more general features, improving performance on out-of-domain or unseen tasks.