1. HuggingFace Official Documentation (Optimum Library): The documentation for the Optimum library explicitly details the integration with NVIDIA TensorRT for inference acceleration. It states, "Optimum provides a simple interface to optimize your models and run them with hardware accelerators like ONNX Runtime or TensorRT."
Source: HuggingFace. (n.d.). Hardware-accelerated inference with Optimum. HuggingFace Documentation. Retrieved from https://huggingface.co/docs/optimum/index. Section: "Inference with TensorRT".
2. HuggingFace Official Documentation (Trainer API): The core Trainer class, used for fine-tuning, automatically handles device placement (CPU or GPU) through its PyTorch backend, so no manual device management is required. The documentation notes that the Trainer will "use the GPU if it is available."
Source: HuggingFace. (n.d.). Trainer. HuggingFace Transformers Documentation. Retrieved from https://huggingface.co/docs/transformers/main_classes/trainer. Section: "Important training arguments".
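The automatic placement described above can be sketched with plain PyTorch. This is a minimal illustration of the device-selection logic, not the Trainer's actual implementation:

```python
import torch

# Minimal sketch of what "use the GPU if it is available" means in practice:
# prefer a visible CUDA device, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy stand-in for a Transformer; the Trainer moves the real model the same way.
model = torch.nn.Linear(4, 2).to(device)
batch = torch.randn(8, 4, device=device)

logits = model(batch)
print(logits.shape)  # torch.Size([8, 2])
```

On a GPU machine the model and batch land on `cuda:0`; on a CPU-only machine the same code runs unchanged, which is the portability the Trainer provides automatically.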
3. NVIDIA Official Developer Blog: NVIDIA publishes official guides on using its technologies with HuggingFace models. A technical blog post details the process and benefits of using TensorRT with HuggingFace models, confirming both the depth of the integration and the resulting performance gains.
Source: NVIDIA Developer Blog. (2023, May 24). Accelerating Llama 2 with NVIDIA TensorRT-LLM. "NVIDIA TensorRT-LLM supercharges inference performance for the latest large language models (LLMs) on NVIDIA GPUs... It also includes a Python API that is similar to the Hugging Face Transformers API".
4. Stanford University Courseware (CS224N): Lecture materials for Stanford's course on Natural Language Processing with Deep Learning cite HuggingFace Transformers as the standard library for building and training models, and the practical assignments require GPUs for training efficiency.
Source: Stanford University. (2023). CS224N: Natural Language Processing with Deep Learning. Lecture 5: "Fine-Tuning and Pre-trained Language Models". The course materials and assignments consistently use PyTorch and HuggingFace on GPU-enabled platforms like Google Colab.