
Introduction to Instruction Tuning with SmolLM3


Welcome to the smollest course of fine-tuning! This module will guide you through instruction tuning using SmolLM3, Hugging Face’s latest 3B parameter model that achieves state-of-the-art performance for its size, while remaining accessible for learning and experimentation.

By the end of this course you will be fine-tuning an LLM with SFT. This course is smol but fast! If you’re looking for a smoother gradient, check out the LLM Course.

After completing this unit (and the assignment), don’t forget to test your knowledge with the quiz!

What is Instruction Tuning?

Instruction tuning is the process of adapting pre-trained language models to follow human instructions and engage in conversations. While base models like SmolLM3-3B-Base are trained to predict the next token, instruction-tuned models like SmolLM3-3B are specifically trained to:

  • Follow user instructions accurately
  • Engage in natural conversations
  • Provide helpful, harmless, and honest responses
  • Maintain context across multi-turn interactions
  • Use tools or MCP servers to perform tasks

This transformation from a text completion model to an instruction-following assistant is achieved through supervised fine-tuning on carefully curated datasets.

We dive deeper into instruction tuning in the LLM Course.
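Instruction data is expressed as role-tagged messages rather than raw text. A minimal illustration of the difference between what a base model and an instruction-tuned model consume (the exact string formatting is handled by chat templates, covered in the next section):

```python
# A base model sees raw text and simply continues it:
base_input = "The capital of France is"

# An instruction-tuned model sees structured, role-tagged messages:
conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
]

# Multi-turn context is just more messages appended to the list:
conversation.append({"role": "user", "content": "And its population?"})
```

This `messages` structure is the common currency of instruction tuning: datasets store it, chat templates render it, and trainers consume it.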

Why SmolLM3 for Learning?

SmolLM3 is perfect for learning instruction tuning because it:

  • Fits in a single GPU at a reasonable cost
  • Achieves competitive performance
  • Supports multilingual conversations
  • Supports extended context lengths up to 64k tokens (with YaRN extrapolation to 128k tokens)
  • Features dual-mode reasoning with explicit thinking capabilities
  • Comes with complete training recipes so you understand exactly how it was built
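On the dual-mode reasoning point: per the SmolLM3 model card, thinking mode is toggled through the system prompt. A hedged sketch of that convention (the `/think` and `/no_think` flags are taken from the model card; in practice the tokenizer's chat template handles this for you):

```python
# Sketch of SmolLM3's dual-mode reasoning toggle (assumption: the
# "/think" / "/no_think" system-prompt flags described in the model card).
def make_system_prompt(instructions: str, thinking: bool) -> str:
    """Append the reasoning-mode flag to a system prompt."""
    flag = "/think" if thinking else "/no_think"
    return f"{instructions} {flag}"

reasoning_prompt = make_system_prompt("You are a helpful assistant.", thinking=True)
direct_prompt = make_system_prompt("You are a helpful assistant.", thinking=False)
```

With `/think`, the model emits an explicit reasoning trace before its answer; with `/no_think`, it answers directly.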

Module Overview

In this comprehensive module, we will explore four key areas:

Chat Templates

Chat templates are the foundation of instruction tuning - they structure interactions between users and AI models, ensuring consistent and contextually appropriate responses. You’ll learn:

  • How SmolLM3’s chat template works
  • Converting conversations to the proper format
  • Working with system prompts and multi-turn conversations
  • Using the Transformers library’s built-in template support

For detailed information, see Chat Templates.
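To make the idea concrete, here is a simplified, ChatML-style sketch of what a chat template does. SmolLM3's real template lives in `tokenizer.chat_template` and is applied with `tokenizer.apply_chat_template`; this toy version only illustrates the mechanics of turning role-tagged messages into a single string:

```python
# Simplified chat-template sketch (assumption: a ChatML-like layout with
# <|im_start|>/<|im_end|> markers; SmolLM3's actual template is richer).
def render_chat(messages, add_generation_prompt=False):
    parts = []
    for msg in messages:
        # Each turn is wrapped in special tokens marking its role.
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>")
    if add_generation_prompt:
        # At inference time, an open assistant turn cues the model to reply.
        parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = render_chat(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    add_generation_prompt=True,
)
```

In real code you would call `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` instead, so the formatting always matches what the model was trained on.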

Supervised Fine-Tuning (SFT)

Supervised Fine-Tuning is the core technique for adapting pre-trained models to follow instructions. You’ll master:

  • The theory behind SFT and when to use it
  • Working with the SmolTalk2 dataset
  • Using TRL’s SFTTrainer for efficient training
  • Best practices for data preparation and training configuration

For a comprehensive guide, see Supervised Fine-Tuning.
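As a preview, the shape of an SFT run with TRL looks roughly like this. A minimal sketch, assuming a recent TRL release (with `SFTConfig`/`SFTTrainer`) and the model and dataset IDs used in this module; the hyperparameters are illustrative, not a tuned recipe:

```python
# Hedged sketch of an SFT run; call main() to actually launch training
# (requires a GPU, TRL, and network access to the Hub).
def build_training_config():
    """Illustrative single-GPU hyperparameters for SFT (assumptions, not a recipe)."""
    return {
        "model_name": "HuggingFaceTB/SmolLM3-3B-Base",
        "dataset_name": "HuggingFaceTB/smoltalk2",  # check the dataset card for configs/splits
        "learning_rate": 2e-5,
        "num_train_epochs": 1,
        "per_device_train_batch_size": 2,
        "gradient_accumulation_steps": 8,
    }

def main():
    # Heavy imports kept inside main() so the config above can be
    # inspected without pulling in GPU dependencies.
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    cfg = build_training_config()
    dataset = load_dataset(cfg["dataset_name"], split="train")
    trainer = SFTTrainer(
        model=cfg["model_name"],
        args=SFTConfig(
            output_dir="smollm3-sft",
            learning_rate=cfg["learning_rate"],
            num_train_epochs=cfg["num_train_epochs"],
            per_device_train_batch_size=cfg["per_device_train_batch_size"],
            gradient_accumulation_steps=cfg["gradient_accumulation_steps"],
        ),
        train_dataset=dataset,
    )
    trainer.train()
    trainer.push_to_hub()
```

`SFTTrainer` accepts conversational datasets (a `messages` column) directly and applies the model's chat template for you, which is why the chat-template section comes first.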

Hands-on Exercises

Put your knowledge into practice with progressively challenging exercises:

  • Process datasets for instruction tuning
  • Fine-tune SmolLM3 on different tasks
  • Use both Python APIs and CLI tools
  • Compare base model vs fine-tuned model performance

Complete exercises and examples are in Exercises.

Hugging Face Jobs

Hugging Face Jobs is a fully managed cloud infrastructure for training models without the hassle of setting up GPUs, managing dependencies, or configuring environments locally. This is particularly valuable for SFT training, which can be resource-intensive and time-consuming.

For a comprehensive guide, see Hugging Face Jobs.

What You’ll Build

By the end of this module, you’ll have:

  • Fine-tuned your own SmolLM3 model on a custom dataset
  • An understanding of chat template formatting and conversation structure
  • Experience with both programmatic and CLI-based training workflows
  • A model deployed to Hugging Face Hub that others can use
  • Foundation knowledge for more advanced fine-tuning techniques

Let’s dive into the fascinating world of instruction tuning!
