| # Debugging Axolotl | |
| This document provides some tips and tricks for debugging Axolotl. It also provides an example configuration for debugging with VSCode. A good debugging setup is essential to understanding how Axolotl code works behind the scenes. | |
| ## Table of Contents | |
| - [General Tips](#general-tips) | |
| - [Debugging with VSCode](#debugging-with-vscode) | |
| - [Background](#background) | |
| - [Configuration](#configuration) | |
| - [Customizing your debugger](#customizing-your-debugger) | |
| - [Video Tutorial](#video-tutorial) | |
| - [Debugging With Docker](#debugging-with-docker) | |
| - [Setup](#setup) | |
| - [Attach To Container](#attach-to-container) | |
| - [Video - Attaching To Docker On Remote Host](#video---attaching-to-docker-on-remote-host) | |
| ## General Tips | |
| While debugging it's helpful to simplify your test scenario as much as possible. Here are some tips for doing so: | |
| > [!Important] | |
| > All of these tips are incorporated into the [example configuration](#configuration) for debugging with VSCode below. | |
| 1. **Make sure you are using the latest version of axolotl**: This project changes often and bugs get fixed fast. Check your git branch and make sure you have pulled the latest changes from `main`. | |
| 1. **Eliminate concurrency**: Restrict the number of processes to 1 for both training and data preprocessing: | |
| - Set `CUDA_VISIBLE_DEVICES` to a single GPU, ex: `export CUDA_VISIBLE_DEVICES=0`. | |
| - Set `dataset_processes: 1` in your axolotl config or run the training command with `--dataset_processes=1`. | |
| 2. **Use a small dataset**: Construct or use a small dataset from HF Hub. When using a small dataset, you will often have to make sure `sample_packing: False` and `eval_sample_packing: False` to avoid errors. If you are in a pinch and don't have time to construct a small dataset but want to use from the HF Hub, you can shard the data (this will still tokenize the entire dataset, but will only use a fraction of the data for training. For example, to shard the dataset into 20 pieces, add the following to your axolotl config): | |
| ```yaml | |
| dataset: | |
| ... | |
| shards: 20 | |
| ``` | |
| 3. **Use a small model**: A good example of a small model is [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0). | |
| 4. **Minimize iteration time**: Make sure the training loop finishes as fast as possible, with these settings. | |
| - `micro_batch_size: 1` | |
| - `max_steps: 1` | |
| - `val_set_size: 0` | |
| 5. **Clear Caches:** Axolotl caches certain steps and so does the underlying HuggingFace trainer. You may want to clear some of these caches when debugging. | |
| - Data preprocessing: When debugging data preprocessing, which includes prompt template formation, you may want to delete the directory set in `dataset_prepared_path:` in your axolotl config. If you didn't set this value, the default is `last_run_prepared`. | |
| - HF Hub: If you are debugging data preprocessing, you should clear the relevant HF cache [HuggingFace cache](https://huggingface.co/docs/datasets/cache), by deleting the appropriate `~/.cache/huggingface/datasets/...` folder(s). | |
| - **The recommended approach is to redirect all outputs and caches to a temporary folder and delete selected subfolders before each run. This is demonstrated in the example configuration below.** | |
| ## Debugging with VSCode | |
| ### Background | |
| The below example shows how to configure VSCode to debug data preprocessing of the `sharegpt` format. This is the format used when you have the following in your axolotl config: | |
| ```yaml | |
| datasets: | |
| - path: <path to your sharegpt formatted dataset> # example on HF Hub: philschmid/guanaco-sharegpt-style | |
| type: sharegpt | |
| ``` | |
| >[!Important] | |
| > If you are already familiar with advanced VSCode debugging, you can skip the below explanation and look at the files [.vscode/launch.json](../.vscode/launch.json) and [.vscode/tasks.json](../.vscode/tasks.json) for an example configuration. | |
| >[!Tip] | |
| > If you prefer to watch a video, rather than read, you can skip to the [video tutorial](#video-tutorial) below (but doing both is recommended). | |
| ### Setup | |
| Make sure you have an [editable install](https://setuptools.pypa.io/en/latest/userguide/development_mode.html) of Axolotl, which ensures that changes you make to the code are reflected at runtime. Run the following commands from the root of this project: | |
| ```bash | |
| pip3 install packaging | |
| pip3 install -e '.[flash-attn,deepspeed]' | |
| ``` | |
| #### Remote Hosts | |
| If you developing on a remote host, you can easily use VSCode to debug remotely. To do so, you will need to follow this [remote - SSH guide](https://code.visualstudio.com/docs/remote/ssh). You can also see the video below on [Docker and Remote SSH debugging](#video---attaching-to-docker-on-remote-host). | |
| ```bash | |
| ### Configuration | |
| The easiest way to get started is to modify the [.vscode/launch.json](../.vscode/launch.json) file in this project. This is just an example configuration, so you may need to modify or copy it to suit your needs. | |
| For example, to mimic the command `cd devtools && CUDA_VISIBLE_DEVICES=0 accelerate launch -m axolotl.cli.train dev_sharegpt.yml`, you would use the below configuration[^1]. Note that we add additional flags that override the axolotl config and incorporate the tips above (see the comments). We also set the working directory to `devtools` and set the `env` variable `HF_HOME` to a temporary folder that is later partially deleted. This is because we want to delete the HF dataset cache before each run in order to ensure that the data preprocessing code is run from scratch. | |
| ```jsonc | |
| // .vscode/launch.json | |
| { | |
| "version": "0.2.0", | |
| "configurations": [ | |
| { | |
| "name": "Debug axolotl prompt - sharegpt", | |
| "type": "python", | |
| "module": "accelerate.commands.launch", | |
| "request": "launch", | |
| "args": [ | |
| "-m", "axolotl.cli.train", "dev_sharegpt.yml", | |
| // The flags below simplify debugging by overriding the axolotl config | |
| // with the debugging tips above. Modify as needed. | |
| "--dataset_processes=1", // limits data preprocessing to one process | |
| "--max_steps=1", // limits training to just one step | |
| "--batch_size=1", // minimizes batch size | |
| "--micro_batch_size=1", // minimizes batch size | |
| "--val_set_size=0", // disables validation | |
| "--sample_packing=False", // disables sample packing which is necessary for small datasets | |
| "--eval_sample_packing=False",// disables sample packing on eval set | |
| "--dataset_prepared_path=temp_debug/axolotl_outputs/data", // send data outputs to a temp folder | |
| "--output_dir=temp_debug/axolotl_outputs/model" // send model outputs to a temp folder | |
| ], | |
| "console": "integratedTerminal", // show output in the integrated terminal | |
| "cwd": "${workspaceFolder}/devtools", // set working directory to devtools from the root of the project | |
| "justMyCode": true, // step through only axolotl code | |
| "env": {"CUDA_VISIBLE_DEVICES": "0", // Since we aren't doing distributed training, we need to limit to one GPU | |
| "HF_HOME": "${workspaceFolder}/devtools/temp_debug/.hf-cache"}, // send HF cache to a temp folder | |
| "preLaunchTask": "cleanup-for-dataprep", // delete temp folders (see below) | |
| } | |
| ] | |
| } | |
| ``` | |
| **Additional notes about this configuration:** | |
| - The argument `justMyCode` is set to `true` such that you step through only the axolotl code. If you want to step into dependencies, set this to `false`. | |
| - The `preLaunchTask`: `cleanup-for-dataprep` is defined in [.vscode/tasks.json](../.vscode/tasks.json) and is used to delete the following folders before debugging, which is essential to ensure that the data pre-processing code is run from scratch: | |
| - `./devtools/temp_debug/axolotl_outputs` | |
| - `./devtools/temp_debug/.hf-cache/datasets` | |
| >[!Tip] | |
| > You may not want to delete these folders. For example, if you are debugging model training instead of data pre-processing, you may NOT want to delete the cache or output folders. You may also need to add additional tasks to the `tasks.json` file depending on your use case. | |
| Below is the [./vscode/tasks.json](../.vscode/tasks.json) file that defines the `cleanup-for-dataprep` task. This task is run before each debugging session when you use the above configuration. Note how there are two tasks that delete the two folders mentioned above. The third task `cleanup-for-dataprep` is a composite task that combines the two tasks. A composite task is necessary because VSCode does not allow you to specify multiple tasks in the `preLaunchTask` argument of the `launch.json` file. | |
| ```jsonc | |
| // .vscode/tasks.json | |
| // this file is used by launch.json | |
| { | |
| "version": "2.0.0", | |
| "tasks": [ | |
| // this task changes into the devtools directory and deletes the temp_debug/axolotl_outputs folder | |
| { | |
| "label": "delete-outputs", | |
| "type": "shell", | |
| "command": "rm -rf temp_debug/axolotl_outputs", | |
| "options":{ "cwd": "${workspaceFolder}/devtools"}, | |
| "problemMatcher": [] | |
| }, | |
| // this task changes into the devtools directory and deletes the `temp_debug/.hf-cache/datasets` folder | |
| { | |
| "label": "delete-temp-hf-dataset-cache", | |
| "type": "shell", | |
| "command": "rm -rf temp_debug/.hf-cache/datasets", | |
| "options":{ "cwd": "${workspaceFolder}/devtools"}, | |
| "problemMatcher": [] | |
| }, | |
| // this task combines the two tasks above | |
| { | |
| "label": "cleanup-for-dataprep", | |
| "dependsOn": ["delete-outputs", "delete-temp-hf-dataset-cache"], | |
| } | |
| ] | |
| } | |
| ``` | |
| ### Customizing your debugger | |
| Your debugging use case may differ from the example above. The easiest thing to do is to put your own axolotl config in the `devtools` folder and modify the `launch.json` file to use your config. You may also want to modify the `preLaunchTask` to delete different folders or not delete anything at all. | |
| ### Video Tutorial | |
| The following video tutorial walks through the above configuration and demonstrates how to debug with VSCode, (click the image below to watch): | |
| <div style="text-align: center; line-height: 0;"> | |
| <a href="https://youtu.be/xUUB11yeMmc" target="_blank" | |
| title="How to debug Axolotl (for fine tuning LLMs)"><img | |
| src="https://i.ytimg.com/vi/xUUB11yeMmc/maxresdefault.jpg" | |
| style="border-radius: 10px; display: block; margin: auto;" width="560" height="315" /></a> | |
| <figcaption style="font-size: smaller;"><a href="https://hamel.dev">Hamel Husain's</a> tutorial: <a href="https://www.youtube.com/watch?v=xUUB11yeMmc">Debugging Axolotl w/VSCode</a></figcaption> | |
| </div> | |
| <br> | |
| ## Debugging With Docker | |
| Using [official Axolotl Docker images](https://hub.docker.com/r/winglian/axolotl/tags) is a great way to debug your code, and is a very popular way to use Axolotl. Attaching VSCode to Docker takes a few more steps. | |
| ### Setup | |
| On the host that is running axolotl (ex: if you are using a remote host), clone the axolotl repo and change your current directory to the root: | |
| ```bash | |
| git clone https://github.com/OpenAccess-AI-Collective/axolotl | |
| cd axolotl | |
| ``` | |
| >[!Tip] | |
| > If you already have axolotl cloned on your host, make sure you have the latest changes and change into the root of the project. | |
| Next, run the desired docker image and mount the current directory. Below is a docker command you can run to do this:[^2] | |
| ```bash | |
| docker run --privileged --gpus '"all"' --shm-size 10g --rm -it --name axolotl --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --mount type=bind,src="${PWD}",target=/workspace/axolotl -v ${HOME}/.cache/huggingface:/root/.cache/huggingface winglian/axolotl:main-py3.10-cu118-2.0.1 | |
| ``` | |
| >[!Tip] | |
| > To understand which containers are available, see the [Docker section of the README](../README.md#docker) and the [DockerHub repo](https://hub.docker.com/r/winglian/axolotl/tags). For details of how the Docker containers are built, see axolotl's [Docker CI builds](../.github/workflows/main.yml). | |
| You will now be in the container. Next, perform an editable install of Axolotl: | |
| ```bash | |
| pip3 install packaging | |
| pip3 install -e '.[flash-attn,deepspeed]' | |
| ``` | |
| ### Attach To Container | |
| Next, if you are using a remote host, [Remote into this host with VSCode](https://code.visualstudio.com/docs/remote/ssh). If you are using a local host, you can skip this step. | |
| Next, select `Dev Containers: Attach to Running Container...` using the command palette (`CMD + SHIFT + P`) in VSCode. You will be prompted to select a container to attach to. Select the container you just created. You will now be in the container with a working directory that is at the root of the project. Any changes you make to the code will be reflected both in the container and on the host. | |
| Now you are ready to debug as described above (see [Debugging with VSCode](#debugging-with-vscode)). | |
| ### Video - Attaching To Docker On Remote Host | |
| Here is a short video that demonstrates how to attach to a Docker container on a remote host: | |
| <div style="text-align: center; line-height: 0;"> | |
| <a href="https://youtu.be/0AuoR7QnHR0" target="_blank" | |
| title="Debugging Axolotl Part 2: Attaching to Docker on a Remote Host"><img | |
| src="https://i.ytimg.com/vi/0AuoR7QnHR0/hqdefault.jpg" | |
| style="border-radius: 10px; display: block; margin: auto;" width="560" height="315" /></a> | |
| <figcaption style="font-size: smaller;"><a href="https://hamel.dev">Hamel Husain's</a> tutorial: <a href="https://youtu.be/0AuoR7QnHR0">Debugging Axolotl Part 2: Attaching to Docker on a Remote Host | |
| </a></figcaption> | |
| </div> | |
| <br> | |
| [^1]: The config actually mimics the command `CUDA_VISIBLE_DEVICES=0 python -m accelerate.commands.launch -m axolotl.cli.train devtools/sharegpt.yml`, but this is the same thing. | |
| [^2]: Many of the below flags are recommended best practices by Nvidia when using nvidia-container-toolkit. You can read more about these flags [here](https://docs.nvidia.com/deeplearning/frameworks/user-guide/index.html). | |