|
# Finetuning on an unstructured dataset |
|
|
|
While most of the scripts in this repository were written for finetuning on instruction datasets, it is also possible to finetune on any unstructured text dataset. This is useful for experimentation, and it is far less expensive than training a full model from scratch.
|
|
|
This guide only covers preparing the data for finetuning, since both the LoRA and Adapter-v1 methods support this dataset type!
|
|
|
## Preparation |
|
|
|
1. Gather your text into an input file named `input.txt` |
|
2. Divide the data into training and validation sets using the following script (a sketch of what this split roughly does is shown after this list):
|
|
|
```bash |
|
python scripts/prepare_any_text.py |
|
``` |
|
|
|
3. Modify the relevant scripts for your finetuning method under `finetune/` and `evaluate/`, setting the `instruction_tuning` variable to `False` (see the snippet after this list)
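
Under the hood, step 2 boils down to carving a validation slice off your text. The minimal sketch below shows that kind of split in plain Python; the 90/10 ratio and the plain-text output file names are assumptions for illustration only, and `scripts/prepare_any_text.py` remains the source of truth for details such as tokenization and the format it actually saves.

```python
# Minimal sketch of a train/validation split over unstructured text.
# NOTE: the 90/10 ratio and the plain-text output files are assumptions
# for illustration; see scripts/prepare_any_text.py for the real
# preparation steps (e.g. tokenization and output format).
from pathlib import Path


def split_dataset(input_path: str = "input.txt", train_fraction: float = 0.9) -> None:
    text = Path(input_path).read_text(encoding="utf-8")

    # Split on a character boundary; a tokenizer-aware pipeline would
    # split on token boundaries instead.
    split_at = int(len(text) * train_fraction)
    Path("train.txt").write_text(text[:split_at], encoding="utf-8")
    Path("val.txt").write_text(text[split_at:], encoding="utf-8")


if __name__ == "__main__":
    split_dataset()
```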
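
For step 3, the change is a one-line toggle in each script. It might look like the following in, for example, `finetune/lora.py` (the exact location of the variable varies by file):

```python
# In each relevant script under finetune/ and evaluate/: flip the flag
# so the data is treated as plain text rather than instruction/response
# pairs. (Shown here as it might appear in finetune/lora.py.)
instruction_tuning = False
```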
|
|
|
And then you're set! Proceed to either the [LoRA guide](./finetune_lora.md) or the [Adapter v1 guide](./finetune_adapter.md).