---
license: apache-2.0
base_model: google/flan-t5-base
tags:
- generated_from_trainer
metrics:
- rouge
model-index:
- name: flan-t5-base-openbsd-faq
results: []
---
# flan-t5-base-openbsd-faq
This model is a fine-tuned version of [google/flan-t5-base](https://huggingface.co/google/flan-t5-base), trained on [ajsbsd/openbsd-faq](https://huggingface.co/datasets/ajsbsd/openbsd-faq).
The dataset consists of questions taken from https://www.openbsd.org/faq/faq1.html, for use on [ajsbsd.net](https://ajsbsd.net).
It achieves the following results on the evaluation set:
- Loss: 2.2385
- Rouge1: 0.3935
- Rouge2: 0.3383
- Rougel: 0.3906
- Rougelsum: 0.3844
## Model description
This model is a fine-tuned version of [google/flan-t5-base](https://huggingface.co/google/flan-t5-base), adapted for answering questions about OpenBSD.
## Intended uses & limitations
Intended for use as an OpenBSD Q&A chatbot. An example of querying the model is shown below.
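A minimal inference sketch. The repo id `ajsbsd/flan-t5-base-openbsd-faq` is an assumption based on the card name; the prompt prefix matches the one used in the training code below.

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Hypothetical repo id; adjust to wherever the fine-tuned weights actually live.
model_id = "ajsbsd/flan-t5-base-openbsd-faq"
tokenizer = T5Tokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)

# Use the same prefix the model was fine-tuned with (see Training procedure).
prompt = "Please answer this question: What is OpenBSD?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```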
## Training and evaluation data
Question/answer pairs created from https://www.openbsd.org/faq/faq1.html, formatted for text2text generation; a quick way to inspect them is sketched below.
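A short sketch for loading and inspecting the dataset; the `question` and `answer` column names are the ones used in the training code below.

```python
from datasets import load_dataset

dataset = load_dataset("ajsbsd/openbsd-faq")
print(dataset)                           # available splits and column names
print(dataset["train"][0]["question"])   # a question taken from faq1.html
print(dataset["train"][0]["answer"])     # the corresponding answer text
```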
## Training procedure
Trained on Google Colab with the following code:
```python
!pip install -q transformers[torch] tokenizers datasets evaluate rouge_score sentencepiece huggingface_hub --upgrade
from huggingface_hub import notebook_login
notebook_login()
import nltk
from datasets import load_dataset
import evaluate
import numpy as np
from transformers import T5Tokenizer, DataCollatorForSeq2Seq
from transformers import T5ForConditionalGeneration, Seq2SeqTrainingArguments, Seq2SeqTrainer
# Load and split the dataset
dataset = load_dataset("ajsbsd/openbsd-faq")
dataset = dataset["train"].train_test_split(test_size=0.2)
#dataset = load_dataset("csv", data_files="./JEOPARDY_CSV.csv")
#dataset = dataset["train"].train_test_split(test_size=0.2)
# Load the tokenizer, model, and data collator
tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-base")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base")
data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model)
# We prefix each question with an instruction
prefix = "Please answer this question: "
# Define our preprocessing function
def preprocess_function(examples):
    """Add prefix to the questions, tokenize the text, and set the labels"""
    # The "inputs" are the tokenized questions:
    inputs = [prefix + doc for doc in examples["question"]]
    model_inputs = tokenizer(inputs, max_length=128, truncation=True)
    # The "labels" are the tokenized answers:
    labels = tokenizer(text_target=examples["answer"], max_length=512, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
# Map the preprocessing function across our dataset
tokenized_dataset = dataset.map(preprocess_function, batched=True)
# Set up Rouge score for evaluation
nltk.download("punkt", quiet=True)
metric = evaluate.load("rouge")
def compute_metrics(eval_preds):
    preds, labels = eval_preds
    # replace -100 padding in labels, then decode preds and labels
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    # rougeLSum expects newline after each sentence
    decoded_preds = ["\n".join(nltk.sent_tokenize(pred.strip())) for pred in decoded_preds]
    decoded_labels = ["\n".join(nltk.sent_tokenize(label.strip())) for label in decoded_labels]
    result = metric.compute(predictions=decoded_preds, references=decoded_labels, use_stemmer=True)
    return result
# Set up training arguments
training_args = Seq2SeqTrainingArguments(
    output_dir="./flan-t5-base-openbsd-faq",
    evaluation_strategy="epoch",
    learning_rate=3e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=5,
    predict_with_generate=True,
    push_to_hub=False
)
# Set up trainer
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics
)
# Train the model
trainer.train()
trainer.push_to_hub()
```
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 5
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum |
|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:------:|:---------:|
| No log | 1.0 | 9 | 2.2184 | 0.3985 | 0.3308 | 0.3878 | 0.3902 |
| No log | 2.0 | 18 | 2.2060 | 0.4044 | 0.3231 | 0.3959 | 0.3937 |
| No log | 3.0 | 27 | 2.2271 | 0.4063 | 0.3315 | 0.4006 | 0.3971 |
| No log | 4.0 | 36 | 2.2251 | 0.4069 | 0.3366 | 0.4001 | 0.3937 |
| No log | 5.0 | 45 | 2.2385 | 0.3935 | 0.3383 | 0.3906 | 0.3844 |
### Framework versions
- Transformers 4.35.2
- Pytorch 2.1.0+cu118
- Datasets 2.14.7
- Tokenizers 0.15.0