---
library_name: transformers
datasets:
- web_questions
metrics:
- perplexity
---
# Model Card for google-gemma-7b-it-finetuned-web-questions
This model card corresponds to a question-answering model fine-tuned from the 7B instruction-tuned version of Gemma.
## Model Details
This is a general question-answering model fine-tuned on the web_questions dataset.
### Model Description
This is a general question-answering LLM, fine-tuned from Gemma on the web_questions dataset.
Gemma is a family of lightweight, state-of-the-art open models from Google,
built from the same research and technology used to create the Gemini models.
They are text-to-text, decoder-only large language models, available in English,
with open weights, pre-trained variants, and instruction-tuned variants. Gemma
models are well-suited for a variety of text generation tasks, including
question answering, summarization, and reasoning. Their relatively small size
makes it possible to deploy them in environments with limited resources such as
a laptop, desktop or your own cloud infrastructure, democratizing access to
state of the art AI models and helping foster innovation for everyone.
- **Developed by:** Geerath Bhat
- **Model type:** Fine-tuned Instruct LLM.
- **Language(s) (NLP):** English
- **License:** No
- **Finetuned from model:** [google/gemma-7b-it](https://huggingface.co/google/gemma-7b-it)
### Usage
Google has shared code snippets showing how to quickly get started with the Gemma models. First make sure to `pip install -U transformers accelerate bitsandbytes` (the snippet below loads the model with 4-bit quantization, which requires `bitsandbytes` and `accelerate`), then adapt the snippet to your use case.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

hf_model_repo = "Geerath/google-gemma-7b-it-finetuned-web-questions"

# 4-bit quantization config (one reasonable choice; adjust to your hardware)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Get the tokenizer
tokenizer = AutoTokenizer.from_pretrained(hf_model_repo)
# Load the model
model = AutoModelForCausalLM.from_pretrained(hf_model_repo,
                                             quantization_config=bnb_config,
                                             device_map="auto")

prompt = ["Question: Tell me something about IISc\n\nAnswer:\n"]

# Generate a response
input_ids = tokenizer(prompt, return_tensors="pt", truncation=True).input_ids.to(model.device)
outputs = model.generate(input_ids=input_ids,
                         max_new_tokens=200,
                         do_sample=True,
                         temperature=0.2)

# Decode and keep everything from the first "Question:" onward
result = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
result = "Question:" + result.split("Question:")[1]
# Print the result
print(f"Generated response:\n{result}")
```
#### Fine-tuning the model
You can find fine-tuning scripts and a notebook under the [`examples/` directory](https://huggingface.co/google/gemma-7b/tree/main/examples) of the [`google/gemma-7b`](https://huggingface.co/google/gemma-7b) repository. To adapt them to this model, simply change the model id to `google/gemma-7b-it`.
In that repository, we provide:
* A script to perform Supervised Fine-Tuning (SFT) on UltraChat dataset using QLoRA
* A script to perform SFT using FSDP on TPU devices
* A notebook that you can run on a free-tier Google Colab instance to perform SFT on English quotes dataset
## How to Get Started with the Model
Use the code in the Usage section above (adapted from the `google/gemma-7b-it` snippets) to get started with this fine-tuned model.
## Training Details
### Training Data
The model was fine-tuned on the [web_questions](https://huggingface.co/datasets/web_questions) dataset.
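For reference, here is a minimal sketch of how the dataset could be loaded and mapped into the `Question: ... Answer: ...` template used in the Usage prompt above; the exact preprocessing applied during fine-tuning is not documented here, so treat the `format_example` template as an assumption.

```python
from datasets import load_dataset

# Load the WebQuestions dataset (it ships with train and test splits)
dataset = load_dataset("web_questions")

# Illustrative formatting into the prompt template used in the Usage section;
# the actual template used for fine-tuning is an assumption.
def format_example(example):
    question = example["question"]
    answer = ", ".join(example["answers"])
    return {"text": f"Question: {question}\n\nAnswer:\n{answer}"}

train_data = dataset["train"].map(format_example)
print(train_data[0]["text"])
```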
### Training Procedure
The model was trained with `SFTTrainer` (from the `trl` library) using the `TrainingArguments` below; a sketch of the full trainer setup follows the arguments.
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gemma-7b-it-web-questions",  # illustrative output directory
    num_train_epochs=1,              # adjust based on the data size
    per_device_train_batch_size=4,   # use 2 or 4 if you have less GPU RAM
    per_device_eval_batch_size=4,
    optim="paged_adamw_32bit",
    # gradient_accumulation_steps=2,
    save_strategy="epoch",
    evaluation_strategy="epoch",
    learning_rate=2e-4,
    logging_steps=1,
    fp16=True,
    weight_decay=0.01,
    lr_scheduler_type="cosine",
    seed=42,
)
```
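Below is a minimal sketch of how these arguments could be combined with QLoRA and `SFTTrainer`. It reuses `model`, `tokenizer`, `training_args`, `train_data`, `dataset` and `format_example` from the snippets above, and the LoRA hyperparameters are assumptions rather than the exact values used for this model. Depending on your `trl` version, `dataset_text_field` and `max_seq_length` may need to be passed via `SFTConfig` instead of directly to the trainer.

```python
from peft import LoraConfig
from trl import SFTTrainer

# Assumed LoRA configuration; rank, alpha and target modules are illustrative
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model=model,                                   # quantized base model (see Usage)
    args=training_args,
    train_dataset=train_data,                      # formatted web_questions train split
    eval_dataset=dataset["test"].map(format_example),
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=512,
    tokenizer=tokenizer,
)
trainer.train()
```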
## Evaluation
Evaluated on the test split of the web_questions dataset.
#### Testing Data
Currently evaluated on the test split of the web_questions dataset; results on other datasets will be added later.
#### Metrics
- Perplexity
- Accuracy
- F1 Score
### Results
After 2 epochs, the training loss was 1.114500 and the validation loss was 1.592121.
Perplexity on the web_questions test split: 5.13
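For reproducibility, perplexity can be computed as the exponential of the average cross-entropy loss over the test split. Below is a minimal sketch that reuses `model`, `tokenizer`, `dataset` and `format_example` from the snippets above; it averages per-example losses for simplicity (a token-weighted average is slightly more precise), so the exact evaluation procedure is an assumption.

```python
import math
import torch

model.eval()
losses = []
for example in dataset["test"].map(format_example):
    inputs = tokenizer(example["text"], return_tensors="pt",
                       truncation=True, max_length=512).to(model.device)
    with torch.no_grad():
        out = model(**inputs, labels=inputs["input_ids"])
    losses.append(out.loss.item())

perplexity = math.exp(sum(losses) / len(losses))
print(f"Perplexity: {perplexity:.2f}")
```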