---
language:
- en
- lug
tags:
- llama-3.1
- finetuned
- english-luganda
- translation
- peft
- qlora
---

# final_model_8b_64

This model is fine-tuned for bidirectional English-Luganda translation. It was trained with QLoRA (Quantized Low-Rank Adaptation) on top of the LLaMA-3.1-8B base model.

## Model Details

### Base Model Information
- Base model: unsloth/Meta-Llama-3.1-8B
- Model family: LLaMA-3.1-8B
- Type: Base
- Original model size: 8B parameters

### Training Configuration
- Training method: QLoRA (4-bit quantization)
- LoRA rank (r): 64
- LoRA alpha: 64
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- LoRA dropout: 0
- Learning rate: 2e-5
- Batch size: 2
- Gradient accumulation steps: 4
- Max sequence length: 2048
- Weight decay: 0.01
- Training steps: 100,000
- Warmup steps: 1,000
- Save interval: 10,000 steps
- Optimizer: AdamW (8-bit)
- LR scheduler: Cosine
- Mixed precision: bf16
- Gradient checkpointing: Enabled (unsloth)

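For reference, the hyperparameters above correspond roughly to an unsloth-style QLoRA setup like the minimal sketch below. This is an illustrative reconstruction, not the exact training script; the `output_dir` value and the omitted dataset/trainer wiring are placeholders.

```python
# Illustrative sketch assumed to mirror the configuration above, not the
# actual training script. Dataset loading and the trainer loop are omitted.
from unsloth import FastLanguageModel
from transformers import TrainingArguments

# Load the base model with 4-bit quantized weights (QLoRA).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters (rank 64, alpha 64) to all attention and MLP projections.
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=64,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",
    use_gradient_checkpointing="unsloth",
)

# Optimizer and schedule settings matching the list above.
training_args = TrainingArguments(
    output_dir="outputs",            # placeholder path
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,   # effective batch size of 8
    learning_rate=2e-5,
    weight_decay=0.01,
    max_steps=100_000,
    warmup_steps=1_000,
    save_steps=10_000,
    optim="adamw_8bit",              # 8-bit AdamW via bitsandbytes
    lr_scheduler_type="cosine",
    bf16=True,
)
```

With rank and alpha both set to 64, the LoRA scaling factor alpha/r equals 1, so adapter updates are applied at their native scale.
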
### Dataset Information
- Training data: Parallel English-Luganda corpus
- Data sources:
  - SALT dataset (salt-train-v1.4)
  - Extracted parallel sentences
  - Synthetic code-mixed data
- Bidirectional translation: Trained on both English→Luganda and Luganda→English
- Total training examples: Varies by direction

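To illustrate the bidirectional setup, the sketch below shows how a single parallel sentence pair can be expanded into two training examples, one per direction, using the prompt template shown in the Usage section below. The helper function and the sample sentence pair are hypothetical, not taken from the actual training pipeline.

```python
# Hypothetical illustration of bidirectional example construction; the helper
# name and the sample sentence pair are not from the actual training data.
PROMPT = (
    "Below is an instruction that describes a task, "
    "paired with an input that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nTranslate the following text to {target_lang}\n\n"
    "### Input:\n{source_text}\n\n"
    "### Response:\n{target_text}"
)

def make_bidirectional_examples(english: str, luganda: str) -> list[str]:
    """Turn one parallel pair into an English→Luganda and a Luganda→English example."""
    return [
        PROMPT.format(target_lang="Luganda", source_text=english, target_text=luganda),
        PROMPT.format(target_lang="English", source_text=luganda, target_text=english),
    ]

examples = make_bidirectional_examples("Good morning.", "Wasuze otya nno.")
```
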
### Usage
This model uses an instruction-based prompt format:
```
Below is an instruction that describes a task,
paired with an input that provides further context.
Write a response that appropriately completes the request.

### Instruction:
Translate the following text to [target_lang]

### Input:
[input text]

### Response:
[translation]
```

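A minimal inference sketch using this prompt format is shown below. The model id is a placeholder for this repository; if the repo stores LoRA adapters rather than merged weights, load them with peft's `AutoPeftModelForCausalLM` instead of `AutoModelForCausalLM`.

```python
# Minimal inference sketch. The model id below is a placeholder for this
# repository; generation settings are examples, not evaluation settings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<this-repo-id>"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = (
    "Below is an instruction that describes a task, "
    "paired with an input that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nTranslate the following text to Luganda\n\n"
    "### Input:\nHow are you today?\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens (the translation).
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
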
## Training Infrastructure
- Trained using the unsloth optimization library
- Hardware: Single A100 GPU
- Quantization: 4-bit training enabled

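Because training used 4-bit quantization, the model can also be loaded in 4-bit for inference on smaller GPUs. A sketch using bitsandbytes via transformers follows; the NF4/double-quantization settings are common QLoRA defaults assumed here, not confirmed for this checkpoint.

```python
# Optional: load in 4-bit for memory-constrained inference. The NF4 and
# double-quantization settings are assumed QLoRA defaults, not confirmed here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model_id = "<this-repo-id>"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
```
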
## Limitations
- The model is specialized for English-Luganda translation
- Performance may vary with the domain and complexity of the text
- Limited to the maximum sequence length used in training (2,048 tokens)

## Citation and Contact
If you use this model, please cite:
- Original LLaMA-3.1 model by Meta AI
- QLoRA paper: Dettmers et al. (2023)
- unsloth optimization library