---
language:
- en
- lug
tags:
- llama-3.1
- finetuned
- english-luganda
- translation
- peft
- qlora
---

# final_model_8b_64

This model is fine-tuned for bidirectional English-Luganda translation. It was trained with QLoRA (Quantized Low-Rank Adaptation) on the LLaMA-3.1-8B base model.

## Model Details

### Base Model Information
- Base model: unsloth/Meta-Llama-3.1-8B
- Model family: LLaMA-3.1-8B
- Type: Base
- Original model size: 8B parameters

### Training Configuration
- Training method: QLoRA (4-bit quantization)
- LoRA rank (r): 64
- LoRA alpha: 64
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- LoRA dropout: 0
- Learning rate: 2e-5
- Batch size: 2
- Gradient accumulation steps: 4
- Max sequence length: 2048
- Weight decay: 0.01
- Training steps: 100,000
- Warmup steps: 1,000
- Save interval: 10,000 steps
- Optimizer: AdamW (8-bit)
- LR scheduler: Cosine
- Mixed precision: bf16
- Gradient checkpointing: Enabled (unsloth)
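
For concreteness, these hyperparameters map onto an unsloth QLoRA setup roughly as follows. This is a sketch reconstructed from the list above, not the actual training script; the output directory and the trainer wiring are assumptions.

```python
# Sketch only: reconstructs the configuration listed above with unsloth.
# The output_dir and any dataset/trainer wiring are assumptions.
from unsloth import FastLanguageModel
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",
    max_seq_length=2048,          # max sequence length from the card
    load_in_4bit=True,            # QLoRA: 4-bit quantized base weights
)

model = FastLanguageModel.get_peft_model(
    model,
    r=64,                         # LoRA rank
    lora_alpha=64,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)

training_args = TrainingArguments(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=2e-5,
    weight_decay=0.01,
    max_steps=100_000,
    warmup_steps=1_000,
    save_steps=10_000,
    optim="adamw_8bit",           # 8-bit AdamW
    lr_scheduler_type="cosine",
    bf16=True,
    output_dir="outputs",         # assumed, not stated in the card
)
# training_args would then be passed to a trainer such as trl's SFTTrainer.
```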

### Dataset Information
- Training data: Parallel English-Luganda corpus
- Data sources:
  - SALT dataset (salt-train-v1.4)
  - Extracted parallel sentences
  - Synthetic code-mixed data
- Bidirectional translation: Trained on both English→Luganda and Luganda→English
- Total training examples: Varies by direction
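
The exact data pipeline is not published; as an illustration, one parallel pair can be expanded into two training records, one per direction, matching the instruction/input/response shape of the prompt format shown under Usage below. The field names and example sentences are illustrative, not taken from the training data.

```python
# Illustrative only: expand one parallel pair into both translation
# directions; field names and the example sentences are assumptions.
def expand_pair(en: str, lug: str) -> list[dict]:
    return [
        {"instruction": "Translate the following text to Luganda",
         "input": en, "response": lug},
        {"instruction": "Translate the following text to English",
         "input": lug, "response": en},
    ]

records = expand_pair("Good morning.", "Wasuze otya.")  # illustrative greeting pair
```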

### Usage
This model uses an instruction-based prompt format:
```
Below is an instruction that describes a task,
paired with an input that provides further context.
Write a response that appropriately completes the request.

### Instruction:
Translate the following text to [target_lang]

### Input:
[input text]

### Response:
[translation]
```
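
A minimal inference sketch using this format is shown below; the repo id (`Bronsn/final_model_8b_64`) and the generation settings are assumptions, not taken from the card.

```python
# Sketch only: loads the model with unsloth and translates one sentence.
# The repo id and generation settings below are assumptions.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Bronsn/final_model_8b_64",  # assumed repo id
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # switch unsloth to inference mode

prompt = """Below is an instruction that describes a task,
paired with an input that provides further context.
Write a response that appropriately completes the request.

### Instruction:
Translate the following text to Luganda

### Input:
Good morning, how are you?

### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens (the translation).
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```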

## Training Infrastructure
- Trained using the unsloth optimization library
- Hardware: Single A100 GPU
- Quantization: 4-bit training enabled

## Limitations
- The model is specialized for English-Luganda translation
- Performance may vary with the domain and complexity of the text
- Context is limited to the 2048-token maximum sequence length used during training

## Citation and Contact
If you use this model, please cite:
- Original LLaMA-3.1 model by Meta AI
- QLoRA paper: Dettmers et al. (2023)
- unsloth optimization library