---
language:
- en
- lug
tags:
- llama-3.1
- finetuned
- english-luganda
- translation
- peft
- qlora
---

# final_model_8b_64

This model is fine-tuned for bidirectional English-Luganda translation. It was trained with QLoRA (Quantized Low-Rank Adaptation) on top of the LLaMA-3.1-8B base model.

## Model Details

### Base Model Information
- Base model: unsloth/Meta-Llama-3.1-8B
- Model family: LLaMA-3.1-8B
- Type: Base
- Original model size: 8B parameters

### Training Configuration
- Training method: QLoRA (4-bit quantization)
- LoRA rank (r): 64
- LoRA alpha: 64
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- LoRA dropout: 0
- Learning rate: 2e-5
- Batch size: 2
- Gradient accumulation steps: 4
- Max sequence length: 2048
- Weight decay: 0.01
- Training steps: 100,000
- Warmup steps: 1,000
- Save interval: 10,000 steps
- Optimizer: AdamW (8-bit)
- LR scheduler: Cosine
- Mixed precision: bf16
- Gradient checkpointing: Enabled (unsloth)

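For reference, the hyperparameters above correspond roughly to an unsloth-style QLoRA setup like the minimal sketch below. This is an illustrative reconstruction, not the exact training script; the `output_dir` value and the omitted dataset/trainer wiring are placeholders.

```python
# Illustrative sketch assumed to mirror the configuration above, not the
# actual training script. Dataset loading and the trainer loop are omitted.
from unsloth import FastLanguageModel
from transformers import TrainingArguments

# Load the base model with 4-bit quantized weights (QLoRA).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters (rank 64, alpha 64) to all attention and MLP projections.
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=64,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",
    use_gradient_checkpointing="unsloth",
)

# Optimizer and schedule settings matching the list above.
training_args = TrainingArguments(
    output_dir="outputs",            # placeholder path
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,   # effective batch size of 8
    learning_rate=2e-5,
    weight_decay=0.01,
    max_steps=100_000,
    warmup_steps=1_000,
    save_steps=10_000,
    optim="adamw_8bit",              # 8-bit AdamW via bitsandbytes
    lr_scheduler_type="cosine",
    bf16=True,
)
```

With rank and alpha both set to 64, the LoRA scaling factor alpha/r equals 1, so adapter updates are applied at their native scale.
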
### Dataset Information
- Training data: Parallel English-Luganda corpus
- Data sources:
  - SALT dataset (salt-train-v1.4)
  - Extracted parallel sentences
  - Synthetic code-mixed data
- Bidirectional translation: Trained on both English→Luganda and Luganda→English
- Total training examples: Varies by direction

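To illustrate the bidirectional setup, the sketch below shows how a single parallel sentence pair can be expanded into two training examples, one per direction, using the prompt template shown in the Usage section below. The helper function and the sample sentence pair are hypothetical, not taken from the actual training pipeline.

```python
# Hypothetical illustration of bidirectional example construction; the helper
# name and the sample sentence pair are not from the actual training data.
PROMPT = (
    "Below is an instruction that describes a task, "
    "paired with an input that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nTranslate the following text to {target_lang}\n\n"
    "### Input:\n{source_text}\n\n"
    "### Response:\n{target_text}"
)

def make_bidirectional_examples(english: str, luganda: str) -> list[str]:
    """Turn one parallel pair into an English→Luganda and a Luganda→English example."""
    return [
        PROMPT.format(target_lang="Luganda", source_text=english, target_text=luganda),
        PROMPT.format(target_lang="English", source_text=luganda, target_text=english),
    ]

examples = make_bidirectional_examples("Good morning.", "Wasuze otya nno.")
```
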
### Usage
This model uses an instruction-based prompt format:
```
Below is an instruction that describes a task,
paired with an input that provides further context.
Write a response that appropriately completes the request.

### Instruction:
Translate the following text to [target_lang]

### Input:
[input text]

### Response:
[translation]
```

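A minimal inference sketch using this prompt format is shown below. The model id is a placeholder for this repository; if the repo stores LoRA adapters rather than merged weights, load them with peft's `AutoPeftModelForCausalLM` instead of `AutoModelForCausalLM`.

```python
# Minimal inference sketch. The model id below is a placeholder for this
# repository; generation settings are examples, not evaluation settings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<this-repo-id>"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = (
    "Below is an instruction that describes a task, "
    "paired with an input that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nTranslate the following text to Luganda\n\n"
    "### Input:\nHow are you today?\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens (the translation).
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
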
## Training Infrastructure
- Trained using the unsloth optimization library
- Hardware: Single A100 GPU
- Quantization: 4-bit training enabled

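Because training used 4-bit quantization, the model can also be loaded in 4-bit for inference on smaller GPUs. A sketch using bitsandbytes via transformers follows; the NF4/double-quantization settings are common QLoRA defaults assumed here, not confirmed for this checkpoint.

```python
# Optional: load in 4-bit for memory-constrained inference. The NF4 and
# double-quantization settings are assumed QLoRA defaults, not confirmed here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model_id = "<this-repo-id>"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
```
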
## Limitations
- The model is specialized for English-Luganda translation
- Performance may vary with the domain and complexity of the text
- Limited to the maximum sequence length used in training (2,048 tokens)

## Citation and Contact
If you use this model, please cite:
- Original LLaMA-3.1 model by Meta AI
- QLoRA paper: Dettmers et al. (2023)
- unsloth optimization library