---
library_name: transformers
language:
- wo
- en
license: apache-2.0
pipeline_tag: text2text-generation
---
# Oolel: A High-Performing Open LLM for Wolof
<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/62e335bbf15e7fce909fe5d4/liiZ1rAkiIgGpgN_jqwq6.mp4"></video>
Despite numerous open-source innovations in large language models, African languages have remained underrepresented.
**Soynade Research** is transforming this landscape with Oolel, the first open-source language model for Wolof.
Built on the **Qwen 2.5** architecture, Oolel combines state-of-the-art AI technology with deep Wolof linguistic expertise. With carefully curated, high-quality data, we trained and optimized Oolel for the following tasks:
- **RAG** supporting Wolof queries with English, French, or Wolof context.
- **Bidirectional translation between English and Wolof**
- **Natural text generation in Wolof**
- **Math in Wolof**
- **And many other standard NLP tasks**:
- Summarization
- Text editing
- etc.
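
As an illustration of the RAG task above, a retrieved context passage (here in English) and a Wolof question can be packed into a single user turn. The `build_rag_messages` helper and the `Context:`/`Question:` layout below are illustrative assumptions, not an official prompt format; the Wolof question is borrowed from the multi-turn example later in this card.

```python
# Illustrative sketch (not an official format): combining an English context
# passage with a Wolof question in one user turn for RAG-style prompting.

def build_rag_messages(context: str, question: str) -> list:
    """Return chat messages pairing a retrieved context with a Wolof question."""
    system_prompt = (
        "You're a Wolof AI assistant. Please always provide detailed "
        "and useful answers to the user queries."
    )
    user_content = f"Context: {context}\n\nQuestion: {question}"
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_content},
    ]

messages = build_rag_messages(
    context="ECOWAS is a regional union of West African states.",
    question="Lan mooy CEDEAO?",
)
print(messages[1]["content"])
```

The resulting `messages` list can be passed directly to the `generate_response` function defined in the Usage section below.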
## Usage
**Important: always include your system prompt!**
The snippet below shows how to load the tokenizer and model, and how to generate text with `apply_chat_template`.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

device = "cuda"

model = AutoModelForCausalLM.from_pretrained(
    "soynade-research/Oolel-v0.1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("soynade-research/Oolel-v0.1")

def generate_response(messages, max_new_tokens=1024, temperature=0.1):
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
    model_inputs = tokenizer([text], return_tensors="pt").to(device)
    generated_ids = model.generate(
        model_inputs.input_ids,
        max_new_tokens=max_new_tokens,
        do_sample=True,  # sampling must be enabled for temperature to take effect
        temperature=temperature,
    )
    # Strip the prompt tokens so only the newly generated text remains
    generated_ids = [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]
    return tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
**Some task examples:**
1. **Translation Tasks**
```python
system_prompt = "You're a Wolof AI assistant. Please always provide detailed and useful answers to the user queries."
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Translate to Wolof: Bassirou Diomaye Faye is the new Senegalese president. He is 44 years old"}
]
print(generate_response(messages))
```
2. **Code generation**
```python
system_prompt = "You're a Wolof AI assistant. Please always provide detailed and useful answers to the user queries."
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Bindal ab klaas Python buy wone ni ñuy jëfandikoo dataframe yi ci Pandas"}
]
print(generate_response(messages))
```
3. **Problem Solving**
```python
from pprint import pprint

system_prompt = "You're a Wolof AI assistant. Please always provide detailed and useful answers to the user queries."
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Ndax nga mën ma won ni ñuy resolver problème bii: Fatou dafa jënd 3 kilo ceeb, 2 kilo diw ak 5 kilo sukër. Ceeb gi wenn kilo 500 CFA la, diw gi 1200 CFA kilo bi, sukër gi 750 CFA kilo bi. Ñaata la wara fay?"}
]
pprint(generate_response(messages))
```
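
For reference, the arithmetic in this word problem (3 kg of rice at 500 CFA/kg, 2 kg of oil at 1,200 CFA/kg, 5 kg of sugar at 750 CFA/kg) can be checked directly:

```python
# Expected answer to the problem above, computed directly (prices in CFA)
rice = 3 * 500     # 3 kg of rice at 500 CFA/kg
oil = 2 * 1200     # 2 kg of oil at 1,200 CFA/kg
sugar = 5 * 750    # 5 kg of sugar at 750 CFA/kg
total = rice + oil + sugar
print(total)  # → 7650 CFA
```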
4. **Text Generation** (e.g. story generation)
```python
system_prompt = "You are a skilled Wolof storyteller (Gewël) with deep knowledge of African folktales and traditions. Write engaging stories in Wolof that reflect African cultural values and wisdom."
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Bindal ab léeb ci gaynde gi lekk muus mi"}
]
print(generate_response(messages, temperature=0.9))
```
5. **Multi-turn conversations**
Oolel is not optimized for multi-turn conversations, but you can try it!
```python
messages = [
    {"role": "user", "content": "Wax ma clan mooy CEDEAO ? Ci lan la liggeey?"},
    {"role": "assistant", "content": "CEDEAO mooy 'organisation' gu boole reew yi nekk ci pennc Afrika bi. Mu ngi sukkandiku ci wàll économie, politig, ak déggoo diggante reew yi"},
    {"role": "user", "content": "ñaata reew ñoo ci bokk?"}
]
print(generate_response(messages))
```
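
One simple way to continue such a conversation is to keep the history in a list and append each completed exchange before the next call. The `add_turn` helper below is an illustrative sketch, not part of this card's API; the Wolof turns reuse the example above.

```python
# Illustrative pattern: grow the chat history turn by turn, then pass the
# whole list back to generate_response for the next reply.

def add_turn(history, assistant_reply, next_user_message):
    """Append the model's last reply and the user's follow-up to the history."""
    return history + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_user_message},
    ]

history = [{"role": "user", "content": "Wax ma clan mooy CEDEAO ? Ci lan la liggeey?"}]
history = add_turn(
    history,
    "CEDEAO mooy 'organisation' gu boole reew yi nekk ci pennc Afrika bi.",
    "ñaata reew ñoo ci bokk?",
)
print(len(history))  # → 3
```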
## Authors
- [**Yaya SY**](https://x.com/seygalare): NLP Researcher (Efficient Continued Pretraining)
- [**Dioula DOUCOURE**](https://x.com/DioulaD): Data & NLP Engineer