MaLLaM 🌙 5B (Malaysia Large Language Model): a 5B-parameter model pretrained with a 4096-token context length on Malaysian text

A 5B-parameter model pretrained from scratch with the Mistral architecture on 90B tokens of Malaysian text.

README at https://github.com/mesolitica/malaya/tree/5.1/pretrained-model/mistral

WandB, https://wandb.ai/mesolitica/pretrain-mistral-5b?workspace=user-husein-mesolitica

WandB report, https://wandb.ai/mesolitica/pretrain-mistral-3b/reports/Pretrain-Larger-Malaysian-Mistral--Vmlldzo2MDkyOTgz

Technical report, https://github.com/mesolitica/malaya/wiki/MaLLaM-%F0%9F%8C%99-Malaysia-Large-Language-Model

how-to

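from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Load the weights in 4-bit NF4 with double quantization; compute in bfloat16.
TORCH_DTYPE = 'bfloat16'
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=getattr(torch, TORCH_DTYPE)
)

tokenizer = AutoTokenizer.from_pretrained('mesolitica/mallam-5B-4096')
model = AutoModelForCausalLM.from_pretrained(
    'mesolitica/mallam-5B-4096',
    # Needs the flash-attn package. Older transformers releases used
    # use_flash_attention_2=True, which is now deprecated.
    attn_implementation='flash_attention_2',
    quantization_config=nf4_config
)

# The BOS token is already written into the prompt, so the tokenizer
# is told not to add special tokens a second time.
prompt = '<s>nama saya'
inputs = tokenizer([prompt], return_tensors='pt', add_special_tokens=False).to('cuda')

# Sampling settings: nucleus and top-k sampling with a mild repetition penalty.
generate_kwargs = dict(
    inputs,
    max_new_tokens=512,
    top_p=0.95,
    top_k=50,
    temperature=0.9,
    do_sample=True,
    num_beams=1,
    repetition_penalty=1.05,
)
r = model.generate(**generate_kwargs)

# r holds token ids (prompt plus continuation); decode them back to text.
print(tokenizer.decode(r[0]))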
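
The snippet above loads the model in 4-bit NF4 with double quantization. As a rough illustration of what that saves, the sketch below estimates weight storage for BF16 versus NF4. It is a back-of-the-envelope calculation, not a measurement: `N_PARAMS` is the nominal 5B from the model name, the block sizes (64 weights per block, 256 constants per second-level block) are assumed from the usual QLoRA-style defaults, and `weight_memory_gb` is a hypothetical helper. The estimate covers weights only and treats every parameter as quantized, so actual GPU usage will be higher once unquantized layers, activations, and the KV cache are included.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 10**9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# Nominal parameter count taken from the model name; the exact count may differ.
N_PARAMS = 5e9

# Assumed block sizes (QLoRA-style defaults, not confirmed by this model card).
BLOCK = 64     # weights sharing one quantization constant
BLOCK2 = 256   # first-level constants sharing one second-level constant

# Extra bits per parameter spent on quantization constants.
# Single quantization: one fp32 constant per block of weights.
overhead_single = 32 / BLOCK
# Double quantization: an 8-bit constant per block, plus one fp32
# second-level constant per BLOCK2 first-level constants.
overhead_double = 8 / BLOCK + 32 / (BLOCK * BLOCK2)

bf16_gb = weight_memory_gb(N_PARAMS, 16)
nf4_gb = weight_memory_gb(N_PARAMS, 4 + overhead_double)

print(f"BF16 weights: ~{bf16_gb:.2f} GB")
print(f"NF4 weights (double quantization): ~{nf4_gb:.2f} GB")
```

Under these assumptions the 4-bit weights take roughly a quarter of the BF16 footprint, which is the main reason to pass `quantization_config` when GPU memory is limited.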