Model Info

An xLSTM model (247M parameters) trained on a shuffled wikimedia/wikipedia 20231101.en dataset (shuffle seed=42).

Model checkpoints are available as branches of this repository.

Training hyperparameters:

per_device_train_batch_size=32,
logging_steps=3650,
gradient_accumulation_steps=8,
num_train_epochs=1,
weight_decay=0.1,
warmup_steps=1_000,
lr_scheduler_type="cosine",
learning_rate=5e-4,
save_steps=3650,
fp16=True,
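
These settings correspond to Hugging Face TrainingArguments fields; below is a minimal sketch of the equivalent configuration (the output directory is a hypothetical placeholder, not from this card). Note that the effective batch size is 32 × 8 = 256 sequences per optimizer step.

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="xlstm_247m_wikipedia_en",  # hypothetical path, not from the card
    per_device_train_batch_size=32,
    gradient_accumulation_steps=8,         # effective batch size: 32 * 8 = 256
    num_train_epochs=1,
    learning_rate=5e-4,
    lr_scheduler_type="cosine",
    warmup_steps=1_000,
    weight_decay=0.1,
    fp16=True,
    logging_steps=3650,
    save_steps=3650,
)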

How to use

Install:

pip install xlstm
pip install mlstm_kernels
pip install 'transformers @ git+https://[email protected]/NX-AI/transformers.git@integrate_xlstm_clean'
Load the model and generate text:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and its tokenizer from the Hub
xlstm = AutoModelForCausalLM.from_pretrained("J4bb4wukis/xlstm_247m_wikipedia_en_shuffeld")
tokenizer = AutoTokenizer.from_pretrained("J4bb4wukis/xlstm_247m_wikipedia_en_shuffeld")

# Tokenize a prompt and sample a continuation with top-k / top-p sampling
prompt = "Angela Merkel is"
inputs = tokenizer(prompt, return_tensors="pt").input_ids
outputs = xlstm.generate(inputs, max_new_tokens=100, do_sample=True, top_k=10, top_p=0.95)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
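
To load one of the intermediate checkpoints published as branches, pass the branch name via the revision argument of from_pretrained. The branch name below is a hypothetical example; check the repository's branch list on the Hub for the actual names.

# "checkpoint-3650" is a placeholder branch name; see the repo's branches on the Hub.
checkpoint_model = AutoModelForCausalLM.from_pretrained(
    "J4bb4wukis/xlstm_247m_wikipedia_en_shuffeld",
    revision="checkpoint-3650",
)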