Paper: Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning (arXiv:2503.07572)
The community is now called Somos NLP ➡️ https://huggingface.co/somosnlp
Use the instruct-tuned model (Llama-3-8B-Instruct) to generate synthetic instructions and then fine-tune the base version (Llama-3-8B) on this dataset; you can improve on even the instruct-tuned version.

ollama models (initially phi and llama3) automatically and upload them to the Hugging Face Hub!
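The synthetic-instruction idea above can be sketched as a small pipeline. This is a minimal, hypothetical sketch: the `generate` function below is a stand-in for real calls to Llama-3-8B-Instruct (for example via the `transformers` or `ollama` APIs), and the helper names are illustrative, not part of any library.

```python
def generate(prompt: str) -> str:
    """Placeholder for an instruct-model inference call.

    Replace this stub with a real call to Llama-3-8B-Instruct
    (e.g. transformers pipeline or the ollama HTTP API).
    """
    return f"Synthetic response to: {prompt}"


def build_synthetic_dataset(seed_topics):
    """Turn seed topics into (instruction, response) pairs.

    The resulting list can be saved (e.g. as JSONL) and used to
    fine-tune the base Llama-3-8B model.
    """
    dataset = []
    for topic in seed_topics:
        # Ask the instruct model to invent an instruction about the topic...
        instruction = generate(f"Write one instruction about {topic}.")
        # ...then answer that instruction to form a training pair.
        response = generate(instruction)
        dataset.append({"instruction": instruction, "response": response})
    return dataset


if __name__ == "__main__":
    pairs = build_synthetic_dataset(["Python decorators", "SQL joins"])
    for pair in pairs:
        print(pair["instruction"])
```

With a real instruct model plugged into `generate`, the resulting pairs would form the fine-tuning dataset for the base model described above.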