ValueFX9507/Tifa-Deepsex-14b-CoT-GGUF-Q4 Reinforcement Learning • Updated about 9 hours ago • 43.9k • 443
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned models in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated Nov 28, 2024 • 512
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training Paper • 2501.17161 • Published 14 days ago • 102
OLMo 2 Preview Post-trained Models Collection These models' tokenizers did not use HF's fast tokenizer, resulting in variations in how pre-tokenization was applied. Resolved in latest versions. • 6 items • Updated 1 day ago • 2
OnDeviceMedNotes/synthetic-medical-conversations-deepseek-v3 Viewer • Updated 14 days ago • 143k • 427 • 32