2025 January
- Any-to-Any • Updated • 381k • 2.87k
deepseek-ai/Janus-Pro-1B
Any-to-Any • Updated • 90.5k • 351
tencent/Hunyuan3D-2
Image-to-3D • Updated • 50.4k • 872
tencent/Hunyuan-7B-Instruct
Text Generation • Updated • 343 • 40
ByteDance/Sa2VA-4B
Image-Text-to-Text • Updated • 4k • 62
Note: A unified model for dense grounded understanding of images and videos.
bytedance-research/UI-TARS-72B-DPO
Image-Text-to-Text • Updated • 12.5k • 85
deepseek-ai/DeepSeek-R1
Text Generation • Updated • 2.94M • 8.35k
Note: 660B reasoning models with MIT license
deepseek-ai/DeepSeek-R1-Zero
Text Generation • Updated • 28.1k • 775
MiniMaxAI/MiniMax-VL-01
Image-Text-to-Text • Updated • 2.31k • 234
Note: A non-transformer-based VLM (ViT-MLP-LLM framework)
MiniMaxAI/MiniMax-Text-01
Text Generation • Updated • 6.8k • 513
Note: 456B LLM with a 1M-token training context
Qwen/Qwen2.5-Math-PRM-7B
Text Classification • Updated • 15k • 51
Note: Math model
Qwen/Qwen2.5-14B-Instruct-1M
Text Generation • Updated • 18.2k • 236
openbmb/MiniCPM-o-2_6
Any-to-Any • Updated • 460k • 938
Note: End-side multimodal LLM that supports real-time conversation and video understanding.
ICTNLP/llava-mini-llama-3.1-8b
Image-Text-to-Text • Updated • 6.15k • 46
BlinkDL/rwkv-7-world
Text Generation • Updated • 67
Note: RNN+Transformers
HKUSTAudio/Llasa-3B
Text-to-Speech • Updated • 7.88k • 427
Note: TTS
DAMO-NLP-SG/VideoLLaMA3-7B
Visual Question Answering • Updated • 5.73k • 32
internlm/internlm3-8b-instruct
Text Generation • Updated • 36.2k • 195
baichuan-inc/Baichuan-M1-14B-Base
Updated • 212 • 18
Note: Medical LLM
opencsg/Fineweb-Edu-Chinese-V2.1
Preview • Updated • 20k • 11
Note: Dataset designed specifically for natural language processing (NLP) tasks in the education sector.
DAMO-NLP-SG/multimodal_textbook
Updated • 14.1k • 132
Note: A multimodal dataset for vision-language pretraining; includes 6.5M images + 0.8B text from 22k hours of instructional videos
hithink-ai/MME-Finance
Viewer • Updated • 402 • 96 • 8
KwaiVGI/GameFactory-Dataset
Updated • 192 • 9
m-a-p/YuE-s1-7B-anneal-zh-cot
Text Generation • Updated • 1.08k • 27
m-a-p/YuE-s1-7B-anneal-jp-kr-cot
Text Generation • Updated • 1.56k • 15
m-a-p/YuE-s1-7B-anneal-en-cot
Text Generation • Updated • 38.6k • 360
Qwen/Qwen2.5-VL-3B-Instruct
Image-Text-to-Text • Updated • 182k • 184
Qwen/Qwen2.5-VL-7B-Instruct
Image-Text-to-Text • Updated • 722k • 383
- 1.5k
Hunyuan3D-2.0
Text-to-3D and Image-to-3D Generation
- 37
UI-TARS
Select coordinates on an image based on instructions
- 48
MiniMaxVL01
Generate responses using text and images
- 1.68k
Chat With Janus-Pro-7B
A unified multimodal understanding and generation model.
- 504
Qwen2.5 Max Demo
Send messages for chatbot responses