Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2409.17146

MIT Talk 31/10 Papers

NVLM: Open Frontier-Class Multimodal LLMs

Paper • 2409.11402 • Published Sep 17, 2024 • 73
BRAVE: Broadening the visual encoding of vision-language models

Paper • 2404.07204 • Published Apr 10, 2024 • 19
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

Paper • 2403.18814 • Published Mar 27, 2024 • 47
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Paper • 2409.17146 • Published Sep 25, 2024 • 108

Artifacts for open multimodal language models.

about 9 hours ago

allenai/Molmo-72B-0924

Image-Text-to-Text • Updated Oct 10, 2024 • 4.29k • 282
allenai/Molmo-7B-D-0924

Image-Text-to-Text • Updated Oct 10, 2024 • 99k • 514
allenai/Molmo-7B-O-0924

Image-Text-to-Text • Updated Nov 15, 2024 • 17.3k • 155
allenai/MolmoE-1B-0924

Image-Text-to-Text • Updated Oct 10, 2024 • 60.3k • 138

about 22 hours ago

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 26
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 13
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 43
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 22

Running on CPU Upgrade

2.03k

2.03k

Anychat

🏢

Display code snippets for different chat providers
Running

267

267

Qwen2.5 Coder Artifacts

🐢

Generate application code with Qwen2.5-Coder-32B
Running

906

906

QwQ-32B-Preview

🔍

QwQ-32B-Preview
Running on CPU Upgrade

12.7k

12.7k

Open LLM Leaderboard

🏆

Track, rank and evaluate open LLMs and chatbots

about 11 hours ago

Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages

Paper • 2410.16153 • Published Oct 21, 2024 • 44
AutoTrain: No-code training for state-of-the-art models

Paper • 2410.15735 • Published Oct 21, 2024 • 59
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

Paper • 2410.12787 • Published Oct 16, 2024 • 31
LEOPARD : A Vision Language Model For Text-Rich Multi-Image Tasks

Paper • 2410.01744 • Published Oct 2, 2024 • 26

Interesting Papers

These papers are interesting (to me)

Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models

Paper • 2410.02740 • Published Oct 3, 2024 • 52
From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging

Paper • 2410.01215 • Published Oct 2, 2024 • 32
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Paper • 2409.17146 • Published Sep 25, 2024 • 108
EuroLLM: Multilingual Language Models for Europe

Paper • 2409.16235 • Published Sep 24, 2024 • 26

OPen data for VLM

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Paper • 2409.17146 • Published Sep 25, 2024 • 108

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Paper • 2409.17146 • Published Sep 25, 2024 • 108

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Paper • 2409.17146 • Published Sep 25, 2024 • 108

vision language models

papers and models 🙈

about 13 hours ago

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Paper • 2409.17146 • Published Sep 25, 2024 • 108
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Paper • 2409.12191 • Published Sep 18, 2024 • 76
mistralai/Pixtral-12B-2409

Image-Text-to-Text • Updated Dec 26, 2024 • • 622
HuggingFaceTB/SmolVLM-Instruct

Image-Text-to-Text • Updated 7 days ago • 66.5k • 408

Previous
1
2
3
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs