-
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
Paper • 2401.14405 • Published • 13 -
CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs
Paper • 2406.18521 • Published • 29 -
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
Paper • 2408.12590 • Published • 36 -
Law of Vision Representation in MLLMs
Paper • 2408.16357 • Published • 93
Collections
Discover the best community collections!
Collections including paper arxiv:2412.14475
-
iVideoGPT: Interactive VideoGPTs are Scalable World Models
Paper • 2405.15223 • Published • 15 -
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Paper • 2405.15574 • Published • 55 -
An Introduction to Vision-Language Modeling
Paper • 2405.17247 • Published • 88 -
Matryoshka Multimodal Models
Paper • 2405.17430 • Published • 32
-
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
Paper • 2312.06585 • Published • 29 -
TinyGSM: achieving >80% on GSM8k with small language models
Paper • 2312.09241 • Published • 39 -
SciPhi/AgentSearch-V1
Viewer • Updated • 70k • 4.32k • 86 -
Data Filtering Networks
Paper • 2309.17425 • Published • 6