SampleMix: A Sample-wise Pre-training Data Mixing Strategy by Coordinating Data Quality and Diversity Paper • 2503.01506 • Published 10 days ago • 9
DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion Paper • 2503.01183 • Published 10 days ago • 26
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs Paper • 2503.01743 • Published 10 days ago • 72
Training Consistency Models with Variational Noise Coupling Paper • 2502.18197 • Published 16 days ago • 6
FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute Paper • 2502.20126 • Published 14 days ago • 20
UniTok: A Unified Tokenizer for Visual Generation and Understanding Paper • 2502.20321 • Published 14 days ago • 29
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published 21 days ago • 129
HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation Paper • 2502.09838 • Published 28 days ago • 10
You Do Not Fully Utilize Transformer's Representation Capacity Paper • 2502.09245 • Published 28 days ago • 34
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published 25 days ago • 142
Music2Latent2: Audio Compression with Summary Embeddings and Autoregressive Decoding Paper • 2501.17578 • Published Jan 29 • 1
iFormer: Integrating ConvNet and Transformer for Mobile Application Paper • 2501.15369 • Published Jan 26 • 12
MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine Paper • 2408.02900 • Published Aug 6, 2024 • 28
The Geometry of Tokens in Internal Representations of Large Language Models Paper • 2501.10573 • Published Jan 17 • 9