- RLHF Workflow: From Reward Modeling to Online RLHF
  Paper • 2405.07863 • Published • 68
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 131
- Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
  Paper • 2405.15574 • Published • 55
- An Introduction to Vision-Language Modeling
  Paper • 2405.17247 • Published • 88
Collections
Collections including paper arxiv:2407.10759

- SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation
  Paper • 2405.18503 • Published • 9
- DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation
  Paper • 2405.20289 • Published • 11
- LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes
  Paper • 2406.02897 • Published • 16
- Audio Mamba: Bidirectional State Space Model for Audio Representation Learning
  Paper • 2406.03344 • Published • 21

- Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
  Paper • 2405.08748 • Published • 24
- Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
  Paper • 2405.10300 • Published • 29
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 131
- OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
  Paper • 2405.11143 • Published • 38

- parler-tts/parler_tts_mini_v0.1
  Text-to-Speech • Updated • 9.73k • 349
- SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models
  Paper • 2405.08317 • Published • 13
- Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities
  Paper • 2405.18669 • Published • 12
- Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
  Paper • 2406.02430 • Published • 34

- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
  Paper • 2403.09611 • Published • 126
- Evolutionary Optimization of Model Merging Recipes
  Paper • 2403.13187 • Published • 52
- MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
  Paper • 2402.03766 • Published • 14
- LLM Agent Operating System
  Paper • 2403.16971 • Published • 66

- UniAudio: An Audio Foundation Model Toward Universal Audio Generation
  Paper • 2310.00704 • Published • 21
- Structural Similarities Between Language Models and Neural Response Measurements
  Paper • 2306.01930 • Published • 2
- Streaming Transformer ASR with Blockwise Synchronous Beam Search
  Paper • 2006.14941 • Published • 2
- NU-GAN: High resolution neural upsampling with GAN
  Paper • 2010.11362 • Published • 2

- Yi: Open Foundation Models by 01.AI
  Paper • 2403.04652 • Published • 63
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
  Paper • 2401.02954 • Published • 45
- Qwen Technical Report
  Paper • 2309.16609 • Published • 35
- Gemma: Open Models Based on Gemini Research and Technology
  Paper • 2403.08295 • Published • 48

- Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like
  Paper • 2402.07383 • Published • 16
- Matcha-TTS: A fast TTS architecture with conditional flow matching
  Paper • 2309.03199 • Published • 12
- Natural language guidance of high-fidelity text-to-speech with synthetic annotations
  Paper • 2402.01912 • Published • 12
- Fast Timing-Conditioned Latent Audio Diffusion
  Paper • 2402.04825 • Published • 8