Collections
Collections including paper arxiv:2312.08723
- A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
  Paper • 2312.08578 • Published • 20
- ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
  Paper • 2312.08583 • Published • 12
- Vision-Language Models as a Source of Rewards
  Paper • 2312.09187 • Published • 14
- StemGen: A music generation model that listens
  Paper • 2312.08723 • Published • 48

- Music ControlNet: Multiple Time-varying Controls for Music Generation
  Paper • 2311.07069 • Published • 44
- FLAP: Fast Language-Audio Pre-training
  Paper • 2311.01615 • Published • 18
- MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models
  Paper • 2310.11954 • Published • 25
- MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies
  Paper • 2308.01546 • Published • 18

- NExT-GPT: Any-to-Any Multimodal LLM
  Paper • 2309.05519 • Published • 78
- Large Language Model for Science: A Study on P vs. NP
  Paper • 2309.05689 • Published • 21
- AstroLLaMA: Towards Specialized Foundation Models in Astronomy
  Paper • 2309.06126 • Published • 17
- Large Language Models for Compiler Optimization
  Paper • 2309.07062 • Published • 23

- Retrieval-Augmented Text-to-Audio Generation
  Paper • 2309.08051 • Published • 7
- A Large-scale Dataset for Audio-Language Representation Learning
  Paper • 2309.11500 • Published • 10
- End-to-End Speech Recognition Contextualization with Large Language Models
  Paper • 2309.10917 • Published • 10
- FoleyGen: Visually-Guided Audio Generation
  Paper • 2309.10537 • Published • 9