- Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
  Paper • 2311.07919 • Published • 10
- mozilla-foundation/common_voice_17_0
  Viewer • Updated • 13M • 36.3k • 231
- Stable Audio Open
  Paper • 2407.14358 • Published • 26
- fnlp/AnyGPT-chat
  Text Generation • Updated • 34 • 17
Collections including paper arxiv:2407.14358
- GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
  Paper • 2406.11768 • Published • 20
- Investigating Decoder-only Large Language Models for Speech-to-text Translation
  Paper • 2407.03169 • Published • 11
- PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation
  Paper • 2407.02869 • Published • 20
- FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
  Paper • 2407.04051 • Published • 39

- RLHF Workflow: From Reward Modeling to Online RLHF
  Paper • 2405.07863 • Published • 68
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 131
- Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
  Paper • 2405.15574 • Published • 55
- An Introduction to Vision-Language Modeling
  Paper • 2405.17247 • Published • 88

- SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation
  Paper • 2405.18503 • Published • 9
- DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation
  Paper • 2405.20289 • Published • 11
- LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes
  Paper • 2406.02897 • Published • 16
- Audio Mamba: Bidirectional State Space Model for Audio Representation Learning
  Paper • 2406.03344 • Published • 21

- A Novel 1D State Space for Efficient Music Rhythmic Analysis
  Paper • 2111.00704 • Published
- Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
  Paper • 2312.09911 • Published • 55
- Music Style Transfer with Time-Varying Inversion of Diffusion Models
  Paper • 2402.13763 • Published • 11
- ChatMusician: Understanding and Generating Music Intrinsically with LLM
  Paper • 2402.16153 • Published • 60

- Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like
  Paper • 2402.07383 • Published • 16
- Matcha-TTS: A fast TTS architecture with conditional flow matching
  Paper • 2309.03199 • Published • 12
- Natural language guidance of high-fidelity text-to-speech with synthetic annotations
  Paper • 2402.01912 • Published • 12
- Fast Timing-Conditioned Latent Audio Diffusion
  Paper • 2402.04825 • Published • 8

- NExT-GPT: Any-to-Any Multimodal LLM
  Paper • 2309.05519 • Published • 78
- Large Language Model for Science: A Study on P vs. NP
  Paper • 2309.05689 • Published • 21
- AstroLLaMA: Towards Specialized Foundation Models in Astronomy
  Paper • 2309.06126 • Published • 17
- Large Language Models for Compiler Optimization
  Paper • 2309.07062 • Published • 23