Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2407.14358

Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

Paper • 2311.07919 • Published Nov 14, 2023 • 10
mozilla-foundation/common_voice_17_0

Viewer • Updated Jun 16, 2024 • 13M • 36.3k • 231
Stable Audio Open

Paper • 2407.14358 • Published Jul 19, 2024 • 26
fnlp/AnyGPT-chat

Text Generation • Updated Jun 5, 2024 • 34 • 17

GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities

Paper • 2406.11768 • Published Jun 17, 2024 • 20
Investigating Decoder-only Large Language Models for Speech-to-text Translation

Paper • 2407.03169 • Published Jul 3, 2024 • 11
PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation

Paper • 2407.02869 • Published Jul 3, 2024 • 20
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

Paper • 2407.04051 • Published Jul 4, 2024 • 39

Papers I want to read

Papers in my to-read list

RLHF Workflow: From Reward Modeling to Online RLHF

Paper • 2405.07863 • Published May 13, 2024 • 68
Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16, 2024 • 131
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Paper • 2405.15574 • Published May 24, 2024 • 55
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27, 2024 • 88

SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation

Paper • 2405.18503 • Published May 28, 2024 • 9
DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation

Paper • 2405.20289 • Published May 30, 2024 • 11
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes

Paper • 2406.02897 • Published Jun 5, 2024 • 16
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning

Paper • 2406.03344 • Published Jun 5, 2024 • 21

A Novel 1D State Space for Efficient Music Rhythmic Analysis

Paper • 2111.00704 • Published Nov 1, 2021
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit

Paper • 2312.09911 • Published Dec 15, 2023 • 55
Music Style Transfer with Time-Varying Inversion of Diffusion Models

Paper • 2402.13763 • Published Feb 21, 2024 • 11
ChatMusician: Understanding and Generating Music Intrinsically with LLM

Paper • 2402.16153 • Published Feb 25, 2024 • 60

Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like

Paper • 2402.07383 • Published Feb 12, 2024 • 16
Matcha-TTS: A fast TTS architecture with conditional flow matching

Paper • 2309.03199 • Published Sep 6, 2023 • 12
Natural language guidance of high-fidelity text-to-speech with synthetic annotations

Paper • 2402.01912 • Published Feb 2, 2024 • 12
Fast Timing-Conditioned Latent Audio Diffusion

Paper • 2402.04825 • Published Feb 7, 2024 • 8

NExT-GPT: Any-to-Any Multimodal LLM

Paper • 2309.05519 • Published Sep 11, 2023 • 78
Large Language Model for Science: A Study on P vs. NP

Paper • 2309.05689 • Published Sep 11, 2023 • 21
AstroLLaMA: Towards Specialized Foundation Models in Astronomy

Paper • 2309.06126 • Published Sep 12, 2023 • 17
Large Language Models for Compiler Optimization

Paper • 2309.07062 • Published Sep 11, 2023 • 23

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs