Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2404.14700

Papers - University - National University of Singapore

Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Paper • 2404.09956 • Published Apr 15, 2024 • 12
Contrastive Chain-of-Thought Prompting

Paper • 2311.09277 • Published Nov 15, 2023 • 36
FlashSpeech: Efficient Zero-Shot Speech Synthesis

Paper • 2404.14700 • Published Apr 23, 2024 • 32
AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising

Paper • 2406.06911 • Published Jun 11, 2024 • 12

Papers - Audio - Text to Speech

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

Paper • 2404.03204 • Published Apr 4, 2024 • 10
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

Paper • 2311.07919 • Published Nov 14, 2023 • 10
FlashSpeech: Efficient Zero-Shot Speech Synthesis

Paper • 2404.14700 • Published Apr 23, 2024 • 32

Papers - University - Hong Kong University of Science and Te

Event Camera Demosaicing via Swin Transformer and Pixel-focus Loss

Paper • 2404.02731 • Published Apr 3, 2024 • 1
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

Paper • 2309.12284 • Published Sep 21, 2023 • 18
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

Paper • 2404.03204 • Published Apr 4, 2024 • 10
Adapting LLaMA Decoder to Vision Transformer

Paper • 2404.06773 • Published Apr 10, 2024 • 18

Papers - Audio - Mel Spectogram

Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

Paper • 1712.05884 • Published Dec 16, 2017 • 3
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Paper • 2404.09956 • Published Apr 15, 2024 • 12
Music Consistency Models

Paper • 2404.13358 • Published Apr 20, 2024 • 13
FlashSpeech: Efficient Zero-Shot Speech Synthesis

Paper • 2404.14700 • Published Apr 23, 2024 • 32

Papers - Audio - Training

A Multimodal Approach to Device-Directed Speech Detection with Large Language Models

Paper • 2403.14438 • Published Mar 21, 2024 • 2
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

Paper • 2403.17694 • Published Mar 26, 2024 • 12
FlashSpeech: Efficient Zero-Shot Speech Synthesis

Paper • 2404.14700 • Published Apr 23, 2024 • 32
SONAR: Sentence-Level Multimodal and Language-Agnostic Representations

Paper • 2308.11466 • Published Aug 22, 2023 • 1

Papers - Attention

Linear Transformers with Learnable Kernel Functions are Better In-Context Models

Paper • 2402.10644 • Published Feb 16, 2024 • 81
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints

Paper • 2305.13245 • Published May 22, 2023 • 5
ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition

Paper • 2402.15220 • Published Feb 23, 2024 • 21
Sequence Parallelism: Long Sequence Training from System Perspective

Paper • 2105.13120 • Published May 26, 2021 • 5

Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like

Paper • 2402.07383 • Published Feb 12, 2024 • 16
Matcha-TTS: A fast TTS architecture with conditional flow matching

Paper • 2309.03199 • Published Sep 6, 2023 • 12
Natural language guidance of high-fidelity text-to-speech with synthetic annotations

Paper • 2402.01912 • Published Feb 2, 2024 • 12
Fast Timing-Conditioned Latent Audio Diffusion

Paper • 2402.04825 • Published Feb 7, 2024 • 8

DocGraphLM: Documental Graph Language Model for Information Extraction

Paper • 2401.02823 • Published Jan 5, 2024 • 36
Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters

Paper • 2403.02677 • Published Mar 5, 2024 • 18
FlashSpeech: Efficient Zero-Shot Speech Synthesis

Paper • 2404.14700 • Published Apr 23, 2024 • 32
TextGrad: Automatic "Differentiation" via Text

Paper • 2406.07496 • Published Jun 11, 2024 • 31

PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models

Paper • 2312.13964 • Published Dec 21, 2023 • 20
LLM in a flash: Efficient Large Language Model Inference with Limited Memory

Paper • 2312.11514 • Published Dec 12, 2023 • 259
StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation

Paper • 2312.12491 • Published Dec 19, 2023 • 70
LLaVA-φ: Efficient Multi-Modal Assistant with Small Language Model

Paper • 2401.02330 • Published Jan 4, 2024 • 17

Papers related to audio and music

Music ControlNet: Multiple Time-varying Controls for Music Generation

Paper • 2311.07069 • Published Nov 13, 2023 • 44
FLAP: Fast Language-Audio Pre-training

Paper • 2311.01615 • Published Nov 2, 2023 • 18
MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models

Paper • 2310.11954 • Published Oct 18, 2023 • 25
MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies

Paper • 2308.01546 • Published Aug 3, 2023 • 18

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs