Article From Llasa to Llasagna: Finetuning LLaSA to generates Italian speech and other languages By Steveeeeeeen and 1 other • 3 days ago • 19
Moshi v0.1 Release Collection MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 13 items • Updated Sep 18, 2024 • 227
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper • 2402.13753 • Published Feb 21, 2024 • 115
ChatAnything: Facetime Chat with LLM-Enhanced Personas Paper • 2311.06772 • Published Nov 12, 2023 • 35
Music ControlNet: Multiple Time-varying Controls for Music Generation Paper • 2311.07069 • Published Nov 13, 2023 • 44
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models Paper • 2311.06783 • Published Nov 12, 2023 • 27
I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models Paper • 2311.04145 • Published Nov 7, 2023 • 33
Learning From Mistakes Makes LLM Better Reasoner Paper • 2310.20689 • Published Oct 31, 2023 • 29
CapsFusion: Rethinking Image-Text Data at Scale Paper • 2310.20550 • Published Oct 31, 2023 • 26
Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks Paper • 2310.19909 • Published Oct 30, 2023 • 21
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation Paper • 2310.19512 • Published Oct 30, 2023 • 16
MM-VID: Advancing Video Understanding with GPT-4V(ision) Paper • 2310.19773 • Published Oct 30, 2023 • 20
CodeFusion: A Pre-trained Diffusion Model for Code Generation Paper • 2310.17680 • Published Oct 26, 2023 • 69