Submitted by akhaliq 60 BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data · 19 authors 9
Submitted by akhaliq 38 World Model on Million-Length Video And Language With RingAttention · 4 authors 5
Submitted by akhaliq 26 Lumos: Empowering Multimodal LLMs with Scene Text Recognition · 14 authors 2
Submitted by akhaliq 15 Graph Mamba: Towards Learning on Graphs with State Space Models · 2 authors 1
Submitted by akhaliq 13 IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation · 7 authors 1
Submitted by akhaliq 11 ChatCell: Facilitating Single-Cell Analysis with Natural Language · 10 authors 4
Submitted by akhaliq 9 Vision-Based Hand Gesture Customization from a Single Demonstration · 8 authors 1
Submitted by akhaliq 5 NeRF Analogies: Example-Based Visual Attribute Transfer for NeRFs · 7 authors 1