Literate Goggles

literate-goggles

AI & ML interests

None yet

Organizations

None yet

literate-goggles's activity

upvoted a paper 4 days ago

Unified Reward Model for Multimodal Understanding and Generation

Paper • 2503.05236 • Published 7 days ago • 104

upvoted a paper 8 days ago

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference

Paper • 2502.18411 • Published 16 days ago • 69

upvoted an article 20 days ago

SigLIP 2: A better multilingual vision language encoder

Article • 21 days ago • 134

upvoted 2 papers 20 days ago

MLGym: A New Framework and Benchmark for Advancing AI Research Agents

Paper • 2502.14499 • Published 22 days ago • 179

Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound

Paper • 2502.05139 • Published Feb 7 • 1

upvoted a paper 21 days ago

SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation

Paper • 2502.13128 • Published 23 days ago • 37

upvoted a paper 22 days ago

Soundwave: Less is More for Speech-Text Alignment in LLMs

Paper • 2502.12900 • Published 24 days ago • 77

upvoted 2 papers 23 days ago

Learning Getting-Up Policies for Real-World Humanoid Robots

Paper • 2502.12152 • Published 24 days ago • 37

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

Paper • 2502.11089 • Published 26 days ago • 142

upvoted a paper 25 days ago

Region-Adaptive Sampling for Diffusion Transformers

Paper • 2502.10389 • Published 27 days ago • 52

upvoted 3 papers 27 days ago

Language Models Use Trigonometry to Do Addition

Paper • 2502.00873 • Published Feb 2 • 1

InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU

Paper • 2502.08910 • Published 29 days ago • 143

Logical Reasoning in Large Language Models: A Survey

Paper • 2502.09100 • Published 29 days ago • 22

upvoted 3 papers 28 days ago

XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model

Paper • 2406.04904 • Published Jun 7, 2024 • 9

IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

Paper • 2502.05512 • Published Feb 8 • 2

Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis

Paper • 2502.04128 • Published Feb 6 • 25

upvoted 2 articles 29 days ago

Open-source DeepResearch – Freeing our search agents

Article • Feb 4 • 1.16k

From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub

Article • about 1 month ago • 49

upvoted 2 papers 29 days ago

FocalCodec: Low-Bitrate Speech Coding via Focal Modulation Networks

Paper • 2502.04465 • Published Feb 6 • 3

Competitive Programming with Large Reasoning Models

Paper • 2502.06807 • Published Feb 3 • 67