
Shyam Sunder Kumar

theainerd

AI & ML interests

Natural Language Processing



theainerd's activity

upvoted an article about 7 hours ago

Article

Open R1: Update #2

and 6 others • 1 day ago • 117

upvoted a paper 4 days ago

Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2

Paper • 2502.03544 • Published 6 days ago • 37

upvoted an article 8 days ago

Article

Welcome to Inference Providers on the Hub 🔥

15 days ago • 322

reacted to chansung's post with 👍 9 days ago

Post

2770

Simple Paper Review #5

I briefly reviewed the paper "SFT Memorizes, RL Generalizes," from HKU, UC Berkeley, Google DeepMind, and New York University, which compares SFT and RL in the post-training of LLMs/VLMs.

The conclusion suggests SFT excels in memorization, while RL is better for generalization. However, since LLMs/VLMs should benefit humans beyond just generalization, a mix of SFT and RL is advisable. Typically, some SFT comes first so the model learns the prompt format, followed by RL to enhance generalization through trial and error.

The study focused on one model, Llama-3.2-Vision-11B, using environments like GeneralPoints for arithmetic reasoning and V-IRL for spatial reasoning. Training data was used for both SFT and RL, with evaluations on in-distribution and out-of-distribution data to assess memorization and generalization.

I want to apply RL extensively, but it requires building a similar simulation environment. For domain-specific models, significant investment in creating a "playground" for the model is crucial, as the effort will directly influence the outcomes.

https://arxiv.org/abs/2501.17161

upvoted an article 10 days ago

Article

Open-R1: Update #1

and 7 others • 10 days ago • 268

upvoted a collection 11 days ago

🧠 Reasoning datasets

Collection

Datasets with reasoning traces for math and code released by the community • 11 items • Updated about 16 hours ago • 48

updated a collection 12 days ago

Papers-to-Read

Collection

5 items • Updated 12 days ago

commented a paper 12 days ago

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

Paper • 2501.18585 • Published 12 days ago • 51

upvoted a paper 12 days ago

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

Paper • 2501.18585 • Published 12 days ago • 51

liked a model 12 days ago

hexgrad/Kokoro-82M

Text-to-Speech • Updated 10 days ago • 364k • 3.04k

liked a Space 14 days ago

279

MMS

🌍

Transform and identify speech with MMS

upvoted an article 15 days ago

Article

Open-R1: a fully open reproduction of DeepSeek-R1

15 days ago • 706

reacted to fdaudens's post with ❤️ 15 days ago

Post

8306

Yes, DeepSeek R1's release is impressive. But the real story is what happened in just 7 days after:

- Original release: 8 models, 540K downloads. Just the beginning...

- The community turned those open-weight models into 550+ NEW models on Hugging Face. Total downloads? 2.5M, nearly 5X the originals.

The reason? DeepSeek models are open-weight, letting anyone build on top of them. Interesting to note that the community focused on quantized versions for better efficiency & accessibility. They want models that use less memory, run faster, and are more energy-efficient.

When you empower builders, innovation explodes. For everyone. 🚀

The most popular community model? @bartowski's DeepSeek-R1-Distill-Qwen-32B-GGUF version, with 1M downloads alone.