In a Training Loop 🔄

139 288

Arunkumar Venkataramanan

ArunkumarVR

https://arunkumarramanan.github.io

AI & ML interests

AGI Research: Reasoning, Safety & Alignment (Superalignment), Generative AI (GenAI), Multi-Modal Foundation Models (FMs), Large Language Models (LLMs), Transformers & Diffusion Models, Open LLM Training, Optimization & Finetuning, Serving & Inference

Recent Activity

liked a model 9 days ago

deepseek-ai/DeepSeek-R1-0528-Qwen3-8B

upvoted a paper 6 days ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4 • 252

upvoted a collection 10 days ago

Moonlight-A3B

Moonshot's Compute-efficient MoE LLM, first Scaling Up of Muon Optimizer • 3 items • Updated Nov 2 • 9

upvoted a paper 10 days ago

Kimi k1.5: Scaling Reinforcement Learning with LLMs

Paper • 2501.12599 • Published Jan 22 • 126

upvoted a collection 11 days ago

FunctionGemma

3 items • Updated 12 days ago • 29

upvoted an article 12 days ago

The Open Evaluation Standard: Benchmarking NVIDIA Nemotron 3 Nano with NeMo Evaluator

Article • 13 days ago • 35

upvoted 2 collections 14 days ago

Nemotron v3 Pre-Training

Large scale pre-training datasets used in the Nemotron family of models. • 11 items • Updated 7 days ago • 6

Common Pile v0.1

All resources related to Common Pile v0.1, an 8TB dataset of public domain and openly licensed text • 4 items • Updated Jun 6 • 39

upvoted an article 14 days ago

Nemotron 3 Nano - A new Standard for Efficient, Open, and Intelligent Agentic Models

Article • 15 days ago • 101

upvoted a paper 14 days ago

ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding

Paper • 2512.13586 • Published 15 days ago • 87

upvoted 2 collections 14 days ago

Nemotron-Post-Training-v3

Collection of datasets used in the post-training phase of Nemotron Nano v3. • 7 items • Updated 7 days ago • 52

NVIDIA Nemotron v3

Open, Production-ready Enterprise Models • 6 items • Updated 7 days ago • 107

upvoted 2 collections 17 days ago

Nemotron-Pre-Training-Datasets

Large scale pre-training datasets used in the Nemotron family of models. • 11 items • Updated 7 days ago • 84

NeMo Gym

Collection of RL verifiable data for NeMo Gym • 13 items • Updated 7 days ago • 31

upvoted a collection 20 days ago

Devstral 2

A couple of agentic LLMs for software engineering tasks, excelling at using tools to explore codebases, edit multiple files, and power SWE Agents. • 3 items • Updated 21 days ago • 37

upvoted a collection 24 days ago

Essential-Web v1.0

10 items • Updated Jun 18 • 10

upvoted a paper 24 days ago

Rethinking Reflection in Pre-Training

Paper • 2504.04022 • Published Apr 5 • 80

upvoted an article 25 days ago

We Got Claude to Fine-Tune an Open Source LLM

Article • 26 days ago • 547

upvoted an article 26 days ago

Transformers v5: Simple model definitions powering the AI ecosystem

Article • 29 days ago • 259

upvoted 2 collections 26 days ago

Mistral Large 3

A state-of-the-art, open-weight, general-purpose multimodal model with a granular Mixture-of-Experts architecture. • 4 items • Updated 28 days ago • 80

Ministral 3

A collection of edge models, with Base, Instruct and Reasoning variants, in 3 different sizes: 3B, 8B and 14B. All with vision capabilities. • 9 items • Updated 28 days ago • 133