Stefano Fiorucci's picture

In a Training Loop 🔄

Stefano Fiorucci PRO

anakin87

·

AI & ML interests

Language Models: orchestration, post-training, GRPO, synthetic data... Contributing to Haystack LLM framework 🏗️

Recent Activity

updated a model 7 days ago

anakin87/qwen3-ttt-better-merged3

published a model 7 days ago

anakin87/qwen3-ttt-better-merged3

View all activity

Organizations

Posts 23

Post

212

💭 Do thinking traces make Language Models learn better? Curious what others think

𝗦𝗰𝗲𝗻𝗮𝗿𝗶𝗼
You take an instruction-following LM.
You want to train it with a GRPO-style RL algorithm on a task like Tic Tac Toe.
Rewards are outcome-based, applied only at the end of each episode: win/loss/draw, format adherence...

During training, the model could just output answers, but a common choice is to make it also output thinking traces.

𝗧𝗵𝗲 𝗾𝘂𝗲𝘀𝘁𝗶𝗼𝗻
Does forcing the model to produce thinking traces during training actually improve learning❓

💬 I'd like to hear your thoughts. Share ideas and links to relevant papers and resources.

From what I've understood so far, the answer seems to be 𝘆𝗲𝘀.

1️⃣ If you force the model to think during training, it becomes a model that thinks at inference time. It naturally allocates more budget (tokens) to a problem, which tends to improve performance.

2️⃣ While the model's "reasoning" already exists in its activation space, using explicit thinking traces as a scratchpad allows training to steer and shape that reasoning.

3️⃣ As the model produces more traces during training, the RL algorithm can progressively give higher rewards to the reasoning patterns that lead to better outcomes.

Articles 4

Article

28

Exploring Environments Hub: Your Language Model needs better (open) environments to learn

View all Articles

Collections 4

View 4 collections

spaces 6

Gemma 3 270m IT

Chat with Gemma 3 270m IT

Fact Checking rocks!

Fact checking baseline. Dense retrieval + textual entailment

Phi 3.5 Mini ITA

Chat with an Italian Small Model

Gemma 2 2B Neogenesis ITA

Chat with an Italian Small Model

Gemma 2 9B Neogenesis ITA

9B Italian strong model 💪

Who killed Laura Palmer?

models 18

anakin87/qwen3-ttt-better-merged3

Text Generation • 0.6B • Updated 7 days ago • 25

anakin87/qwen3-ttt-better-merged

Text Generation • 0.6B • Updated 15 days ago • 28

anakin87/Qwen3-0.6B-alphabet-sort-grpo

Text Generation • 0.6B • Updated Sep 4 • 4

anakin87/gemma-2-2b-ita-sft

Text Generation • 3B • Updated Jun 29 • 7

anakin87/electra-italian-xxl-cased-squad-it

Question Answering • 0.1B • Updated Jun 29 • 16 • 8

anakin87/gemma-2b-orpo

Text Generation • 3B • Updated Jun 29 • 54 • 28

anakin87/qwen-scheduler-7b-grpo

Text Generation • Updated Apr 26 • 6

anakin87/gemma-2-9b-neogenesis-ita

Text Generation • 9B • Updated Mar 10 • 1.11k • • 11

anakin87/gemma-2-2b-neogenesis-ita

Text Generation • 3B • Updated Jan 16 • 1.34k • • 6

anakin87/Phi-3.5-mini-ITA

Text Generation • 4B • Updated Nov 16, 2024 • 2.08k • 13

datasets 9

anakin87/Qwen3-0.6B-tuned-alphabet-sort-eval

Viewer • Updated Sep 4 • 15 • 17

anakin87/Qwen3-0.6B-alphabet-sort-eval

Viewer • Updated Sep 4 • 15 • 9

anakin87/events-scheduling

Viewer • Updated Apr 26 • 600 • 111

anakin87/evol-dpo-ita-reranked

Viewer • Updated Jan 14 • 19.8k • 33 • 5

anakin87/gemma-vs-gemma-preferences

Viewer • Updated Jan 14 • 24.7k • 42

anakin87/fine-instructions-ita-70k

Viewer • Updated Jan 14 • 69.9k • 43 • 4

anakin87/FineTome-single-turn-dedup

Viewer • Updated Jan 11 • 83.3k • 19

anakin87/tulu-3-sft-mixture-with-language

Viewer • Updated Dec 11, 2024 • 939k • 65

anakin87/medrag-pubmed-chunk

Viewer • Updated Feb 25, 2024 • 15.4k • 43