Nishith Jain's picture

Nishith Jain

KingNish

AI & ML interests

AI is fun actually. Busy till June 2025.

Recent Activity

reacted to burtenshaw's post with 🤗 about 5 hours ago
everybody and their dog is fine-tuning Gemma 3 today, so I thought I'd do a longer post on the tips and sharp edges I find. let's go! 1. has to be install everything form main and nightly. this is what I'm working with to get unsloth and TRL running ```txt git+https://github.com/huggingface/transformers@main git+https://github.com/huggingface/trl.git@main bitsandbytes peft ``` plus this with `--no-deps` ```txt git+https://github.com/unslothai/unsloth-zoo.git@nightly git+https://github.com/unslothai/unsloth.git@nightly ``` 2. will brown's code to turn GSM8k into a reasoning dataset is a nice toy experiment https://gist.github.com/willccbb/4676755236bb08cab5f4e54a0475d6fb 3. with a learning rate of 5e-6 rewards and loss stayed flat for the first 100 or so steps. 4. so far none of my runs have undermined the outputs after 1 epoch. therefore, I'm mainly experimenting with bigger LoRA adapters. ```python from trl import GRPOConfig training_args = GRPOConfig( learning_rate = 5e-6, adam_beta1 = 0.9, adam_beta2 = 0.99, weight_decay = 0.1, warmup_ratio = 0.1, lr_scheduler_type = "cosine", optim = "adamw_8bit", logging_steps = 1, per_device_train_batch_size = 2, gradient_accumulation_steps = 1, num_generations = 2, max_prompt_length = 256, max_completion_length = 1024 - 256, num_train_epochs = 1, max_steps = 250, save_steps = 250, max_grad_norm = 0.1, report_to = "none", ) ``` 5. vision fine-tuning isn't available in TRL's GRPOTrainer, so stick to text datasets. but no need to load the model differently in transformers or Unsloth ```python from transformers import AutoModelForImageTextToText model = AutoModelForImageTextToText.from_pretrained("google/gemma-3-4b-it) ``` if you want an introduction to GRPO, check out the reasoning course, it walks you through the algorithm, theory, and implementation in a smooth way. https://huggingface.co/reasoning-course
View all activity

Organizations

Wikimedia's profile picture OpenGVLab's profile picture Blog-explorers's profile picture Multi🤖Transformers's profile picture The Collectionists's profile picture HelpingAI's profile picture ZeroGPU Explorers's profile picture Project Fluently's profile picture Poscye's profile picture INNOVA AI's profile picture Narra's profile picture Social Post Explorers's profile picture Cognitive Computations's profile picture Dev Mode Explorers's profile picture Stable Diffusion Community (Unofficial, Non-profit)'s profile picture ONNX Community's profile picture Hugging Face Discord Community's profile picture Nerdy Face's profile picture grafite's profile picture None yet's profile picture Project R's profile picture Doge Face's profile picture

KingNish's activity

reacted to AdinaY's post with 🔥🤗 about 5 hours ago
reacted to burtenshaw's post with 🤗 about 5 hours ago
view post
Post
359
everybody and their dog is fine-tuning Gemma 3 today, so I thought I'd do a longer post on the tips and sharp edges I find. let's go!

1. has to be install everything form main and nightly. this is what I'm working with to get unsloth and TRL running

git+https://github.com/huggingface/transformers@main
git+https://github.com/huggingface/trl.git@main
bitsandbytes
peft


plus this with --no-deps

git+https://github.com/unslothai/unsloth-zoo.git@nightly
git+https://github.com/unslothai/unsloth.git@nightly


2. will brown's code to turn GSM8k into a reasoning dataset is a nice toy experiment https://gist.github.com/willccbb/4676755236bb08cab5f4e54a0475d6fb

3. with a learning rate of 5e-6 rewards and loss stayed flat for the first 100 or so steps.

4. so far none of my runs have undermined the outputs after 1 epoch. therefore, I'm mainly experimenting with bigger LoRA adapters.

from trl import GRPOConfig

training_args = GRPOConfig(
    learning_rate = 5e-6,
    adam_beta1 = 0.9,
    adam_beta2 = 0.99,
    weight_decay = 0.1,
    warmup_ratio = 0.1,
    lr_scheduler_type = "cosine",
    optim = "adamw_8bit",
    logging_steps = 1,
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 1,
    num_generations = 2,
    max_prompt_length = 256,
    max_completion_length = 1024 - 256,
    num_train_epochs = 1,
    max_steps = 250,
    save_steps = 250,
    max_grad_norm = 0.1,
    report_to = "none",
)


5. vision fine-tuning isn't available in TRL's GRPOTrainer, so stick to text datasets. but no need to load the model differently in transformers or Unsloth

from transformers import AutoModelForImageTextToText

model = AutoModelForImageTextToText.from_pretrained("google/gemma-3-4b-it)


if you want an introduction to GRPO, check out the reasoning course, it walks you through the algorithm, theory, and implementation in a smooth way.

https://huggingface.co/reasoning-course
  • 2 replies
·
reacted to thomwolf's post with 🔥 1 day ago
view post
Post
1140
We've kept pushing our Open-R1 project, an open initiative to replicate and extend the techniques behind DeepSeek-R1.

And even we were mind-blown by the results we got with this latest model we're releasing: ⚡️OlympicCoder ( open-r1/OlympicCoder-7B and open-r1/OlympicCoder-32B)

It's beating Claude 3.7 on (competitive) programming –a domain Anthropic has been historically really strong at– and it's getting close to o1-mini/R1 on olympiad level coding with just 7B parameters!

And the best part is that we're open-sourcing all about its training dataset, the new IOI benchmark, and more in our Open-R1 progress report #3: https://huggingface.co/blog/open-r1/update-3

Datasets are are releasing:
- open-r1/codeforces
- open-r1/codeforces-cots
- open-r1/ioi
- open-r1/ioi-test-cases
- open-r1/ioi-sample-solutions
- open-r1/ioi-cots
- open-r1/ioi-2024-model-solutions
reacted to BrigitteTousi's post with 🤗 1 day ago
view post
Post
3548
Regardless of X being down or not, so glad I can rely on HF Posts for AI news ❤️🤗
  • 1 reply
·
reacted to Smooke's post with 👍 1 day ago
view post
Post
1729
Hallucinations Blog Research Reading List:

Hallucinations Are A Feature of AI, Humans Are The Bug https://hackernoon.com/hallucinations-are-a-feature-of-ai-humans-are-the-bug

Overcome LLM Hallucinations Using Knowledge Bases https://hackernoon.com/overcome-llm-hallucinations-using-knowledge-bases

How to Detect and Minimise Hallucinations in AI Models https://hackernoon.com/how-to-detect-and-minimise-hallucinations-in-ai-models

Predictive Coding, AI: Modeling Placebos in RCTs for Psychedelics and Antidepressants https://hackernoon.com/predictive-coding-ai-modeling-placebos-in-rcts-for-psychedelics-and-antidepressants

A Simple Method to Improving the Accuracy of Your RAG System https://hackernoon.com/say-goodbye-to-ai-hallucinations-a-simple-method-to-improving-the-accuracy-of-your-rag-system

Gen AI Hallucinations: The Good, the Bad, and the Costly https://hackernoon.com/gen-ai-hallucinations-the-good-the-bad-and-the-costly

Why Do LLMs Hallucinate? https://hackernoon.com/why-do-llms-hallucinate

Truth Serum For The AI Age: Factiverse To Fight Fake News And Hallucinations https://hackernoon.com/truth-serum-for-the-ai-age-factiverse-to-fight-fake-news-and-hallucinations

A Secret Technique To Sidestepping LLM Hallucinations https://hackernoon.com/a-secret-technique-to-sidestepping-llm-hallucinations

The Importance of Explainability in AI (XAI) https://hackernoon.com/tackling-ai-hallucinations-the-importance-of-explainability-in-ai-xai

What You Need to Know About Amazon Bedrock’s RAG Evaluation and LLM-as-a-Judge for Advancing AI https://hackernoon.com/what-you-need-to-know-about-amazon-bedrocks-rag-evaluation-and-llm-as-a-judge-for-advancing-ai

I Over Relied on AI and Those Shortcuts Cost Me https://hackernoon.com/i-over-relied-on-ai-and-those-shortcuts-cost-me

AI’s Non-Determinism, Hallucinations, And... Cats? https://hackernoon.com/ais-non-determinism-hallucinations-and-cats

More to read --> https://hackernoon.com/search?query=hallucinations

reacted to JingzeShi's post with 🚀❤️ 3 days ago
reacted to BlinkDL's post with 🔥 3 days ago
view post
Post
4939
RWKV-7 "Goose" 0.4B trained w/ ctx4k automatically extrapolates to ctx32k+, and perfectly solves NIAH ctx16k 🤯 100% RNN and attention-free. Only trained on the Pile. No finetuning. Replicable training runs. tested by our community: https://github.com/Jellyfish042/LongMamba
reacted to fdaudens's post with 🤗 4 days ago
view post
Post
5642
Honored to be named among their 12 pioneers and power players in the news industry in the 2025 Tech Trends Report from Future Today Strategy Group.

Incredible group to be part of - each person is doing groundbreaking work at the intersection of AI and journalism. Worth following them all: they're consistently sharing practical insights on building the future of news.

Take the time to read this report, it's packed with insights as always. The news & information section's #1 insight hits hard: "The most substantive economic impact of AI to date has been licensing payouts for a handful of big publishers. The competition will start shifting in the year ahead to separate AI 'haves' that have positioned themselves to grow from the 'have-nots.'"

This AI-driven divide is something I've been really concerned about. Now is the time to build more than ever!

👉 Full report here: https://ftsg.com/wp-content/uploads/2025/03/FTSG_2025_TR_FINAL_LINKED.pdf
  • 2 replies
·