Nishith Jain
KingNish
AI & ML interests
AI is fun actually.
Busy till June 2025.
Recent Activity
reacted
to
burtenshaw's
post
with đ¤
about 15 hours ago
everybody and their dog is fine-tuning Gemma 3 today, so I thought I'd do a longer post on the tips and sharp edges I find. let's go!
1. has to be install everything form main and nightly. this is what I'm working with to get unsloth and TRL running
```txt
git+https://github.com/huggingface/transformers@main
git+https://github.com/huggingface/trl.git@main
bitsandbytes
peft
```
plus this with `--no-deps`
```txt
git+https://github.com/unslothai/unsloth-zoo.git@nightly
git+https://github.com/unslothai/unsloth.git@nightly
```
2. will brown's code to turn GSM8k into a reasoning dataset is a nice toy experiment https://gist.github.com/willccbb/4676755236bb08cab5f4e54a0475d6fb
3. with a learning rate of 5e-6 rewards and loss stayed flat for the first 100 or so steps.
4. so far none of my runs have undermined the outputs after 1 epoch. therefore, I'm mainly experimenting with bigger LoRA adapters.
```python
from trl import GRPOConfig
training_args = GRPOConfig(
learning_rate = 5e-6,
adam_beta1 = 0.9,
adam_beta2 = 0.99,
weight_decay = 0.1,
warmup_ratio = 0.1,
lr_scheduler_type = "cosine",
optim = "adamw_8bit",
logging_steps = 1,
per_device_train_batch_size = 2,
gradient_accumulation_steps = 1,
num_generations = 2,
max_prompt_length = 256,
max_completion_length = 1024 - 256,
num_train_epochs = 1,
max_steps = 250,
save_steps = 250,
max_grad_norm = 0.1,
report_to = "none",
)
```
5. vision fine-tuning isn't available in TRL's GRPOTrainer, so stick to text datasets. but no need to load the model differently in transformers or Unsloth
```python
from transformers import AutoModelForImageTextToText
model = AutoModelForImageTextToText.from_pretrained("google/gemma-3-4b-it)
```
if you want an introduction to GRPO, check out the reasoning course, it walks you through the algorithm, theory, and implementation in a smooth way.
https://huggingface.co/reasoning-course
Organizations
KingNish's activity
Adding `safetensors` variant of this model
#1 opened 5 days ago
by
SFconvertbot

Adding `safetensors` variant of this model
#1 opened 6 days ago
by
SFconvertbot

Adding `safetensors` variant of this model
#1 opened 7 days ago
by
SFconvertbot

Upgrade gradio version
#1 opened 4 months ago
by
Csplk
Optimized for speed
1
#7 opened about 1 month ago
by
KingNish

Update chatbot.py
#68 opened about 1 month ago
by
mancooper
Background Video Removal
1
#1 opened 3 months ago
by
Adeal1

Adding `safetensors` variant of this model
#1 opened 5 months ago
by
SFconvertbot

Upload 382661044_3835227736701043_416786998373332440_n.mp4
#4 opened 5 months ago
by
AkiraDotNels
Delete rickroll-2sec.mp4
#5 opened 5 months ago
by
AkiraDotNels
I wondered whether the uploaded background video can only be used as an image with its first frame
1
#6 opened 5 months ago
by
namiachy

[FEATURE] Community Tools
76
#569 opened 6 months ago
by
nsarrazin

Distributed training code (no Unsloth)
6
#3 opened 5 months ago
by
lunahr

The data source
2
#4 opened 5 months ago
by
lockon

what?
2
#10 opened 5 months ago
by
pencilmender

The speed of inference is a bit slow!
7
#2 opened 5 months ago
by
walkingwithGod

Generation model?
4
#2 opened 5 months ago
by
Benjoyo

Adding `safetensors` variant of this model
#1 opened 5 months ago
by
SFconvertbot
